
ISBN 1-84544-745-X

ISSN 0368-492X

Volume 34 Number 9/10 2005

Kybernetes
The international journal of systems & cybernetics
Cybernetic intelligence
Guest Editor: Mourad Oussalah

Selected as the official journal of the World Organisation of Systems and Cybernetics

www.emeraldinsight.com

Kybernetes The International Journal of Systems & Cybernetics

ISSN 0368-492X Volume 34 Number 9/10 2005

Cybernetic intelligence
Guest Editor: Mourad Oussalah

CONTENTS

Access this journal online 1304
Editorial advisory board 1305
Preface 1306
Editorial 1307

Second-order cybernetics and enactive perception
J.M. Bishop and J.S. Nasuto 1309

Semantic category theory cognition and conceptual reference
John G. St Quinton 1321

Bipolar logic and probabilistic interpretation
Mourad Oussalah 1349

Cooperative clans
Nathan Griffiths 1384

Future reasoning machines: mind and body
Brian R. Duffy, Gregory M.P. O'Hare, John F. Bradley, Alan N. Martin and Bianca Schoen 1404

Access this journal electronically The current and past volumes of this journal are available at:

www.emeraldinsight.com/0368-492X.htm You can also search more than 100 additional Emerald journals in Emerald Fulltext (www.emeraldinsight.com/ft) and Emerald Management Xtra (www.emeraldinsight.com/emx) See page following contents for full details of what your access includes.


Machine vision methods for autonomous micro-robotic systems
B.P. Amavasai, F. Caparrelli, A. Selvan, M. Boissenin, J.R. Travis and S. Meikle 1421

Optimisation enhancement using self-organising fuzzy control
Ann Tighe, Finlay S. Smith and Gerard Lyons 1440

Intelligent agents and distributed models for cooperative task support
R. Patel, R.J. Mitchell and K. Warwick 1456

Landscape classification and problem specific reasoning for genetic algorithms
F. Mac Giolla Bhríde, T.M. McGinnity and L.J. McDaid 1469

REGULAR JOURNAL SECTIONS

Business cybernetics: a provocative suggestion
Vojko Potocan, Matjaz Mulej and Stefan Kajzer 1496

Dynamic portfolio management under competing representations
Ralf Östermark 1517

Enterprise resource planning competence centres: a case study
Annika Granebring and Péter Révay 1551

Time and systems
Robert Vallée 1563

Systemic philosophy and the philosophy of social science. Part II: the systemic position
Jon-Arild Johannessen and Johan Olaisen 1570

Image labelling in real conditions
Juan Manuel García Chamizo, Andrés Fuster Guilló and Jorge Azorín López 1587

Aspects of a theory of systemic construction
Nicolae Bulz 1598

To performance evaluation of distributed parallel algorithms
Juraj Hanuliak and Ivan Hanuliak 1633

Contemporary systems and cybernetics
New initiatives in the development of neuron chips and in biomimetics
Brian H. Rudall 1651


Internet commentary
Cybernetics and systems on the web: hoax paper, nanotechnology
A.M. Andrew 1656

Book reviews 1659
Book reports 1662
News, conferences and technical reports 1665
Announcements 1668
Special announcements 1669

www.emeraldinsight.com/k.htm As a subscriber to this journal, you can benefit from instant, electronic access to this title via Emerald Fulltext and Emerald Management Xtra. Your access includes a variety of features that increase the value of your journal subscription.

Additional complimentary services available

How to access this journal electronically

E-mail alert services These services allow you to be kept up to date with the latest additions to the journal via e-mail, as soon as new material enters the database. Further information about the services available can be found at www.emeraldinsight.com/alerts

To benefit from electronic access to this journal you first need to register via the internet. Registration is simple and full instructions are available online at www.emeraldinsight.com/admin Once registration is completed, your institution will have instant access to all articles through the journal’s Table of Contents page at www.emeraldinsight.com/0368-492X.htm More information about the journal is also available at www.emeraldinsight.com/ k.htm Our liberal institution-wide licence allows everyone within your institution to access your journal electronically, making your subscription more cost effective. Our web site has been designed to provide you with a comprehensive, simple system that needs only minimum administration. Access is available via IP authentication or username and password.

Key features of Emerald electronic journals

Automatic permission to make up to 25 copies of individual articles
This facility can be used for training purposes, course notes, seminars etc. This only applies to articles of which Emerald owns copyright. For further details visit www.emeraldinsight.com/copyright

Online publishing and archiving
As well as current volumes of the journal, you can also gain access to past volumes on the internet via Emerald Fulltext and Emerald Management Xtra. You can browse or search these databases for relevant articles.

Key readings
This feature provides abstracts of related articles chosen by the journal editor, selected to provide readers with current awareness of interesting articles from other publications in the field.

Non-article content
Material in our journals such as product information, industry trends, company news, conferences, etc. is available online and can be accessed by users.

Reference linking
Direct links from the journal article references to abstracts of the most influential articles cited. Where possible, this link is to the full text of the article.

E-mail an article
Allows users to e-mail links to relevant and interesting articles to another computer for later use, reference or printing purposes.

Emerald structured abstracts
New for 2005, Emerald structured abstracts provide consistent, clear and informative summaries of the content of the articles, allowing faster evaluation of papers.

Your access includes a variety of features that add to the functionality and value of your journal subscription:

Research register A web-based research forum that provides insider information on research activity world-wide located at www.emeraldinsight.com/researchregister You can also register your research activity here. User services Comprehensive librarian and user toolkits have been created to help you get the most from your journal subscription. For further information about what is available visit www.emeraldinsight.com/usagetoolkit

Choice of access
Electronic access to this journal is available via a number of channels. Our web site www.emeraldinsight.com is the recommended means of electronic access, as it provides fully searchable and value added access to the complete content of the journal. However, you can also access and search the article content of this journal through the following journal delivery services:
EBSCOHost Electronic Journals Service ejournals.ebsco.com
Informatics J-Gate www.j-gate.informindia.co.in
Ingenta www.ingenta.com
Minerva Electronic Online Services www.minerva.at
OCLC FirstSearch www.oclc.org/firstsearch
SilverLinker www.ovid.com
SwetsWise www.swetswise.com

Emerald Customer Support For customer support and technical help contact: E-mail [email protected] Web www.emeraldinsight.com/customercharter Tel +44 (0) 1274 785278 Fax +44 (0) 1274 785204

EDITORIAL ADVISORY BOARD

A. Bensoussan, President of INRIA, France
V. Chavchanidze, Institute of Cybernetics, Tbilisi University, Georgia
A.B. Engel, IMECC-Unicamp, Universidad Estadual de Campinas, Brazil
R.L. Flood, Hull University, UK
F. Geyer, The Netherlands Universities Institute for Co-ordination of Research in Social Sciences, Amsterdam, The Netherlands
A. Ghosal, Honorary Fellow, World Organisation of Systems and Cybernetics, New Delhi, India
R. Glanville, CybernEthics Research, UK
R.W. Grubbström, Linköping University, Sweden
Chen Hanfu, Institute of Systems Science, Academia Sinica, People's Republic of China
G.J. Klir, State University of New York, USA
Yi Lin, International Institute for General Systems Studies Inc., USA

K.E. McKee, IIT Research Institute, Chicago, IL, USA
M. Mănescu, Academician Professor, Bucharest, Romania
M. Mansour, Swiss Federal Institute of Technology, Switzerland
K.S. Narendra, Yale University, New Haven, CT, USA
C.V. Negoita, City University of New York, USA
W. Pearlman, Technion Haifa, Israel
A. Raouf, Pro-Rector, Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences & Technology, Topi, Pakistan
Y. Sawaragi, Kyoto University, Japan
B. Scott, Cranfield University, Royal Military College of Science, Swindon, UK
D.J. Stewart, Human Factors Research, UK
I.A. Ushakov, Moscow, Russia
J. van der Zouwen, Free University, Amsterdam, The Netherlands


Preface

We are most grateful to our Guest Editor Dr Oussalah and his colleagues of the IEEE SMC UK and Ireland Chapter for accepting the invitation of this journal's Editorial Advisory Board to compile this special double issue. It is based on the successful workshop held by the Chapter, and the papers presented have been both extended and revised to meet the exacting standards of this journal.
The choice of title, "Cybernetic Intelligence", reflects not only the theme chosen but also the pioneering work of cyberneticians and systemists in this area. Artificial intelligence (AI) is often assumed by many scientists to be an entirely new study, when in fact the history of cybernetics as we know it has surely demonstrated the early interests and developments in the field. This compilation is focussed on "intelligence" and covers a wide range of topics that illustrate a cybernetic approach to what is a fascinating study, and the authors are to be congratulated on their much appreciated endeavours.
This double issue, as is our practice, also contains our regular journal sections and contributed papers. Readers are also reminded that an index to all the contributions published in the ten issues of Volume 34, 2005 is available to all subscribers online.

Brian H. Rudall
Editor-in-Chief


Editorial

This special issue of Kybernetes presents extended and revised versions of papers presented at the workshop of the IEEE Systems, Man & Cybernetics UK and Ireland Chapter, which was held in Reading in September 2003. For information about the IEEE, the SMC UK&RI Chapter and its activities please visit our web sites: www.ieee.org and www.ieee.org.uk/smc.html.
The workshop focused on cybernetic intelligence, which includes, among others:
. computational intelligence and its paradigms;
. self-organizing systems;
. adaptive systems;
. learning mechanisms in control systems;
. intelligent decision-making systems;
. second-order cybernetics;
. hybrid intelligent systems; and
. current challenges and future paradigms.
The aspect of second-order cybernetics that interacts with enactive perception and dynamic systems theory has been investigated by Bishop and Nasuto. St Quinton has focused on linguistic machine intelligence and proposed four semantic categories that allow us to overcome the ambiguity and confusion that arise in natural spoken language. From the modelling perspective, Oussalah has investigated the notions of bipolar reasoning and bipolar logic from a probabilistic perspective, and new results that ensure coherence between the cognitive-map based interpretation and bipolar logic have been pointed out. The concepts of clan and cooperative clan have been put forward by Griffiths in order to force agents to adopt cooperative behaviour in a multi-agent structure.
Two papers have been devoted to the field of intelligent robotics. Duffy and his colleagues explore the one-mind-many-bodies metaphor, as in the agent Chameleon work, while Amavasai and his colleagues describe new findings from an EU project on micro-robotics.
In the applications field, Tighe and her colleagues describe self-organizing fuzzy control applied to vehicle routing problems, and the performance of the optimization algorithm is reported. Patel, Mitchell and Warwick describe the design and implementation of a multi-agent based network for the support of collaborative switching tasks. Finally, Mac Giolla Bhríde, McGinnity and McDaid have investigated a genetic algorithm based approach which incorporates a rule-based reasoning system that acts as a supervisory module to the genetic algorithm.
We take this opportunity to acknowledge the efforts of all colleagues and sponsors who contributed to the conference, and members of the organising


committee. We would also like to acknowledge the efforts of the reviewers who contributed to the conference. A special thanks to Professor B.H. Rudall – the Editor-in-Chief of Kybernetes journal – for accepting and encouraging this special issue. Finally, we would like to thank all the authors who contributed to this special issue and the reviewers for their thoughtful and insightful comments.

M. Oussalah
On behalf of IEEE SMC UK & Ireland Chapter Executives, University of Birmingham, UK


Second-order cybernetics and enactive perception


J.M. Bishop
Goldsmiths College, University of London, London, UK, and


J.S. Nasuto
Department of Cybernetics, University of Reading, Whiteknights, Reading, UK

Abstract
Purpose – To present an account of cognition integrating second-order cybernetics (SOC) together with enactive perception and dynamic systems theory.
Design/methodology/approach – The paper presents a brief critique of classical models of cognition, then outlines how integration of SOC, enactive perception and dynamic systems theory can overcome some weaknesses of the classical paradigm.
Findings – Presents a critique of evolutionary robotics showing how the issues of teleology and autonomy are left unresolved by this paradigm, although their solution fits within the proposed framework.
Research limitations/implications – The paper highlights the importance of genuine autonomy in the development of artificial cognitive systems. It sets out a framework within which the robotic research of cognitive systems could succeed.
Practical implications – There are no immediate practical implications, but see research implications.
Originality/value – It joins the discussion on the fundamental nature of cognitive systems and emphasises the importance of autonomy and embodiment.
Keywords Cybernetics, Cognition
Paper type Conceptual paper

1. Introduction

It should be noted that from now on "the system" means not the nervous system but the whole complex of the organism and the environment. Thus, if it should be shown that "the system" has some property, it must not be assumed that this property is attributed to the nervous system: it belongs to the whole; and detailed examination may be necessary to ascertain the contributions of the separate parts (W. Ross Ashby, 1952).

An oft repeated aphorism is that the world is in a perpetual state of flux and hence that our universe is constantly changing. Thus, in order to behave intelligently within the natural environment any cybernetic system, be it man, machine, or animal, faces the problem of perceiving invariant aspects of a world in which no two situations are ever exactly the same.
Cartesian theories of perception can be broken down into what Chalmers (1996) calls the "easy" problem of perception, the classification and identification of sense stimuli, and a corresponding "hard" problem, which is the realization of the associated phenomenal state. The difference between the "easy" and the "hard" problems, and an apparent lack of a link between theories of the former and an account of the latter, has been termed the "explanatory gap".
Many current theories of natural visual processes are grounded upon the idea that when we perceive, sense data are processed by the brain to form an internal

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1309-1320 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614696


representation of the world. The act of perception thus involves the activation of an appropriate representation. Thus, the easy problem reduces to forming a correct internal representation of the world, and the hard problem reduces to answering how the activation of a representation gives rise to a sensory experience.
In machine perception, progress in solving even the "easy" problem has so far been slow. Typical bottom-up (or data driven) methodologies involve the processing of raw sense data to extract a set of features, the binding of these features into groups, and then the classification of each group by reference to a putative set of models. Conversely, in typical top-down methods, a set of hypotheses of likely perceptions is generated, which are then compared to a set of features in a search for evidence to support each hypothesis. Historically, cybernetic approaches have favoured the former and computer science the latter; however, amalgams of the two have also been explored. To date the success of all approaches has at best been patchy and limited to a very small subset of the human perceptual gamut.

2. First-order cybernetics
First-order cybernetics (FOC) characterises agent-environment systems in terms of feedback loops whose operation can be interpreted by an observer (or engineer) in terms of teleological behaviour (i.e. moving towards a goal). Alternatively, an engineer may manipulate a system and include in it feedback loops in order to achieve behaviour consistent with a prescribed purpose. An early example of such behaviour is found in the work of W. Grey Walter. He demonstrated that apparently teleological behaviours such as following a light source (without approaching too closely) can be instantiated in a very simple FOC device (Plate 1). Observing their behaviour, Walter remarked that, "despite being crude (the tortoises) conveyed the impression of having goals, independence and spontaneity". Yet, in our opinion, such a teleological interpretation of tortoise behaviour is unwarranted, as this behaviour was explicit in the design. Other examples of FOC systems include Gaia, Lovelock's cybernetic view of planetary feedback processes that maintain stable conditions suitable for life, or a paradigmatic control system such as Watt's governor, used to control the speed of a steam engine under varying loads.
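As a purely illustrative sketch (not taken from Walter's or Watt's designs; the sensor model, gain and threshold below are invented for the example), a fixed negative-feedback rule with no internal goal can produce apparently purposive light-following of the kind an observer is tempted to describe teleologically:

```python
# Illustrative sketch only: a toy first-order feedback loop in the spirit of Grey Walter's
# light-seeking tortoise. All quantities (gain, threshold, light readings) are invented.

def phototaxis_step(heading, light_bearing, intensity, gain=0.5, too_bright=0.9):
    """One control step: steer towards the light, but veer away when it is too bright."""
    error = light_bearing - heading           # bearing error "sensed" by the photocell
    if intensity > too_bright:                # "without approaching too closely"
        return heading - gain * error         # feedback sign reversed: retreat
    return heading + gain * error             # negative feedback: reduce the bearing error

heading = 0.0
for intensity in (0.2, 0.4, 0.95, 0.3):       # hypothetical light readings over time
    heading = phototaxis_step(heading, light_bearing=1.0, intensity=intensity)
    print(round(heading, 3))                  # the heading homes in on the light source
```

The apparent "goal" of reaching the light exists only in the eye of the observer reading the loop; nothing in the rule itself represents it.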

Plate 1. Walter’s tortoise (copyright Burden Neurological Institute, Bristol, UK)

Cyberneticians, such as Wiener and McCulloch, investigated the operation of the nervous system from this perspective, which led to the development of bottom-up (connectionist) models of cognitive processes.

3. Second-order cybernetics
FOC is concerned with the observation and control of systems (agent and environment), positing a distinguished role for an observer as an entity decoupled from and independent of the system. In contrast, second-order cybernetics (SOC) recognises the inseparability of the observer and the system. There is no observer outside the system; the agent is the cognitive observer of its environment, exposing a distinction between the FOC of observed systems and the SOC of observing systems (Von Foerster, 1974). This change of view explicitly focuses SOC onto the explanation of cognitive processes as determined by the agent-environment coupling dynamics.
In SOC the observer has no knowledge of how the world "really is" – there is no homunculus observing an internal model of the external world. Instead, SOC highlights the fundamental distinction between the physical world and our perception of it; inner "models" are not representations of an outer reality, but subjective dynamic constructions that, by complex feedback paths within the observer and environment, move the system towards its emergent goals. In fact, SOC recognises a multitude of potential alternative reality constructs – our everyday life unfolds but one of them. This implies a move away from the concept of a common objective reality, replacing it with individual observer-relative constructs.
Many illustrations of this come from the study of brain damage, one such example being motion blindness. A patient with this condition is unable to cross a street because the motion of cars is invisible to her: a car is up the street and then upon her, without ever seeming to occupy the intervening space. Analogous to someone watching a movie with a low frame rate, this patient's perception of the world is of a series of still images. The fact that we have evolved able to perceive the world in motion and at rest is a result of one possible path of evolution; alternative paths are feasible, for example, some simple vertebrates (e.g. frogs) are only able to perceive moving prey – they will ignore a nearby stationary fly even when hungry. However, in practice the danger of complete relativism, where any perception of the world is as good as any other, is avoided by "coherence" and "invariance": coherence being a social process whereby phenomena become real by consensus; invariance being a fundamental property of the world entailed by physical laws whereby entities tend to maintain their properties over time.
At the heart of FOC there is an asymmetry in the closed loop feedback (circular causality), which posits the observer outside of the loop, and which SOC deconstructs by including a human (observer) in the loop. This results in the different concepts of goal-directed system behaviour (teleology) championed by the two theories. In contrast to FOC, in which the system's teleology is a result of manipulation or interpretation of the agent's behaviour by an external observer decoupled from that system, in SOC teleological properties are observer/agent relative emergent properties and not externally defined objective properties of the system. To summarise, the deconstruction of the asymmetry inherent in FOC has at least three significant implications.


(1) The observer loses its distinguished position and can be treated as just another part of the whole system.
(2) SOC is inherently and explicitly concerned with explaining the observer's cognitive processes (including goal orientedness).
(3) SOC entails a constructivist epistemology (theory of knowledge) which starts from the assumption that, "the thinking subject has no alternative but to construct what he or she knows on the basis of his or her own experience" (Von Glasersfeld, 1995).
In the remainder of the paper, we will explore links between SOC, the enactive theory of perception (ETP) and dynamic systems (DS).

4. Dynamic systems theory of cognition (DSC)
The DSC (Port and Van Gelder, 1998; Van Gelder, 1997) offers an alternative framework to conventional computational theories of mind, in which:
. . . cognitive systems are computers (digital, rule-governed, interpretable systems), with a modular internal structure; they interact with their environments in a cyclic process that begins with input transducers producing symbolic representations in response to the environment, continues with sequential internal computations over symbolic structures, and ends with output transducers affecting the environment in response to the symbolic specifications; the whole process can be considered independently of the body and the environment except insofar as they deliver occasional inputs and receive outputs (Van Gelder, 1997).

Thus, in computational systems cognition is equivalent to the transformation of symbolic states representing knowledge or particular cognitive states. The transitions between states are instantaneous, hence time does not play a role in their evolution; only the relative order of states does. The symbolic nature of the state space implies that the magnitude of the changes, or the time it takes to achieve them, are undefined notions in computational theories of cognition. Moreover, the rules of evolution act locally, on a particular subset of representations, and hence in this framework it is possible to consider cognitive acts in isolation from each other, the external environment and even the body.
In contrast, the dynamical approach treats cognitive systems as inherently dynamic, which implies a profound change of perspective on their operation. In the dynamical systems approach to cognition, the states are defined in terms of some numerical attributes, and rules of state evolution are defined over those attributes and not over the knowledge representations. The latter can still be instantiated in, for example, system attractors, system trajectories, etc. However, the potential relations between such representations are not explicitly encoded in the system dynamics. The states are instantiated in a continuous state space and their changes take place in time, hence the latter can assume arbitrarily small values given correspondingly small intervals of observation. It is the rate of state change that is paramount to the dynamical system description. In contrast to computational systems, it is much more natural within the dynamical framework to consider changes of the total state of the system, composed of mutually interacting or coupled parts, with the ongoing modulation of state changes. This ultimately implies coupling of the cognitive system with the environment and the body.
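As a purely illustrative sketch of this contrast (the equations and coefficients below are invented and do not model any specific cognitive process), a dynamical description of agent-environment coupling can be written as a pair of coupled differential equations whose joint, real-time evolution carries the state of the whole system:

```python
# Illustrative sketch only: an "agent" state a and an "environment" state e, mutually coupled
# through a pair of invented linear differential equations and integrated with a simple Euler
# step. The point is the continuous, reciprocal coupling in time, not the particular equations.

def euler_step(a, e, dt=0.01):
    da = -0.8 * a + 1.2 * e          # the agent's state is continuously driven by the environment
    de = -0.5 * e - 0.6 * a + 1.0    # the environment is in turn perturbed by the agent
    return a + dt * da, e + dt * de

a, e = 0.0, 0.0
for _ in range(1000):                # 10 seconds of simulated time at dt = 0.01
    a, e = euler_step(a, e)

print(round(a, 3), round(e, 3))      # the coupled pair settles towards a joint attractor
```

Neither variable is a symbolic representation; any "cognitive" content would be read off from features of the joint trajectory, such as the attractor it settles into.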

However, although for Dynamicists the cognitive system is fundamentally embodied (being intimately coupled with its environment), supporters of DSC “conceptualise mental phenomena as state space evolution in certain forms of dynamical system” (Wheeler, 2002), hence firmly positing mental phenomena within the agent.


Viewed under a post Cartesian perspective (rejecting the view of perception as the activation of appropriate a-temporal representations), the advantages of the dynamical account of cognition, which emphasises ongoing, real-time interaction of situated agents with a changing world, becomes clear (Van Gelder, 1997).


In viewing cognition as a continuous dynamic process, Dynamicists explicitly reject the notion of cognition as the computational manipulation of representations. The DSC outlines how to intelligently interact with the world without the necessity of its explicit representation.

5. Enactive theory of perception
The enactive theory of perception (ETP) suggests an alternative paradigm for perception. Instead of considering that the operation of the nervous system leads to the creation of appropriate internal representations of the world, which somehow jump the "explanatory gap" to realise the relevant phenomenal states of experience, it considers the world as its own "representation" and perception as an embodied exploratory (enactive) process of the world mediated by sensorimotor contingencies. The theory championed by Varela and Thompson (Varela et al., 1991) proposes three fundamental components which together give a full account of cognition. These include the low-level biological/neural processes (the subject of third person accounts), the high-level phenomenological data (first person accounts), and a formal dynamical theory as a bridge criss-crossing the explanatory gap between the two seemingly irreconcilable domains.
ETP attributes the unity of apperception to large-scale neural synchrony. In its view the phenomenological states are emergent properties of the non-linear interactions of the body and the nervous system (upwards causation). It explains phenomenological causality (downwards causation) by referring to the modulation of neuronal processes by global order parameters (phenomenological states). Although reference to non-linear dynamics seems to bring ETP close to the DSC, the fundamental difference is the emphasis the two theories place on the role of dynamics and embodiment. ETP, in contrast to DSC, stresses the importance of embodiment, constituting the physical substrate in which the cognitive processes evolve – "conscious experience occurs only at the level of the whole embodied and situated agent" (Varela et al., 1991).
Varela and Thompson characterise three dimensions of embodiment which describe the relation between embodied neural dynamics and phenomenology. These dimensions are: intersubjective interactions in social behaviour; organismic regulation related to the operation of the autonomic nervous system, linking the fundamental physiological processes of the body to primal consciousness, or sentience – the feeling of self; and finally, the sensorimotor coupling between an agent and the environment. A particular subset of ETP, which has recently attracted much attention, focuses predominantly on such sensorimotor coupling and purports to dissolve the explanatory gap and solve the problem of qualia by redefining this notion: "experience is not a thing that happens to people, but a thing that people do" (O'Regan, 2004).


5.1 Sensorimotor account of visual cognition
The sensorimotor account of visual cognition (SMC) is an idea rooted in Ryle's (1949) description of a thimble, defined by the different perspective views it imparts as it is moved around in the visual field, and which has recently been successfully developed and extended by O'Regan and Noe (2001)[1]. In SMC, first person experiences are not states, they are simply activities; hence to speak of phenomenal states of the brain is an example of what Ryle (1949) termed a "category mistake", as there are no such states; qualia are illusions – there exist only the different acts of experience. The first person feeling of perceiving, say, the ineffable pink of a rose arises purely from the specific sensorimotor contingencies of interacting with a pink rose, as opposed to, say, interacting with a green apple. Experience is something we do and its qualitative features are simply aspects of our interactions with the world. Damasio (1996) proposed a somewhat analogous interaction between cognitive processes related to decision making and physiological states of the entire body.
As SMC is a general framework for vision, evidence for it is not direct and does not test the theory in the conventional sense; rather, SMC accounts for several puzzling observations that are difficult to reconcile with conventional theories of vision. Hence, as evidence for their "sensorimotor account", O'Regan and Noe (2001) discuss several problems which appear to fade under the generic spotlight of SMC; these are discussed in the following sub-sections.
5.1.1 The stability of visual perception independent of eye saccades. For over a century, when viewed within the standard framework of model based vision, where the job of the visual system is to transform the image of the world that is projected onto the retina into an equivalent internal 3D representation of the scene, it has been difficult to understand why perturbations of the image projected onto the retina caused by eye saccades do not cause similar perturbations in phenomenal perception. The mechanism that has historically been suggested to correct for such disturbances is the existence of a correcting "extraretinal" signal. However, experimental results from Martin (1986) suggest that the candidate signals are both too inaccurate and too sluggish to correct for such perturbations. Conversely, viewed in the context of SMC, what remains invariant as the world is perceived is just the knowledge of how the sensed image (i.e. the pattern projected onto the retina) will transform as the eye saccades across the perceived scene.
5.1.2 The non-perception of the "blind spot" and the perception of smooth visual continuity despite the non-homogeneity of spatial and colour sensors in the eye. A second puzzle for classical accounts of visual perception is why we do not explicitly perceive the blind spot (i.e. the location on the retina where the optic nerve emerges and where there are no photoreceptors), or detect that the acuity of the eye (and the distribution of photoreceptors across the retina) falls off steadily from the central foveal area to the edge. Classical theories postulate some kind of compensation or "filling in" mechanism to account for this[2].
Conversely, in SMC, perceptual experience of the world is exercised only by the sensorimotor contingencies defining how we expect the sensed data from a scene to change as the eye saccades across it; in this context the fact that photoreceptors are not uniformly distributed across the retina does not pose a particular problem.

5.1.3 Change blindness. Change blindness experiments in particular highlight specific problems with classical theories of vision which are not present in enactive theories. In change blindness a subject alternately observes a specific scene A and a modified version of it, scene B, significantly changed from the original. The succession of images is not instantaneous: between presentation of scene A and scene B there is a short period of blank screen, giving the appearance to the subject of a blink and masking the detection of low level, transient change. It has been experimentally observed that large-scale changes can be made to a visual scene and yet not be perceived by the subject.
In classical "bottom up" theories of vision, the large scale differences in the two alternating images shown in Plates 2 and 3 would cause very different patterns of activation in low level visual processes. Conversely, in "top down" theories the hypothesis that the images are the same is trivially disproved by the large scale disparity between the two. However, the subject's report of "not noticing the change" is entirely consistent with SMC, where the world serves as its own "outside memory" and the subject perceives only what he/she is enactively attending to.
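For readers unfamiliar with the paradigm, the following toy sketch (purely illustrative; show() is a hypothetical stand-in for whatever display routine an experiment would actually use, and the timings are nominal) lays out the alternating presentation schedule just described:

```python
import itertools

# Illustrative sketch only: the presentation schedule of a "flicker" change-blindness trial.
# show() is a hypothetical stand-in for a real display call; durations are nominal values.

def show(stimulus, duration_ms):
    print(f"{stimulus:>7} for {duration_ms} ms")

def flicker_trial(cycles=3, scene_ms=240, blank_ms=80):
    # Scene A and its modified version, scene B, alternate; the brief blanks mask the
    # low-level transient that would otherwise draw attention to the change.
    schedule = itertools.cycle(["scene A", "blank", "scene B", "blank"])
    for stimulus in itertools.islice(schedule, 4 * cycles):
        show(stimulus, blank_ms if stimulus == "blank" else scene_ms)

flicker_trial()
```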


Plate 2. Change blindness; scene A O’Regan (2001)

Plate 3. Change blindness; scene B O’Regan (2001)


Like SOC, SMC is inherently constructivist as knowledge of the world is actively constructed by the perceiving agent through its interaction with the environment. As Scott (1996) observes, “the ‘objects’ that we experience are ‘tokens’ for the behaviours that give rise to those experiences”.


6. An alternative model of cognition: the unification of DSC, ETP and SOC
At the heart of the computational theory of cognition there is the notion of computation as calculation, arising from Turing's work at Bletchley Park in WWII. During this period Turing was concerned with abstracting the essential processes carried out by [human] "computers" in the course of code breaking. This led to the formalisation of computation via the universal Turing machine. Subsequently, this notion led to the widely held view of the so-called Church/Turing thesis, asserting the equivalence of all "effective procedures" to perform tasks [models of computation] with Turing machines. Hence, the view that computations are central to cognition was born. In fact, many scientists, particularly in cognitive science, take this metaphor literally, claiming that cognition is computation, meaning that cognitive processes are serial computational operations on appropriate knowledge representations.
In fact, any claim about the serial computational nature of cognitive processes is valid only insofar as it is a convenient mental shortcut, meaning that their operation can be described in such terms. The equivalence of cognition with serial computation can only hold insofar as the external third person interpretation of the cognitive processes goes; a view similar to the observer/(cognitive) system asymmetry of FOC. Such asymmetry imposes meaning on the observed behaviour of the (cognitive) system; a meaning that may not be unique (as different observers may attribute different meanings to the system's behaviour) and hence not necessarily a truly intrinsic property of the (cognitive) system's operation. Viewed in this light, the explanation of the (purported cognitive) system's operation remains ultimately formal. Hence, the theory lacks the explanatory power required to differentiate between cognitive and non-cognitive systems; it equally ascribes teleological behaviour to both. Similarly, it cannot account for a cognitive system's phenomenology, simply because any phenomenology ascribed to cognition is in reality a reflection of the external phenomenology of the observer (the "external observer fallacy"); a view consistent with the Searlian idea of the observer relativity of computational processes (Searle, 1990). This observation is valid for both FOC and computational approaches.
In addition to the classical model of cognition's inherent inability to explain teleological behaviour, serial computation has recently lost its unique position as the only mode of computation. Alternative views of computation have emerged encompassing different modes of concurrency and mutual interaction between sub-systems, system and real world. Indeed, various authors have proposed a paradigm shift to such "interactive models of computing" (Wegner, 1997; Stein, 1999), which enable the description of systems whose operation falls outside the Turing machine framework. This begs the question of why one would equate cognition with Turing-equivalent computation.
A successful theory of cognition must account for fundamental properties of cognitive systems. At the very least these would seem to involve many coupled components simultaneously affecting each other, embedded in the world and interacting with the environment in real time, giving rise to teleological behaviour.

From the earlier descriptions of SOC, ETP and DSC there emerges a unified view of the three theories which could account for the above characteristics. In SOC, the observer and the environment are considered as one interacting system, encompassing genuine cognitive processes and in particular leading to teleological behaviour. However, SOC is a meta-theory that forces us to explicitly account for cognitive processes without specifying the mechanisms that give rise to them. ETP, which explicitly defines cognitive processes as arising from the coupling of the observer and the environment, fits perfectly within the SOC framework. Hence, we can consider it as a particular instantiation of SOC, emphasizing the role of embodiment as the specific neuro-psychological mechanism essential for cognition. Further, the natural way to formally describe the embodied interactions between cognitive processes and the environment is in terms of DS, in which the observer/system evolution is modelled by a series of coupled differential equations. The subsets of differential equations correspond to the subsystems being modelled; the couplings between the corresponding equations reflect the couplings between the subsystems. However, in contrast to DSC, which claims cognitive processes are the instantiations of specific state space trajectories within the agent, in the unified view cognition emerges from agent-environment interactions, and hence is not solely situated within the agent. Hence, by combining SOC, ETP and DSC there is the possibility of closing the phenomenological gap, as such a combination does not lead to the external observer fallacy described earlier, yet accounts for the fundamental attributes of cognitive processes.

7. Evolutionary robotics
Dynamical approaches have also found favour in evolutionary robotics, particularly in the work of Harvey (1996, 2002). After Varela et al. (1991), Harvey considers an agent (human, animal, robot) as a perturbed complex dynamic system tightly coupled to its environment. However, designing controllers with these properties is a very difficult task. Unlike conventional computational robotics, such approaches are not amenable to traditional "divide and conquer" methodologies as, "the design of any one small component depends on an understanding of how it interacts in real time with the other components, such interaction possibly being mediated by the environment" (Harvey, 2002). Hence, Harvey (1996) uses evolutionary algorithms in order to achieve the required dynamic behaviours of the robots. In this work a genetic encoding is set up such that an artificial genotype, typically a string of 0s and 1s, specifies a control system for a robot. This is visualised and implemented as a dynamical system acting in real time; different genotypes will specify different control systems. A genotype may additionally specify characteristics of the robot "body" and sensorimotor coupling with its environment.
When we have settled on some particular encoding scheme, and we have some means of evaluating robots at the required task, we can apply artificial evolution to a population of genotypes over successive generations. Typically the initial population consists of a number of randomly generated genotypes, corresponding to randomly designed control systems. These are instantiated in a real robot one at a time, and the robot behaviour that results when placed in a test environment is observed and evaluated. [. . .]


The cycle of instantiation, evaluation, selection and reproduction then continues repeatedly, each time from a new population which should have improved over the average performance of its ancestors.
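As a purely illustrative sketch of the evolutionary cycle quoted above (this is not Harvey's code; the fitness function is a trivial stand-in for instantiating and observing a real robot, and all parameters are invented), the loop of evaluation, selection, reproduction and mutation over bit-string genotypes looks roughly like this:

```python
import random

# Illustrative sketch only: the generic evolutionary loop described above. In Harvey's work a
# genotype specifies a robot controller and fitness is measured by observing the real robot;
# here fitness is simply the number of 1s in the genotype, chosen only so the example runs.

GENOME_LEN, POP_SIZE, GENERATIONS = 20, 30, 40

def random_genotype():
    return [random.randint(0, 1) for _ in range(GENOME_LEN)]

def evaluate(genotype):
    return sum(genotype)                       # stand-in for instantiating and observing a robot

def mutate(genotype, rate=0.05):
    return [1 - bit if random.random() < rate else bit for bit in genotype]

population = [random_genotype() for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=evaluate, reverse=True)
    parents = ranked[:POP_SIZE // 2]           # selection: keep the better half
    offspring = [mutate(random.choice(parents)) for _ in range(POP_SIZE - len(parents))]
    population = parents + offspring           # reproduction with mutation

best = max(population, key=evaluate)
print(evaluate(best))                          # best fitness after evolution
```

Note that the evaluation function, the stand-in for the experimenter's judgement of "good behaviour", is supplied from outside the evolving system; this is the point on which the critique below turns.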

Harvey’s approach appears to be very much in line with the postulates of the SOC and ETP. This is because he explicitly embeds controllers in real robots and evolves them in response to real environmental pressures. This emphasises the embedding and coupling of the robot and the environment; the hallmarks of the integrated theory outlined above. Harvey also asserts that his approach, which explicitly does not equate cognition with computation, renders arguments against machine intelligence based on Go¨del’s incompleteness theorems (Lucas, 1961; Penrose, 2002), “irrelevant”. Hence, he claims that in principle his methodology can lead to the evolution of genuinely cognitive robots. In order to address this position we deconstruct Harvey’s position into a “strong” and a “weak” version. The strong position maintains that this methodology in principle can evolve any cognitive/conscious behaviour in a robot. The weaker claim is simply that the methodology can evolve at least some genuine cognitive/conscious behaviours. We address the strong claim by reference to Penrose/Go¨delian arguments against machine understanding. As Penrose has illustrated some aspects of cognition, (e.g. the aperiodic tiling decision problem), involve non-computational processes. However, Harvey has acknowledged (private conversation) that for convenience he often employs “cheats” by using computers plus essential clocking as the underlying DS. In this case all that is being achieved is the evolution of a formal/computational description of behaviour which is open to attack by Penrose style arguments, i.e. “. . . the powers of human reason could not be limited to any accepted preassigned system of formalised rules. What Go¨del showed was how to transcend any such system of rules, so long as those rules could themselves be trusted” (Penrose, 1994). Harvey’s response to the above critique might be the weaker claim that he could achieve at least some genuine cognitive states within his robots, just not the full range of the cognitive powers of humans. However, it is apparent that in any artificial evolutionary system the critic that performs the evaluation function necessary to maintain the “selective pressure’ is explicitly defined by the external observer/engineer; hence any teleological behaviour arises as a result of the external observer’s teleology plus the built pre-ordained optimisation characteristics of evolutionary algorithms. Harvey might retort that he did not have to use his “cheat”; he could have used a genuine dynamical system. Nevertheless, our comments apply to any evolutionary algorithm irrespective of the underlying nature (computational emulation of or genuine dynamic) of the artificially evolved systems. All that is achieved by changing the nature of the systems from computational to real dynamic is a move between the computational and the FOC frameworks: as discussed earlier both are subject to the external observer fallacy. Thus, although an interesting approach, which on the surface is generally in line with the integrated theory (SOC/ETP/DS), closer inspection reveals subtle differences that mean it does not fully conform to the unified theory.

8. Conclusions
We have argued herein that all computational (and FOC) approaches to cognition share the same fate: the external observer fallacy. We have suggested that the best conceptual framework to avoid this fallacy and encompass the fundamental characteristics of cognitive systems is to integrate ETP and DS in the general framework of SOC. This approach opens the possibility of giving an explanation of cognition that bridges the "explanatory gap" defined by Chalmers (1996). Further, we critically reviewed the evolutionary robotics paradigm, and conclude that, although it is an excellent way to build interesting robots, it is not fully consistent with the proposed integrated theory and hence does not escape the critiques raised against FOC and computationalism. Although we leave open the possibility that evolutionary robotics might one day help narrow the "explanatory gap", at present it would seem that, at best, such an approach is evolving "Computational Zombies".

Notes
1. It is interesting to note that, in the domain of machine perception, a similar approach has also been explored in the field of robotics in the development of active vision systems (Blake and Yuille, 1992; Ballard et al., 1997).
2. Although there is some evidence that there are brain processes that could perform something like "filling in", this does not mean that the brain actually does use them to "fill in" its putative internal representation of the world.

References
Ballard, D., Hayhoe, M., Pook, P. and Rao, R. (1997), "Deictic codes for the embodiment of cognition", Behavioural and Brain Sciences, Vol. 20, pp. 723-67.
Blake, A. and Yuille, A. (1992), Active Vision, MIT Press, Cambridge, MA.
Chalmers, D. (1996), The Conscious Mind: In Search of a Fundamental Theory, Oxford University Press, Oxford.
Damasio, A.R. (1996), "The somatic marker hypothesis and the possible functions of the prefrontal cortex", Philos. Trans. R. Soc. Lond. B Biol. Sci., Vol. 351 No. 1346, pp. 1413-20.
Harvey, I. (1996), "Untimed and misrepresented: connectionism and the computer metaphor", AISB Quarterly, Vol. 96, pp. 20-7.
Harvey, I. (2002), "Evolving robot consciousness: the easy problems and the rest", in Fetzer, J.H. (Ed.), Evolving Consciousness: Advances in Consciousness Research Series, John Benjamins, Amsterdam.
Lucas, J.R. (1961), "Minds, machines and Gödel", Philosophy, Vol. 36, pp. 120-4.
Martin, L. (1986), "Visual localization and eye movements", in Boff, K.R., Kaufman, L. and Thomas, J.P. (Eds), Handbook of Perception and Human Performance, Wiley, New York, NY, Vol. 1, pp. 20.21-45.
O'Regan, J.K. (2001), "Experience is not something we feel but something we do: a principled way of explaining sensory phenomenology, with change blindness and other empirical consequences", talk given at Bressanone, 24 January.
O'Regan, K. (2004), personal communication.
O'Regan, J.K. and Noe, A. (2001), "A sensorimotor account of vision and visual consciousness", Behavioural and Brain Sciences, Vol. 24 No. 5, pp. 883-917.
Penrose, R. (1994), Shadows of the Mind, OUP, Oxford.


Penrose, R. (2002), "Consciousness, computation", in Preston, J. and Bishop, J.M. (Eds), Views into the Chinese Room: New Essays on Searle and Artificial Intelligence, Clarendon Press, Oxford.
Port, R.F. and Van Gelder, T. (1998), Mind as Motion: Explorations in the Dynamics of Cognition, MIT Press, Cambridge, MA.
Ryle, G. (1949), The Concept of Mind, Penguin, London.
Scott, B. (1996), "Second order cybernetics as cognitive methodology", Systems Research, Vol. 13 No. 3, pp. 393-406.
Searle, J.R. (1990), "Is the brain a digital computer?", Proceedings of the American Philosophical Association, Vol. 64, pp. 21-37.
Stein, L.A. (1999), "Challenging the computational metaphor: implications for how we think", Cybernetics and Systems, Vol. 30 No. 6, pp. 473-507.
Van Gelder, T. (1997), "Dynamics and cognition", in Haugeland, J. (Ed.), Mind Design II, MIT Press, Cambridge, MA.
Varela, F., Thompson, E. and Rosch, E. (1991), The Embodied Mind, MIT Press, Cambridge, MA.
Von Foerster, H. (1974), "Notes pour une épistémologie des objets vivants", in Morin, E. and Piatelli-Palmerini, M. (Eds), L'Unité de L'Homme, Editions du Seuil, Paris, pp. 401-17.
Von Glasersfeld, E. (1995), Radical Constructivism: A Way of Knowing and Learning, The Falmer Press, London and Washington.
Wegner, P. (1997), "Why interaction is more powerful than algorithms", Communications of the ACM, Vol. 40 No. 5, pp. 80-91.
Wheeler, M. (2002), "Change in the rules: computers, dynamical systems and Searle", in Preston, J. and Bishop, J.M. (Eds), Views into the Chinese Room: New Essays on Searle and Artificial Intelligence, Clarendon Press, Oxford.


Semantic category theory cognition and conceptual reference

John G. St Quinton


Zetetic Systems, London, UK

Abstract
Purpose – Identifying the fundamental characteristics of meaning and deriving an automated meaning-analysis procedure for machine intelligence.
Design/methodology/approach – Semantic category theory (SCT) is an original, testable scientific theory, based on readily available data: not assumptions or axioms. SCT can therefore be refuted by irreconcilable data: not opinion.
Findings – Human language involves four totally independent semantic categories (SC), each of which has its own distinctive form of "Truth". Any sentence that assigns the characteristics of one SC to another SC involves what is termed here "semantic intertwine". Semantic intertwine often lies at the core of semantic ambiguity, sophistry and paradox: problems that have plagued human reason since antiquity.
Research limitations/implications – SCT is applicable to any endeavour involving human language. Research applications are therefore somewhat extensive. For example, identifying metaphors posing as science, or natural language processing/translation, or solving disparate paradox types, as illustrated by worked examples from: The Liar Group, Sorites Inductive, Russell's Set Theoretic and Zeno's Paradoxes.
Practical implications – To interact successfully with human language, behaviour, and belief systems, as well as their own environment, intelligent machines will need to resolve the semantic components/intertwines of any sentence. Semantic category analysis (SCA), derived from SCT, and also described here, can be used to analyse any sentence or argument, however complex.
Originality/value – Both SCT and SCA are original. Whilst "category error" is an intuitive notion, the observably precise nature, number and modes of interaction of such categories have never previously been presented. With SCT/SCA the rigorous analysis of any argument, whether foisted, valid, or obfuscating, is now possible: by man or machine.
Keywords Cybernetics, Psychology, Philosophy, Language
Paper type Research paper

1. Introduction
"Semantic category theory" (SCT) implies that there are four independent domains of human thought. A written or spoken sentence is a manifestation of thought and may involve any one, or any combination, of the four semantic categories (SC) proposed here. Of particular note is one very specific type of semantic category combination termed "semantic intertwine". Sentences that involve semantic intertwine are frequently responsible for a variety of linguistic artifacts including: semantic ambiguity, fallacious reasoning and paradox.
Also presented here is an analytical technique, derived from SCT, termed "semantic category analysis" (SCA). SCA can be applied to any sentence, however complex, to determine the SC involved and whether semantic intertwine is present.

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1321-1348 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614704


Consider for a moment that any one concept may refer either to another concept, or to an observable aspect of the environment. Concept/concept references are to be found, for example, in pure mathematics, certain schools of theoretical physics and metaphysical fantasy stories. Such concept/concept references may denote either:
(1) a loosely defined concept: such as an anthropomorphic character in a children's story; or
(2) a rigorously defined concept: such as "the square root of minus 1".
Concept/observable references are to be found, for example, in applied mathematics, history, experimental physics, ecology and chemistry. Such concept/observable references may denote either:
(3) an observable aspect of the immediate environment: such as your current reading matter; or
(4) an instance of an observable aspect of the environment: such as yesterday's weather conditions.
The distinction between the four SC proposed here is further reinforced when one considers the very different set of actions required when attempting to assign a truth-value to assertions associated with each of the four independent domains: (1), (2), (3) and (4) above. These four domains of conceptual reference constitute the foundations of SCT.
SCT has implications both for the ways in which we process and interpret information and for the ways in which we are vulnerable to being duped by information: especially when semantic intertwine is involved. Semantic intertwine occurs when a sentence assigns a property associated with one semantic category to one of the other three, totally independent, SC. Semantic intertwine constitutes the origin of many forms of linguistic artifact, including fallacious reasoning. Anything we can think of certainly exists: as a concept. Many concepts do not refer to observable aspects of the environment: fallacious reasoning occurs when we assume they do. Carefully constructed sentences involving semantic intertwine can be very persuasive and lead to the impression that what they refer to necessarily exists in our environment.
The notion of "category error" is intuitive and not original. However, hitherto, there has not been an observationally derived set of well-defined SC upon which to base a comprehensive, and testable, theory of meaning. Nor has there been a replicable and uniform procedure for analysing and resolving the diverse range of linguistic artifacts discussed here. In addition, SCT also provides a consistent means to evaluate the nature, development and variety of philosophical, scientific and workaday schools of thought. Neither SCT nor SCA is similar to any previous theory of meaning or linguistic analysis technique. This proposal that cognition involves four specific and independent domains of thought, whose characteristics are evident from observation, is original. SCT is not based on axioms, assumptions, or opinion and is not amenable to refutation by axioms, assumptions, or opinion.

SCT is an empirical scientific theory, based on observation, which can be refuted by conflicting observation.
2. Semantic category theory: overview
It is a personal observation that every written or spoken sentence belongs to one, or to a combination, of four distinct domains of discourse. These four domains are termed here “semantic categories” and are represented by the letters “E”, “I”, “A” and “T”. When we express our thoughts we appear to describe either:
. Some aspect of our currently observed environment: semantic category (E). For example: “As I write, there are two coffee cups on the table in front of me”. Descriptions of the environment are only SC(E) when they occur “in real time”.
. Some instance of aspects of our environment: semantic category (I). For example: “Socrates drank hemlock”. SC(I) sentences describe instances of past, predicted or hypothesised environmental situations. Although SC(I) sentences themselves cannot be supported by immediate observation, they exemplify environmental entities and relationships that can be observed: Socrates is a reference to a man, men can drink, one of the things men can drink is hemlock.
. A rigorously defined concept: semantic category (A). For example: The square root of minus one is “j”.
. A loosely defined concept: semantic category (T). For example: Klingons hyperdrive wormholes.
Or: some combination of these four posited SC: E, I, A, T.
Observation also suggests that there are two distinct means by which different SC can be combined within a sentence. When individual SC are mixed together within a sentence they are readily discernible. When fused together, properties associated with one semantic category are inappropriately assigned to a different, and entirely independent, semantic category.
2.1 Combined semantic categories: mixed categories
The following sentence involves SC(I) reference to aspects of our environment, namely “engineers” and “mathematicians”, together with SC(A) reference to a rigorously defined concept “the square root of minus one”: Mathematicians generally use the letter “i” to refer to the square root of minus one, whilst engineers tend to use the letter “j”. The sentence involves a “mixture” of two SCs, I and A. The meaning of the sentence is quite clear and does not give rise to ambiguity. Neither the engineers nor the mathematicians are themselves being confused with “the square root of minus one”.
2.2 Combined semantic categories: fused categories
In some sentences, SCs are not simply mixed together but are “compounded” or “fused” together. A property only associated with one semantic category is made to appear as though it belongs to a different semantic category. Such confusion of SCs within a sentence is termed here “semantic intertwine”. It is suggested that semantic intertwine lies at the core of many forms of cognitive, and thereby linguistic, confusion. Later, an analytical technique will be presented that


enables any sentence, however complex, to be expressed in terms of the SC involved and their types of interrelationship: whether mixed, fused or both. Application of this technique can pinpoint the underlying cause of, and thereby resolve, semantic ambiguity, fallacious reasoning and paradox.
2.3 Semantic intertwine: examples
As a newspaper headline, “Aliens know we are here” could readily give rise to a widespread impression that extraterrestrial Aliens are currently known to exist. After all, if Aliens know we are here, they must exist. There are certainly things in our environment that do know we are here, for example ourselves. In the example sentence, the phrase “know we are here” refers to an observed characteristic, sentience, of an instance of some aspect of our environment, ourselves, and is therefore associated with SC(I). The term “Aliens”, however, does not refer to an observable aspect of our environment. “Aliens” refers to a loosely defined concept that currently refers only to other concepts. The term “Aliens”, in the context of this example sentence, is associated with SC(T). The sentence “Aliens know we are here” involves two distinct semantic categories, SC(T) and SC(I). Unlike the “Mathematicians...” example above, the two SCs involved in the sentence are not simply mixed together; they are semantically fused together. The SC(I) property of “sentience” is being assigned to “Aliens”. “Aliens” is a reference to an SC(T) concept whose environmental existence, let alone properties, has yet to be established. In the “Mathematicians...” example the property of “the square root of minus one” was not being assigned to mathematicians or engineers. In this second example the property of “sentience” is being assigned to “Aliens”. The chances of being duped by an assertion involving semantic intertwine, such as “Aliens know we are here”, are greatly reduced when it becomes evident that properties appropriate for one semantic category are being assigned inappropriately to another, entirely independent, semantic category. If and when sentient extraterrestrials are observed, “Aliens” will become associated with SC(I), the overall assertion will become SC(I), and the sentence will no longer involve semantic intertwine. Significant changes in the meaning of a term are not uncommon. The “world out there” has often surprised us with characteristics we have never imagined.
The term “observer” is used throughout in the legal sense of “a reasonable person”. Similarly the terms “observed” and “observable” imply observation by a reasonable person.
Currently “Aliens know we are here” subtly presents a supposition as a fact. As such it is an example of “Sophistry”: “specious but fallacious reasoning” (OED, 1971). Whilst semantic intertwine can be used to sow confusion, it is also at the foundation of our most beautiful poetry and literature. Metaphors, analogies, fables, even riddles are often dependent upon semantic intertwine for their construction. “You know the sound of two hands clapping. What is the sound of one hand clapping?” is a famous Zen Koan. SCA illustrates the manner of its construction, and thereby provides the blueprint for similar constructions. “Clapping”, as with any term, can be defined in any way one chooses. With respect to its generally assumed use, “Clapping” is an SC(I) concept that refers to the readily observable effect of repeatedly

bringing both hands together sharply: an operation involving two operands, hands. Using this binary operation blueprint, but utilising instead an SC(A) concept, “addition”, one may construct a similar puzzle: “You know the sum of two digits added. What is the sum of one digit added?” SCA pinpoints the source of the Zen conundrum: notions, not nature.
The manner in which we process information provides the means to conceptualise the world around us and describe our concepts in language. Our processing facilities also enable us to construct additional concepts, which, though formed by reference to the world around us, are fictions. Sometimes we are persuaded that these fictions are facts of nature: “facts” which can also be described in language. “Philosophy is a battle against the bewitchment of our intelligence by means of language” (Wittgenstein, 1953). SCT, it is proposed, now reveals the conceptual and linguistic constructions that underlie such bewitchment and SCA now provides the means to analyse such constructions.
3. Semantic category analysis: overview
SCA can be applied to any sentence to determine the SC involved in that sentence and their modes of combination. When a sentence has more than one meaning, as the result of syntactic, lexical or semantic ambiguity, SCA can be applied individually to any one, or all, of the several sentence variants. Expressing our cognitive activity in language produces various forms of ambiguity:
Syntactic ambiguity arises from the grammatical construction of a sentence. Parsing the sentence construction enumerates the syntactic alternatives.
Lexical ambiguity arises from the multiplicity of definitions associated with individual terms of a sentence. Lexical analysis enumerates these semantic variants.
Semantic ambiguity. It is proposed here that a form of semantic ambiguity, termed “semantic category ambiguity”, may arise when a sentence as a whole involves a combination of SC. SCA enumerates these semantic category variants.
SCA can be applied to any sentence-variant, a syntactic/lexical/contextual variant of a sentence, to produce the semantic description of that sentence-variant. An SCA semantic description indicates the semantic category, or categories, involved in a sentence-variant, and most importantly whether different SCs have been fused together in semantic intertwine. For example: SCA denotes the newspaper headline Aliens know we are here, for extraterrestrial aliens, by the semantic description “T ~ [I]”. The tilde character “~” is used to denote semantic intertwine between the SC(T) subject Aliens and the SC(I) predicate know we are here. In this example sentence Aliens know we are here, semantic intertwine results from the observable SC(I) property of sentience being assigned to purely conceptual SC(T) entities whose existence, let alone properties, can currently only be conjectured.
Semantic intertwine can also occur between a term and a qualifying attribute even when both belong to the same semantic category. The sentence “The clouds were pregnant with rain” involves examples of such “out-of-scope” semantic intertwine. The use of a word is generally constrained by its descriptive characteristics. Figures of speech such as simile and metaphor extend the use of words. In the sentence


“The clouds were pregnant with rain”, the use of the word “pregnant” is extended figuratively to “clouds” and the use of the word “rain” is extended figuratively to “pregnant”. SCA, when applied to this sentence, generates the semantic description “I ~ (I ~ I)”, indicating the out-of-scope semantic intertwines between “rain” and “pregnant”, and “pregnant” and “clouds”. Out-of-scope semantic intertwine can also be found between “attributes”, such as adjectives and adverbs, and the terms they qualify, as in: Light thoughts fly heavily.
Semantic intertwine is a powerful consequence of our cognitive processes. Without our ability to describe the imagined, our ability to progress, or even to have survived, would have been severely curtailed: moreover we could neither create nor enjoy imaginative poetry and prose. However, the imposition of the imaginary as the unquestionable has itself curtailed progress. Sentences are readily created which confuse the observable with figments of the imagination: sentences that some readers may erroneously conclude are indisputable: assumptions untimely adorned in the robes of certainty. SCA provides a means of identifying, describing and resolving many forms of semantic ambiguity: regardless of the structural complexity of the sentences within an argument. Fallacious reasoning, obfuscation, sophistry, or in its modern guise “spin”, are festooned with semantic intertwine. By pinpointing the nature and occurrence of semantic intertwines within sentences, SCA clearly distinguishes between the creative, the feasible and the foisted.
Any transcribed variant of a sentence produced by a speech act (Austin, 1962) or by subjective interpretation (Empson, 1930) is amenable to SCA. SCT regards no written or spoken sentence as “meaningless” (contrast Ayer, 1946). There are many different forms of language including: written, spoken, gesture, music, sign, art, body language, mime and (Gostelow, 1976) the eighteenth century “language of the fan”. Expressions in each “form of language” are manifestations of thought. SCA is directly applicable only to written or spoken language; other forms of language need to be transcribed prior to analysis.
Every human language describes concepts. Some concepts refer to aspects of the environment: some concepts refer to other concepts. One implication of SCT is that the enduring realism/idealism dispute between the Aristotelian and Platonic views of nature is unnecessary. Descriptions involving only SC(I), or only SC(E), concepts are associated with Aristotelian scientific observations. Descriptions involving only SC(A), or only SC(T), concepts are associated with the assumed “ideal forms” of the “Platonic Realm”. Within the framework of SCT, “ideal forms” do indeed exist, as one outcome of our overall cognitive facilities. Moreover, an SC(T) concept may be assumed to be associated with any chosen set of properties one wishes: it may be assumed to be associated with properties such as “timeless”, “objective”, “independent”, “real”... it may even be assumed to be associated with the property of physical existence. Ideal forms exist: as concepts that refer to concepts. A “perfect circle” cannot exist in physical reality; it can however exist in terms of SC(A) definitions in pure mathematics, such as those of Euclid or Descartes, which involve concepts that refer to other concepts. But if within a linked sequence of axiomatic concepts, an additional conceptual reference is made to a physical (necessarily “imperfect”) circle, as in applied

mathematics, engineering or physics, then this latter concept belongs to categories E or I, not A.
Distinct preferences for particular viewpoints, such as Platonic idealism or empirical objectivity, might reflect specific “thinking-types”: each associated with a particular form of cognitive referential ability. Individual referential ability could influence career choices: for example avoiding careers which involve abstract mathematics, semantic category A, and lifestyle choices, such as an attraction to mysticism, semantic intertwine T ~ I. Future research in experimental psychology might consider whether specific “thinking-type” personalities are indeed associated with particular semantic category groupings.
Assertions involving semantic intertwine frequently underlie the obfuscating arguments of opposing factions. Using SCT to unravel these semantically intertwined assertions demonstrates that many polarising arguments, within various fields of endeavour, are simply unnecessary. Some concepts refer to concepts; some concepts refer to aspects of the observable environment. Arguments which fail to distinguish between concept/concept references and concept/observable references are futile, but fervently last for centuries: and will continue to do so.
4. Sentence analysis: overview
When we hear or read a sentence, one question is paramount: “What does it mean?” Sentences that we suspect could modify our image of the world, a world that includes ourselves, rapidly attract a further question: “Is it true?” We intuitively utilise linguistic analysis each time we interpret the meaning(s) of a sentence. Questions concerning the veracity of a sentence are naturally left until we have selected the meaning(s) of the sentence we consider pertinent. Any one sentence has many possible meanings. Fortunately, the context of the sentence often provides numerous clues to assist in narrowing the range of possible meanings to those of immediate concern. Even so, determining the full range of possible meanings of a sentence, even an apparently simple sentence, often requires the assistance of rigorous analytical methods.
4.1 Syntax analysis
Fundamental research in syntax theory continues, especially into “fine-grained” taxonomic considerations. Advanced theories of syntax currently under debate include: the semantic interpretations approach (Jackendoff, 1972), the transformational approach (Edmonds, 1976) and the minimalist approach (Chomsky, 1995). From the perspective of SCT, the only requisite of a theory of syntax is its ability to enumerate the complete set of syntactic variants for any given sentence. SCA itself requires access only to the structural units of syntax. Neither the empirically derived theoretical constructs nor the theoretical propositions, the organisational principles of SCT, require access to any additional depth of syntactic/semantic detail. A theory of syntax that lends itself to implementation as an automated parsing procedure, is capable of generating all possible syntactic variants of a sentence, and whose syntax descriptions include pointers between qualified terms and their assigned attributes, represents an ideal pre-processing facility for SCA.


For the purpose of descriptive simplicity, a traditional theory of grammar, using classical descriptors and a “phrase structure” binary tree representation, will be utilised throughout, unless otherwise stated. 4.2 Sentence truth-types In the context of a novel, “Peter and Paul have a plane” has two syntactic variants, depending upon individual or co-ownership, each with two lexical variants depending upon whether plane refers to an aeroplane or a woodworking tool. The sentence has at least four very distinct meanings even when the context of the sentence is reasonably well specified. Ascertaining the veracity of the sentence “Peter and Paul have a plane” is therefore dependent upon first, a syntactic analysis to determine the set of possible grammatical structures, and second, upon a lexical analysis to determine, for each grammatical structure, the set of lexical possibilities. Only when this is achieved can the appropriate measures be taken to begin to determine the veracity of a particular semantic variant of the sentence. Thus far it would appear that the sentence “Peter and Paul have a plane” has at least three distinct ways of being false and one possible way of being true. But suppose that the sentence had appeared as the conclusion of a categorical syllogism quoted within the novel. A categorical syllogism is an argument in which a conclusion follows from several premises. For example: All Cretans are Laistrygonians; All Laistrygonians are liars; and So all Cretans are liars. As a conclusion to a categorical syllogism of this form, the sentence “Peter and Paul have a plane” is “True”. The sentence is “True” regardless of whether Peter and Paul were humans, or rabbits, or whether or not they owned a woodworking tool or an aeroplane. Indeed one could substitute any symbol, or any sequence of symbols, for Peter, Paul, or Plane, such as X, Y and Z, or Cretans, Liars and Men, and the sentence would still be “True”. In contrast, the veracity of a sentence describing the world around us hinges upon observations, not pre-supposed axioms. “Truth” has different guises: under each guise lie different forms of “Truth”. 4.3 Truth and logic Historically, only one, unitary-type of “Truth” had been recognised: an assertion was either “True” or it was “False”. Having only one type of truth associated with any one assertion has been a primary source of confusion that has, over millennia, led to a host of fallacious arguments: some of which are still being taken seriously today. SCT implies four, entirely independent, forms of meaning: each with its own type of truth. Any one sentence may have many sentence variants, some of which belong to one semantic category, others to an entirely different semantic category with a different form of truth. The same set of words can be both true and false, but now we can be precise as to which form of truth belongs to which interpretation of that “same set of words”. With SCT, the task of those wishing to disseminate a fallacious argument is now far more difficult. The recipient can now ask “To which SC do the assumptions belong?”

and can thereby identify fundamental errors of reasoning in what may appear an incontrovertible argument. Assumptions or conclusions involving different SCs, especially when involving semantic intertwine, rapidly highlight the source of error in complex and sophisticated reasoning.
In the first decades of the twentieth century the Polish logician Jan Łukasiewicz developed many-valued logic, which involves a spectrum of truth-values rather than simply the exclusive binary values true and false. He also envisaged the necessity for more than one form of truth (Łukasiewicz, 1970). However, prior to SCT, there has not been a well-defined set of SC upon which to base a “many-forms” approach to truth. It should be noted that the actual choice of truth-value representation is outside the scope of SCT. SCT indicates that there is an independent form of truth associated with each semantic category and the type of investigation appropriate for each semantic category. SCT does not indicate the actual truth-value of an assertion nor how that truth-value should be expressed. SCT can utilise any contemporary developments in the field of Logic (Beall and van Fraassen, 2003), or any subsequent developments in this field; developments that might include the many-valued approach of Jan Łukasiewicz being applied to each of SCT’s four SCs.
4.4 Classical views of truth
For centuries there has been a dispute as to which is better: (1) the correspondence theory of truth or (2) the coherence theory of truth. In SCT terms this dispute is, in essence, equivalent to asking, which is better – an apple or a prime number?

(1) The correspondence theory of truth, reflecting the correspondence between the representation and what it represents, was originally formulated by Aristotle: “to say of what is that it is, or of what is not that it is not, is true”. (2) The coherence theory of truth, originally proposed by metaphysicians such as Leibniz, Spinoza and Hegel, asserts that the truth of a proposition consists in it being a member of some suitably defined body of other propositions. For SCT, the dispute between these theories is unwarranted: neither is “better”. Primarily, they each address entirely different types of meaning: SC(E) and SC(I) for the correspondence theory and SC(A) and SC(T) for the coherence theory. Previous theories of meaning and truth have attempted, metaphorically speaking, to “prise understanding with a single tool”. Here it is suggested that four tools are required: each of dedicated use. The single-tool, “axiomatic”, approach is insufficient. Indeed two of the requisite tools specified here actually require us to leave our armchair and examine the world around us. However, such is the established investment in the “single-tool” approach, that SCT’s four-category “multi-tool” approach to the nature of meaning and truth, will certainly encounter some resistance: at least until the SCA “toolbox” is used.


4.5 Sentence meaning-types Following syntactical and lexical analysis, the sentence “Peter and Paul have a plane” is shown to have at least four distinct possible meanings even when the terms in the sentence are assumed to have their most common meaning in the context of a novel. Extend the lexical range of the terms used and the number of possible sentence meanings increases significantly. In the context of a children’s story, Paul might refer to the anthropomorphic English-speaking rabbit friend of Peter, or in a science fiction novel, plane might refer to “a machine that can traverse inter-galactic hyperspace”. Not only has the lexical range increased but also the very nature of the sentence meanings has burgeoned. In the context of a novel, the sentence describes a perfectly feasible situation, the ownership of an aeroplane for example. In the science fiction context, the same set of words no longer describes a feasible situation: the meaning-type is different. Sentences can readily be created that involve a fusion of feasibility and fantasy: part persuasive reality, part concealed fantasy, wholly deceptive. SCA provides the means of enumerating the entire range of meaning-types of a sentence. Most sentences involving a single semantic category, or a mixture of two or more SC, provide little difficulty in interpretation. However sentences involving semantic intertwine, a fusion of SC, have been highly deceptive. Were such conceptually fused sentences always innocuous, it would matter little. Unfortunately some intertwined assertions, when taken seriously, have press-ganged history “bewitchingly” into numerous dark ages. Semantic intertwine has significant bearing upon the ways in which our cognition deals with, and has dealt with, information. Attitudes are based on our perceived accuracy of information. Examples of semantic intertwine vary from the malignantly persuasive, some of which are still being used to justify internecine conflict, to the pleasingly innocuous: such as the conundrum “This sentence is false”.

5. Semantic category theory
SCT will be presented here together with explanatory notes. Subsequent sections will describe the associated analytical procedure, SCA, illustrated by a worked example in Section 6.2. A simplified “first approximation” approach to SCA is described in Section 6.4. Applications and implications of SCT are discussed in Sections 7-9.
Theoretical constructs of SCT:
C1: “semantic categories”
C2: “category truth-types”
Theoretical propositions of SCT:
P1: “semantic markers”
P2: “semantic marker resolution”
5.1 Theoretical construct C1: semantic categories
In our use of spoken and written language, SCs E, I, A, and T relate to descriptions of:
. E – Objectively observed extant physical entities and their objectively observed physical characteristics.
. I – Instances of objectively observed physical entities and their objectively observed characteristics.
. A – Conceptual entities whose assigned characteristics are constrained by imposed definitions.
. T – Conceptual entities whose assigned characteristics are unconstrained, either by observation or consistent definition.

5.2 Theoretical construct C2: category truth-types
In SCT, a specific truth-type is associated with each of the four SCs: E, I, A and T.
5.2.1 Category truth-types: examples. The conclusion of a categorical syllogism, for example the SC(A) assertion “All Cretans are liars”, of the form “All x are y”, is defined to be “True”. The SC(I) assertion “All Cretans are Liars” may be refuted by observation and shown to be “False”. The “same set of words” can thus be both “True” and “False”. Arguments that invoke such ambiguity can be very persuasive, and misleading: a form of argument prevalent in paradox and sophistry. Stating that a sentence is True, or that a sentence is False, has to be qualified, according to SCT, by the semantic category of that sentence.
. “All Cretans are liars” can be stated to be “SC(A) True”.
. “All Cretans are liars” can be stated to be “SC(A) False”.
Indeed any statement can be said to be True by definition, or False by definition. Upon the discovery of one truthful Cretan, the utterance “All Cretans are liars” would be SC(E) False. In a report of that discovery, the statement “All Cretans are liars” is SC(I) False. In the case of an SC(T) sentence, such as the mythological assertion “Jason found the Golden Fleece”, the only criterion for truth-value assignment is that of textual consistency.
To summarise, according to SCT, each semantic category has its own, independent truth-type:
. SC(A) – True, or False: by definition.
. SC(E) – True, or False: dependent upon immediate observation.
. SC(I) – True with respect to supporting evidence; falsified by contrary evidence.
. SC(T) – True by textual consistency; falsified by textual inconsistency.
These truth-types are very different, not least in the means, if available, for establishing their actual value, if possible. Establishing the actual truth-value of a sentence is outside the scope of SCT. The choice of actual truth-value representation, whether binary or probabilistic for example, is also outside the scope of SCT. SCT establishes:
. the semantic category, or categories, involved in a sentence-variant;
. the SC, if any, involved in semantic intertwine; and
. the truth-type(s) involved in the sentence-variant.
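For readers who think in code, the four truth-types listed above can be tabulated directly. The snippet below is a minimal illustrative sketch; the dictionary and function names are my own and are not part of SCT.

```python
# Illustrative only: the four semantic categories paired with the truth-type
# that SCT associates with each, as listed above. Names are my own choosing.
TRUTH_TYPES = {
    "A": "True, or False: by definition",
    "E": "True, or False: dependent upon immediate observation",
    "I": "True with respect to supporting evidence; falsified by contrary evidence",
    "T": "True by textual consistency; falsified by textual inconsistency",
}

def truth_type(designator: str) -> str:
    """Return the truth-type appropriate to a semantic category designator."""
    return TRUTH_TYPES[designator]

print(truth_type("T"))   # e.g. for "Jason found the Golden Fleece"
```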


SCT does not provide the means for establishing whether a sentence is false. However, SCT does indicate the appropriate means for establishing the likelihood of a sentence being false: “by definition” for SC(A), by observation for SC(E) and SC(I), or by performing a textual consistency check for SC(T).
5.3 Theoretical proposition P1: semantic markers
The SCA of a chosen sentence-variant involves three steps:
STEP 1. Represent the chosen sentence-variant as a bracketed, binary syntax expression. The nodes of the expression correspond to the elemental “tokens” of the sentence-variant, and the node groupings correspond to the syntactic components of the sentence-variant. For example: (Peter (and Paul (have a plane))) has six tokens and includes three syntactic groups.
STEP 2. The second step invokes theoretical proposition P1 of SCT, “semantic markers”, and is discussed in Section 5.3.1. In this second step, each “token” of the sentence is replaced by one of seven “semantic markers” from the set [e i a t n : *]. An out-of-scope attribute token from the semantic marker subset [e i a t] is prefixed by the tilde character “~”.
STEP 3. The third step of SCA invokes theoretical proposition P2 of SCT, “semantic marker resolution”, and generates the semantic description of the sentence-variant. This procedure is described later in Section 5.4.
The term “token”, rather than “word”, is used to emphasise that each word in a sentence is strictly associated with (1) the sentential phrase, (2) the complete sentence, (3) the context of the sentence and (4) any sentences and texts referred to in (2) and (3). The distinction between “token” and “word” is retained during the SCA procedure by the distinction between the elements of the semantic marker subset [e i a t] and those of the semantic category set [E I A T] into which the subset elements resolve.
5.3.1 Semantic marker assignment. In SCT, each token of a sentence-variant performs roles I, II or III:
. Role I – a reference to a concept
. Role II – a reference to an aspect of the environment
. Role III – a functional reference to another token or syntactic group
Roles I and II involve descriptive tokens, for example nouns, verbs, adjectives, adverbs, and prepositions. Role III involves functional tokens, for example verbs, prepositions, particles, auxiliaries, determiners, pronouns, conjunctions and complementizers.

In STEP 2 of SCA, semantic markers [e i a t n : *] replace tokens as follows:
Role I
. “a” replaces a token referencing an aspect of a definition set involving axioms and operations upon those axioms.
. “t” replaces a token referencing that which is defined constraint-free.
. (See Attributes below)
Role II
. “e” replaces a token referencing that which is objectively and currently observable.
. “i” replaces a token referencing an instance of that which is objectively observable.
. (See Attributes below)
Attributes. The semantic marker of an “attribute”, for example an adjective or an adverb, is prefixed by a tilde “~” when the attribute is “out-of-scope”. An attribute is said to be out-of-scope when the characteristic it assigns to a term is incompatible with the nature of the term itself[1].
Role III
. “n” replaces a neutral token such as a determiner.
. “:” replaces constructional elements including linguistic conjunctions such as “and”, “but”, and “or”.
. “*” replaces an Isolating Semantic Modifier (see below)
Neutral tokens. A neutral token is one that appears in the string of sentence tokens without contributing to the classification of the SC involved in that sentence. Neutral tokens play an important syntactic role in the construction of a sentence but do not have a role in determining semantic category membership. For example, the semantic description of the assertion “The table is round” is not affected by the token “The”. The token “table” is significant to the semantic description: the token “the” is not. In the English language, neutral tokens include the definite article “the”, the indefinite article “a” and, when used as complementizers, “that”, “for” and “to”.
Conjunctions. A linguistic conjunction such as “or”, “and” or “but”, is a constructional element of a sentence with particular relevance in multi-subject and multi-predicate sentences. Whilst conjunctions are, with respect to SCT, also neutral tokens, they are represented by a colon “:” during SCA. The colon is used as a marker to retain the constructional relationships involved in the sentence under analysis.
Isolating semantic modifiers. Words and phrases that semantically “isolate” a predicate from a subject are termed, in SCT, “isolating semantic modifiers” (ISMs). Examples include words and phrases that, regardless of tense, associate the subject with a linguistic sub-unit “...” within the sentence, such as an exclamation or proposition: “said...”, “sometimes says...”, “screamed...”, “yelled...”, “believes...”, “believed that...”, “thinks...”, “thinks that...”, “is of the opinion that...”, “dreams...”.


Semantic modifiers are initially replaced by the semantic marker “*”, which gives rise to semantic category isolating parentheses “[ ]” during SCA.
Isolating semantic modifiers: examples. “The value of Pi is 3.15” is an SC(A) assertion of the form “x is defined to be y”. “Richard believes that the value of Pi is 3.15”, in the context of an extant conversation involving Richard, is an SC(E) assertion which includes an SC(A) sub-assertion. The truth-value of the assertion is dependent upon whether Richard actually believes that “the value of Pi is 3.15”, not upon whether “the value of Pi is 3.15”. In SCT terms: the semantic category of the overall assertion, SC(E), is not affected by the semantic category of the semantically isolated sub-assertion, “the value of Pi is 3.15”, of semantic category SC(A), contained within it. The same would apply if the ISM “believes” were replaced by “thinks”, “says”, “is of the opinion”, “holds the view” or any other verb, or its complement, which refers to a linguistic sub-unit within the sentence.
To summarise the characteristics of ISMs:
. The semantic category of an enveloping sentence or linguistic unit is unaffected by the semantic category of a linguistic unit semantically isolated within it.
. An ISM such as “says...” does not affect the semantic category of its associated linguistic unit, regardless of its tense: “said...”, “will say...”.
. A semantically isolated linguistic unit has its own semantic category membership(s) and is evaluated independently.
5.4 Theoretical proposition P2: semantic marker resolution
Resolving semantic markers into progressively larger syntactic/semantic groups results in a semantic category description of the overall sentence. The syntactic/semantic group transforms are achieved with empirically derived “resolution rules”. The initial step of SCA is to form a binary syntax tree of the requisite contextual/syntactical/lexical variant of the sentence being analysed. A binary syntax tree may be envisaged as a bracketed expression or alternatively, as in the following discussion, as the equivalent tree diagram. Each parent node of a binary tree has two child nodes. Each terminal child node of an SCA binary syntax tree contains one word from the word string of the sample sentence-variant. In SCT, a word is termed a “token” to indicate its semantic dependency on the remainder of the sentence and the context of the sentence. Parent nodes, and parent node composites, constitute the syntactical structures within the sentence. In STEP 2 of SCA, each terminal child node is assigned a semantic marker. SCA proceeds by resolving child node semantic marker pairs to parent node semantic markers. This progressive transformation is termed “semantic marker resolution” and is performed in accordance with SCA “resolution rules”. SCT, SCA and SCA resolution rules are not related to any other works. The author derived SCT and SCA from observing the ways in which language is used and constructed. In contrast, the author derived SCA resolution rules, which in themselves have no theoretical basis, by “means-ends analysis”: creating and refining the rules until they were appropriate for their required task.
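As a rough sketch of the bracketed-expression representation just described (my own illustration, not the author’s implementation), an expression such as “(Peter (and Paul (have a plane)))” can be read into nested lists whose innermost elements are the tokens:

```python
def parse_bracketed(expression: str):
    """Parse a bracketed syntax expression into nested Python lists.

    Illustrative sketch only: assumes a well-formed expression in which
    tokens are separated by spaces or brackets. Multi-word tokens such as
    "Star Trek" would need extra handling.
    """
    tokens = expression.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]
    for token in tokens:
        if token == "(":
            stack.append([])            # open a new syntactic group
        elif token == ")":
            group = stack.pop()
            stack[-1].append(group)     # attach the finished group to its parent
        else:
            stack[-1].append(token)     # an ordinary token (word)
    return stack[0][0]

# "(Peter (and Paul (have a plane)))"
# -> ['Peter', ['and', 'Paul', ['have', 'a', 'plane']]]
print(parse_bracketed("(Peter (and Paul (have a plane)))"))
```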

5.4.1 Resolution rules. The SCA resolution rules introduced here, labelled rr0 to rr7, are applicable to a wide range of grammatical structures in the English language. A totally comprehensive treatment of all aspects of English grammar will almost certainly require additions, and possibly modifications, to this rule set. Languages other than English will require the empirical derivation of their own resolution rules, semantic markers and modes of assignment. The details of theoretical propositions P1 and P2, the organisational principles of SCT, are language specific. It is emphasised, however, that the theoretical constructs of SCT, C1 “semantic categories” and C2 “category truth types”, are language independent. Aspects of English grammar covered by the rule set below include: subjects, predicates, adjectives, nouns, verbs, adverbs, conjunctions, prepositions, articles and complements.
Following semantic marker assignment, each terminal node of the binary syntax tree parse of the sentence-variant under analysis contains a semantic marker from the set [e i a t n : *]. Members of the subset [e i a t] are termed “semantic category markers” and these are replaced during resolution by “semantic category designators” from the set [E I A T] respectively. The distinction between “semantic category markers” and “semantic category designators” is made in order to retain the distinction between “token” and “word”.
Resolution rule set, where sc and SC, whether or not prefixed by tilde “~”, are semantic category marker expressions and semantic category designator expressions, respectively, and an ellipsis “...” represents any semantic category expression:
. rr0 Conjunctions: (: (sc)) and (: (SC)) resolve to (: SC)
. rr1 Neutral tokens: (n(sc)) and (n(SC)) resolve to (SC)
. rr2 Semantic category intertwine: (sc1(sc2)), for different sc marker or SC designator expressions sc1 and sc2, resolves to (SC1 ~ SC2)
. rr3 Isolating semantic markers: *(sc) and *(SC) resolve to [SC]
. rr4 ISM intertwine: (sc1(~[...])) resolves to (SC1 ~ [...])
. rr5 Semantic prefix resolution: (sc(...)) resolves to (SC(...))
. rr6 Sequence resolution: (SC1(SC1)) resolves to (SC1)
. rr7 Nesting “level”/“depth” resolution: ((...)) resolves to (...) and ([...]) resolves to [...] for all nesting brackets of the same depth.
Applying these resolution rules, in the sequence rr0 to rr7, iteratively from child node level to root level of a binary syntax tree representation of a sentence-variant, generates the “Semantic Description” of that sentence-variant.
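By way of illustration only, the flavour of marker resolution can be sketched in a few lines of code. The function below is my own much-simplified rendering covering the spirit of rr0, rr1, rr2, rr3 and rr6 on pairwise trees; it ignores tilde-prefixed markers and rules rr4, rr5 and rr7, and is not the author’s algorithm.

```python
# Much-simplified, illustrative resolver; not the author's algorithm.
CATEGORIES = {"e": "E", "i": "I", "a": "A", "t": "T"}

def resolve(node):
    """Resolve a marker tree (a marker string, or a [left, right] pair)
    into a semantic description string."""
    if isinstance(node, str):
        return CATEGORIES.get(node, node)      # markers become designators
    left, right = resolve(node[0]), resolve(node[1])
    if left == "n":                            # rr1: neutral tokens drop out
        return right
    if left == ":":                            # rr0: a conjunction joins its group
        return f": {right}"
    if left == "*":                            # rr3: an ISM isolates its unit
        return f"[{right}]"
    if left == right:                          # rr6: a repeated category absorbs
        return left
    return f"{left} ~ {right}"                 # rr2: different categories intertwine

# A determiner ("n") followed by two SC(E) descriptive markers resolves to "E":
print(resolve(["n", ["e", "e"]]))              # E
```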

5.5 Semantic descriptions
A sentence is composed of one or more SC. A sentence or phrase involving only a single SC can be denoted by the contracted semantic description: A, I, E or T. A sentence or clause involving:
. An ISM is denoted by the SC of the subject, followed by the SC of the predicate enclosed in parentheses “[ ]”.
. Semantic intertwine is denoted by “SC1 ~ SC2” where SC1 and SC2 are different SC and “~” the tilde character.
. Non-intertwined SC in conjunction are separated by a colon “:”.

The level of detail expressed by a semantic description can be chosen to suit analytical requirements. Resolution rules rr0 to rr7 generate a semantic description of sentence-variants that indicates the SC involved and their inter-relationships. Not invoking rr7 increases description complexity by retaining bracketed “( )” syntactic group relationships within the sentence-variant.
5.5.1 Semantic description examples. The following examples invoke rr0 to rr7 and, as elsewhere, assume the common use of words (Table I).
6. Semantic category analysis
The SCA of a syntactically and lexically disambiguated sentence produces a semantic description which denotes the interrelationships of the SC associated with each contextually dependent, syntactically structured, semantic component.
6.1 The semantic category analysis procedure
For any given sentence:
For any given context:
Enumerate all syntactic variants of the sentence:
For each syntactic variant enumerate all lexical variants:
For each resulting sentence-variant under consideration, perform STEPS 1, 2 and 3:
STEP 1 Form a binary tree representation of the sentence-variant syntax.
STEP 2 Assign the contextually appropriate Semantic Marker to each terminal node.

Table I. Semantic description examples

Sentence | Context | Semantic description
David believes that Aliens know we are here and that Pi is 3.14 | Sentence from an autobiography | I [T ~ [I] : A]
Aliens know we are here | Newspaper headline | T ~ [I]
This sentence is false | When an axiom of the form “X is Y” | A
This sentence is false | When attributing a characteristic to an observable entity | I
The clouds were pregnant with rain | From an essay | I ~ (I ~ I)
They both speak French | Upon being introduced to two people | E
The acid turns the metal red | Sentence from a crime novel | I

STEP 3 At each node depth, from terminal level to root, apply semantic marker Resolution Rules to each child node pair and assign the result to their parent node.
The resulting expression is the “semantic description” of the sentence-variant.
6.2 Semantic category analysis: worked example
A data sample for SCA is a “sentence-variant”: a sentence resolved of contextual, syntactical and lexical ambiguity. As required by SCT, the context and syntax of the sentence-variant under analysis is specified. SCA is applicable to any lexical, or idiosyncratic, use of a term within a sentence-variant: provided reference to the definition, or the definition itself, is quoted. Each different use of any one term within a sentence produces a different sentence-variant requiring its own separate analysis. It should be noted that the semantic marker assignment procedure, STEP 2, is sentence-variant specific. Each possible semantic marker assignment to any one term reflects a different meaning of the sentence and corresponds to a different sentence-variant. In the sentence-variant used in the following example, the common use of terms is assumed unless otherwise stated.
Example SV1. Context: A meeting at a Star Trek convention where a friend introduces you to David and Mark and says: David and Mark think Aliens are sentient but only David likes Star Trek.

This is a multi-subject, multi-predicate sentence whose semantic description, for the specified context/syntactic/lexical variant, will be shown to be:
E : E[T ~ I] : E[I]
SCA derives this semantic description in three steps:
STEP 1: Parsing. Parse the sentence-variant in the form of a bracketed binary syntax tree expression. In the case of Example SV1 this will produce:
(David(and(Mark(think(Aliens(are(sentient))))))(but(only(David(likes(Star Trek))))))
Notes on format:
(1) Bracket-pairs “( )” delineate the grammatical components for both the sentence-variant and the semantic description of the sentence-variant.
(2) Outermost bracket-pairs “( )” are optional: they do however act as convenient separators between the individual semantic descriptions associated with a multi-sentence text.
(3) Space characters have no descriptive significance and can be freely incorporated for readability.
STEP 2: Semantic marker assignment. Each token of the sentence-variant parse string, “David”, “are”, “Mark”, “likes”, ... in this example, is replaced in accordance with the criteria listed below, by one of the seven “semantic markers” from the set [e i a t n : *]. The order in which parse string tokens are replaced is immaterial. The resulting character string is termed “the marker expression”.


For the purpose of brevity in the following description, the notation “→” will be used to represent “is replaced by”.
. A token referencing that which is currently objectively observable → “e”
. A token referencing an instance of that which is objectively observable → “i”
. A token referencing the axioms and associated operations of a definition set → “a”
. A token referencing that which is defined constraint-free → “t”
. A token that does not distinguish semantic category membership, a “neutral token” → “n”
. A token that performs a structural role in a sentence → “:”
. An isolating semantic modifier token → “*”
The bracketed binary syntax tree expression for the example sentence-variant SV1 is:
(David(and(Mark(think(Aliens(are(sentient))))))(but(only(David(likes(Star Trek))))))
The seven semantic marker assignment criteria can be applied in any order. For descriptive convenience, assignments are made here in the order in which the criteria are listed above.
A token referencing that which is currently objectively observable → “e”
“David” and “Mark” are directly observable as the statement is being made: “David” → “e”; “Mark” → “e”.
(e(and(e(think(Aliens(are(sentient))))))(but(only(e(likes(Star Trek))))))
A token referencing an instance of that which is objectively observable → “i”
“sentient” is a reference to an example of an objectively observable characteristic which here is being attributed to an, as yet, unobserved entity. Here “sentient” is out-of-scope, and its semantic marker is prefixed by a tilde: “sentient” → “~i”. “Star Trek” is a reference to an example of an observable television program: “Star Trek” → “i”.
(e(and(e(think(Aliens(are(~i))))))(but(only(e(likes(i))))))
A token referencing the axioms and associated operations of a definition set → “a”
There are no tokens in this particular sentence-variant that refer to a consistently defined axiom set.
A token referencing that which is defined constraint-free → “t”
“Aliens” is a reference to a loosely defined concept, and not, as yet, a reference to observable entities: “Aliens” → “t”.
(e(and(e(think(t(are(~i))))))(but(only(e(likes(i))))))

A token that does not distinguish semantic category membership, a “neutral token” → “n”
The terms “are” and “only” could appear in any sentence-variant without indicating the SC involved in that sentence-variant: “are” → “n”; “only” → “n”.
(e(and(e(think(t(n(~i))))))(but(n(e(likes(i))))))
A token that performs a structural role in a sentence → “:”
The linguistic conjunctions “and” and “but” each perform such a role in this sentence-variant: “and” → “:”; “but” → “:”.
(e(:(e(think(t(n(~i))))))(:(n(e(likes(i))))))
An ISM token → “*”
“think” → “*”; “likes” → “*”.
(e(:(e(*(t(n(~i))))))(:(n(e(*(i))))))
Having applied all seven replacement criteria to the parse string for the sentence-variant under analysis, the resulting marker expression:
(e(:(e(*(t(n(~i))))))(:(n(e(*(i))))))
is now resolved to a semantic description by STEP 3.
STEP 3: Resolution. As an aide-mémoire, the bracketed binary syntax tree for SV1 following STEP 1 was:
(David(and(Mark(think(Aliens(are(sentient))))))(but(only(David(likes(Star Trek))))))

Syntax delineating brackets “( )” are numbered underneath as an explanatory convenience to indicate the node-depth level of the associated binary syntax tree. Resolution details: Resolution rules rr0 – rr7 are applied in sequence at each node depth level from terminal depth 6 in this example, to root level, level 1.


Adjusting optional formatting characters gives:
E : E[T ~ I] : E[I]
The semantic description for example sentence-variant SV1, David and Mark think Aliens are sentient but only David likes Star Trek, is therefore:
E : E[T ~ I] : E[I]
which indicates that SV1 involves semantic categories SC(E), SC(T) and SC(I), and a semantic intertwine between SC T and I.
The effect of context: were this sentence from a report on the Star Trek convention, the semantic description would be I : I[T ~ I] : I[I], since “Mark” and “David” would then be references to instances of people, not references to people currently present when the sentence was spoken.
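For readers who wish to experiment, the final SV1 marker expression can be transcribed by hand into a nested structure. The rendering below is my own and carries no claim about the author’s data format; the expected result, quoted from the worked example above, appears as a comment.

```python
# Hand transcription (illustrative only) of the SV1 marker expression
# (e(:(e(*(t(n(~i))))))(:(n(e(*(i)))))) as nested Python lists.
sv1_markers = [
    "e",
    [":", ["e", ["*", ["t", ["n", "~i"]]]]],
    [":", ["n", ["e", ["*", "i"]]]],
]
# A full implementation of rules rr0-rr7 should reduce this structure to the
# semantic description derived above: E : E[T ~ I] : E[I]
```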

6.3 Algorithmic semantic category analysis
The full rigor of formal SCA is required in order to produce a precise semantic description of a sentence-variant in terms of the SC involved and their interrelationships. However, SCA need not be as onerous as the above worked example would suggest:
. For STEP 1 of SCA, automatic parsers and electronic dictionaries can be used to provide the syntactic/lexical variants of a sentence.
. For STEP 2, the choice of sentence-variants for analysis and the assignment of semantic markers can be performed by inspection.
. STEP 3 of SCA, semantic resolution, lends itself quite readily to implementation as a nested computer algorithm.
Without computer assistance, the SCA of a complex sentence can be daunting. The next section however describes a far easier means of achieving “a first approximation” to the semantic description of the vast majority of sentences.
6.4 Heuristic semantic category analysis
There is a very simple method of SCA available for everyday use. This “broad brush” approach can be used to provide an initial overview of the SC involved in a sentence-variant and whether the sentence-variant involves semantic intertwine. This simplified form of analysis is entitled “heuristic semantic category analysis”: the term “heuristic” indicating “rule of thumb”. For the vast majority of sentences, the heuristic form of SCA can be performed mentally as a sentence is heard or read. Context is often sufficiently evident to reduce significantly the number of lexical variants: syntactic variants are also often apparent.
The heuristic semantic category analysis procedure:

Assign the appropriate semantic category designator, A, T, I or E, to the subject, and to the predicate, in each subject-predicate pair.
For example: Aliens know we are here, for extraterrestrial Aliens.
(Aliens (know we are here))
(Subject (Predicate))
The semantic category of the subject, “Aliens”, is currently “T”: purely conceptual. “Aliens” have yet to be observed. The semantic category of the predicate, “know we are here”, is “I” since it only involves references to examples of observable entities or phenomena. The sentence Aliens know we are here assigns SC(I) properties to an SC(T) concept and therefore involves a “T ~ I semantic intertwine”: the unobservable is being assigned observable characteristics.
Heuristic semantic analysis can often be performed quite rapidly. With practice this “first approximation” technique helps one develop an intuitive “incongruity detector”, significantly reducing susceptibility to bogus arguments: antique or contemporary.
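The heuristic check can be caricatured in a couple of lines of code; the toy function below (my own naming, not part of SCT) simply compares the designators assigned by inspection to subject and predicate.

```python
def heuristic_sca(subject_sc: str, predicate_sc: str) -> str:
    """Toy heuristic check: differing subject and predicate designators
    suggest semantic intertwine."""
    return subject_sc if subject_sc == predicate_sc else f"{subject_sc} ~ {predicate_sc}"

# "Aliens know we are here", for extraterrestrial Aliens:
print(heuristic_sca("T", "I"))    # T ~ I  (the unobservable assigned observable properties)
```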


7. Applications of SCT
SCT is applicable to any human endeavour involving language: a few topical applications are briefly outlined below.
7.1 Machine translation
Dictionary look-up does not produce accurate translations, except by those who know the contextual significance of a term, phrase or sentence in both the source and the translated language. The SCA of a text provides the semantic descriptions of sentences within the text. For a translation to be accurate, the semantic descriptions of source and translation should correspond: otherwise the source and translation “are talking about different things”. Bilingual dictionaries could incorporate a semantic category flag for each definition variant of a term. For example: definitions of the word “plane” would be flagged “t” for metaphysically related uses, as in the phrase “the astral plane”, flagged “a” for mathematically related uses as in “plane geometry”, and flagged “i” when referring to woodworking tools and their use. A “semantic analyser” in the form of a computer program could then be used to help improve current machine translation by highlighting those areas of translated text whose semantic descriptions differ from those of the source text.
7.2 Natural language processing: a semantic analyser
Suppose there exists:
(1) An electronic dictionary in which each lexical variant of each entry has been flagged with the appropriate semantic marker(s) from the set [e i a t n : *].
(2) A natural language parser capable of generating binary tree syntax descriptions that include pointers between qualified terms and their attributes.
(3) An algorithm implementing semantic marker resolution rules rr0 to rr7.
Supplying text to a device that incorporated (1), (2) and (3) would result in a profusion of semantic descriptions: one for each sentence-variant of each sentence within the text. Many sentences will involve unambiguous terms that are repeated throughout the text. If a contextually appropriate choice of semantic markers were made by human inspection for a small number of sentences after each re-run, the semantic description list generated would begin to approach an accurate semantic description of the whole text. When we ask someone to explain the meaning of a sentence we perform a similar procedure; asking a sequence of questions that reduce ambiguity and ascertain the precise contextual meaning of the terms involved: we “progress by enquiry”.
7.3 Machine intelligence and cybernetics
The term Zetetic, from the Greek Zetetikos “progress by enquiry”, has been used to describe a form of learning in which the subject: “learns to perform a task by observing the ways in which that which already performs the task, performs the task”. This “heuristic acquisition” procedure was identified and formally expressed following the results of cognitive experiments (St Quinton, 1982). Human subjects learnt to perform a board game task in conjunction with an experimental control:

a computer already programmed to perform the task. The chosen task, “Go-Moku”, whose rules are easy to learn, is rich in readily describable task heuristics: heuristics that can be deduced from end position backtrack analysis and look-ahead hypothesis testing. An off-line analytical program was devised to determine the heuristics that had been acquired by human subjects during each experimental session. Were an associative memory available, to enable this analytical program to run in real time, the program could learn to perform the task by “observing” the ways in which another participant, such as an experienced human, performed the task. The semantic analyser, mentioned above, in conjunction with a “word-usage database” that records the semantic-marker/word associations made by human assistants during the cooperative analysis of texts, represents an additional approach to natural language acquisition by computers (St Quinton, 2003); computers could learn, by observation, the various ways in which a word is used: as we do.
7.4 The resolution of paradox
“A paradox arises when a set of apparently incontrovertible premises gives unacceptable or contradictory conclusions... until one is solved it shows that there is something about our reasonings and our concepts that we do not understand” (Blackburn, 1996). SCT provides a method for solving many forms of paradox and might thereby increase our understanding of concepts and cognition. A single truth-value type cannot, according to SCT, be assigned to an intertwined sentence: each semantic category has its own distinct form of truth type. Concluding that an intertwined sentence is “True”, when focusing upon one of the SC involved, can result in the contradiction that the sentence is “False” when focusing upon the other semantic category. Semantic intertwine frequently lies at the heart of a paradox.
7.4.1 “A tilde I” paradox. One family of paradox in this class involves a semantic intertwine between an SC(A) “two-valued Exclusive OR condition” and a multi-valued or continuous SC(I) phenomenon. For example, Zeno’s Flying Arrow Paradox: “either it is where it is, or it is where it is not”, from which it would appear that either the flying arrow is stationary (it is where it is), or it is in the impossible situation of being where it is not. Therefore, the argument goes, “motion” is just an illusion and the flying arrow must in fact be stationary: persuasion over perception.
A further family of paradox intertwines SC(A) mathematical induction with vague SC(I) terms such as “herd”, “flock” or “heap”. For example: the Sorites “Heap of Sand” paradox. One grain of sand does not make a heap. And for any number “n”, if “n” grains of sand are not a heap, then one more grain does not make them a heap. It would appear that a heap of sand could not possibly exist. This type of paradox is a consequence of an A ~ I intertwine involving SC(A) induction with the SC(I) collective noun “heap” whose minimal quantity is unspecified. The argument invokes the defined logical procedure of “Induction”, but without a specified initial value, Induction is undefined. As with other undefined conditions invoked to describe nature, such as “isolated singularities” or “division by zero”, almost anything can be said about that which cannot be known.
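To make the intertwine explicit, the Sorites argument can be written as a standard induction scheme. The rendering below is my own, with Heap(n) standing in for the vague SC(I) collective noun applied to n grains.

```latex
% Illustrative formalisation (my notation) of the Sorites argument as induction:
% base case, induction step, and the unwelcome conclusion.
\neg\,\mathrm{Heap}(1), \qquad
\forall n \,\bigl(\neg\,\mathrm{Heap}(n) \rightarrow \neg\,\mathrm{Heap}(n+1)\bigr)
\;\;\vdash\;\; \forall n \,\neg\,\mathrm{Heap}(n)
```

The SC(A) schema itself is well defined; it is the SC(I) predicate Heap, with no specified minimal quantity, that the induction quietly treats as if it were exactly defined.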


Laconic paradoxes often involve self-reference and usually require elaboration before becoming apparent: examples can be found in the class of “liar paradox”. A “liar paradox” is a paradox generated by a sentence which, actually or apparently, either directly or indirectly, asserts its own falsity. For example: This sentence is false. If the sentence “This sentence is false” is false, then “This sentence is false” must be true. If the sentence “This sentence is false” is true, then “This sentence is false” must be false. Until we assign a different truth-value to the overall sentence there is no paradox. As an SC(A) sentence-variant of the form “X is Y”, “This sentence is false” is not a paradox but an axiomatic statement. As an SC(I) sentence-variant of the form “This cat is black”, “This sentence is false” is not a paradox but a sentence-variant which attributes a valid characteristic to an observable entity: falsity to a sentence. Only when we assign a truth-value to the overall sentence “This sentence is false” does it become possible to generate a paradox. The paradox is generated when the sentence is elaborated: (1) 1st If the sentence “This sentence is false” is false, then “This sentence is false” must be true. (2) 2nd If the sentence “This sentence is false” is true, then “This sentence is false” must be false. . . .A sequence pair that can be repeated ad infinitum. . . Each elaboration, 1st 2nd 3rd 4th 5th . . . , etc. invokes a truth-value attributed within an SC(I) sentence-variant to modify the truth-value of an overall SC(A) sentence-variant. The paradox is generated by the production, at each elaboration, of an A , I semantic intertwine. The semantic intertwine inherent in the elaboration of this self-referential example, “This sentence is false” accounts for its paradoxical nature: as it does in the conundrum “Is this a question?” When interpreting sentences that involve semantic intertwine, there are occasions when cognition could be said to “switch”: alternately focusing upon the different SC involved. Such “linguistic illusion” bears comparison to the perceptual switching that occurs with certain visual illusions such as the reversible Necker Cube or Faces-Vase illusion. Semantic switching also occurs as a persuasive sophist technique in oratory. A premise that appears sensible, with respect to the implied definitions of its terms, is used as the basis of an argument. The subsequent conclusion, apparently valid, is then explained in a manner that implies different definitions to those originally implied. The audience is expected to accept an untenable conclusion on the basis of a perfectly reasonable premise. Should the conclusion be questioned, the orator responds with answers that alternate between the different implied definitions of one or more of the terms. The orator generates semantic intertwine “dynamically” by switching between definitions pertinent only to different SC. 7.4.2 “A tilde A” paradox. This form of paradox involves independent, “orthogonal”, and sometimes contradictory axiom sets. One example of an “A tilde A” paradox is Russell’s “set of all sets”: if the set of all sets does contain itself then it does not contain itself, and vice versa. Some sets are not members of themselves. For example: the set of all bicycles. Other sets, such as the set

of all things that are not bicycles, are members of themselves (the set itself is not a bicycle). Consider the set S whose members are those sets that are not members of themselves. Is S a member of S? If it is, then it is not, and if it is not, then it is. Russell (1903) sought to avoid this difficulty by a “theory of types”, namely logical types, such that a set containing sets of the same type as members was disallowed. SCT suggests a less “ad hoc” solution. In terms of SCT, the “set of all sets” paradox occurs as the result of an A , A semantic intertwine involving contradictory axiom sets: one of which, associated with “the set of all sets” involves infinity – “1” Axiom Set 1:- for which a large number þ 1 ¼ an even larger number 2 is TRUE a large number þ 1 ¼ the original large number 2 is FALSE Axiom Set 2:- for which 1 þ 1 ¼ 1 2 is TRUE Some interpretations of quantum physics invoke a modification of Zeno’s paradox involving contradictory semantic category A axiom set intertwines. Consider: “P þ Q ¼ X where X ¼ 1 and where both P and Q are non-zero positive integers”. There is no solution to this relation using the formal axiom systems of mathematics. Any solution to this relation would depend upon an axiom/operation set contradicting those currently specified by any branch of mathematics. However, this relation is analogous to the assertion that a single entity can occupy two, or with P, Q, R. . . even more, distinct states simultaneously. This relation has been discussed with respect to “complementarity” (Antonopoulous, 1997) in quantum physics. Examining some contemporary assertions of theoretical physics, in the light of SCT, may prove revealing. Quantum physics is not a theory as such but a principle. Combine classical Newtonian mechanics with the quantum principle for example, and the result is called quantum mechanics (Zee, 2001). Quantum electrodynamics alone is of inestimable significance at the level of atomic physics and above. However, some contemporary interpretations of quantum physics (Gribbin, 1998) invoke entities and phenomena, which are said to exist and have the property of being unobservable. The concept of “Leprechauns”, or “Hobgoblins”, whether robed in mathematical nomenclature or not, could form the basis of a conviction with the same characteristics. When no criteria for distinguishing between conflicting ideas exist, disputes are endless: of what cannot be known, much can be said. 7.5 The analysis of sophistry Arguments involving semantic intertwine, whether paradoxical or not, can rage for centuries. “Sophistry is reasoning which is plausible, fallacious and dishonest” (PDP, 2000). Very sophisticated arguments can be generated to give an idea based upon one semantic category the deceptive appearance of possessing properties belonging only to a distinct semantic category. Few would interpret the statement “Leprechauns are


wealthy” literally. Engulf this statement by complex, obfuscating, authoritative argument and the numbers may well increase. SCA can be applied to resolve persuasive, though erroneous, arguments, however sophisticated, by indicating the semantic confusions involved.
Sophistry in literature generally involves “T tilde I” intertwine: where, for example, imaginary creatures, claimed to actually exist, are assumed to have human characteristics. When anthropomorphic belief in such creatures ceases, the associated texts are called “mythology” and the belief “pagan”. Sophistry in science generally involves “A tilde I” intertwine: in which purely conceptual entities or phenomena, described in the languages of mathematics, are stated authoritatively to physically exist and to possess defined characteristics: frequently one of which is the property of not being observable. Doubters are usually referred to the elegant mathematical proofs. Anything can be said about anything that is stated to be, or is, unobservable: including that it exists, perhaps “in infinite space” or “with finite probability”. A concept expressed in mathematical symbols, together with a few complex operations upon those symbols, can be remarkably persuasive. And pure mathematics has one fantastic advantage over the experimental sciences: pure mathematics does not have to justify its axioms.
8. Semantic category space
SCT posits four SC: A, T, I and E. Each is a category of a particular conceptual type:
. A and T involve concepts that refer to concepts;
. I and E involve concepts that refer to observed aspects of our environment.
The four empirically derived SC can be arranged conceptually in the form of an abstract quadrant termed the “semantic category space”. One of the 24 possible arrangements of SC in semantic category space is shown below. In this particular arrangement, SC constrained by definition, or current observation, appear at the top and concept/concept categories appear on the left.

A    E
T    I

“Profiles” can be traced on this semantic category space so as to encompass, to a greater or lesser extent, each of the four semantic category quadrants. The definition of any idea, theory or school of thought will involve, in various proportions, one or more SC. Each such definition can therefore be characterised, and graphically illustrated, by a distinguishing “profile in semantic category space”. 9. Philosophical schools and scientific theories A semantic category space profile whose domain emphasises quadrants A, E, and I, forms a graphical representation of logical positivism (Ayer, 1946). Logical positivists regarded metaphysical statements as insignificant and “meaningless”. A profile whose domain emphasise quadrants A and T graphically represents the “Platonic Realm”:

Plato’s “Theory of Forms” (PDP, 2000). A profile whose domain emphasises quadrants E and I graphically represents “operationalism” (Bridgman, 1960). This exercise can be repeated for each philosophical school of thought, and indeed for the developing status of scientific theories. The Theory of Phlogiston, published in 1723 by Georg Stahl, in his Fundamentals of Chemistry, to account for combustion “became so entrenched in chemical thinking that a great intellectual effort had to made to break away from it” (Ronan, 1983). Stahl explained combustion as the loss of the substance phlogiston to the atmosphere. The assumed “fire element” phlogiston was never detected. Antoine-Laurent Lavoisier subsequently demonstrated that the products of combustion actually weigh more than the material burnt. In 1789 Lavoisier published his Elementary Treatise on Chemistry and laid the foundations of modern chemistry. “Phlogiston”, once unquestionably assumed to refer to a physical entity, is now recognised as a figurative, purely imaginative, concept. In terms of semantic category space, Phlogiston theory never actually progressed from the “T quadrant” to the “I quadrant” even though for decades phlogiston was assumed to refer to a “real” empirical entity. Theories can be characterised by their position in semantic category space and their development characterised by their movement in that space. A theory excluded from the possibility of semantic category “I” membership is unable to meet the minimum observational criterion required for a scientific theory. SCT itself is a testable SC(I) theory based on observation; not an axiomatic theory based upon assumption or opinion. As with any empirical scientific theory, SCT can be refuted by means of conflicting data. A single observation, a sentence-variant, with which the theoretical constructs of SCT are unable to cope, is sufficient to require either a fundamental revision of this theory, or its dismissal. 10. Conclusion SCT is based on the author’s observation that there are four distinct domains of human discourse: each with its own independent form of “Truth”. SCT provides a means of analysing the written or spoken manifestations of cognition in terms of the semantic components involved. SCT is applicable to every statement ever made in any subject. The range of future SCT-based research possibilities is therefore somewhat extensive. SCA provides a rapid means of highlighting fundamental errors of reasoning; errors which, when camouflaged with layers of obfuscation, are almost undetectable: errors upon which further errors are built. In identifying and resolving semantic intertwine, SCA enables paradoxical anomalies of cognition to be resolved and many “bewitching” notions to be considered objectively. Research aimed at creating machine intelligence continues. We are unlikely to deem devices intelligent until they comprehend human language. Implementing that facility is unlikely to occur until we ourselves have a thorough understanding of the constructional elements of meaning and their interrelationships. SCT, and its associated analytical technique SCA, may well assist the further development of human language understanding: both for machines and ourselves. Evaluating contrasting views often reaches a polarised impasse. A unifying implication of SCT is that disputes between schools of thought arise predominantly


from the particular semantic category exclusions, emphases and intertwines they each make. Each extreme of a polarised argument often holds its own form of truth: each such school of Philosophical and Scientific thought is often “True” – in its own way. However, with the advent of SCT, it has now become far more difficult to misrepresent one form of truth as another. It is anticipated that SCT will result in a significant re-evaluation of many long-held assumptions: assumptions about our own nature and the nature of the environment that includes ourselves.
Note
1. The condition “with respect to the sentence-variant under analysis” is applicable throughout.
References
Antonopoulous, C. (1997), “Complementary conceptual schemes”, Idealistic Studies, Vol. 27 Nos 1/2, pp. 23-45.
Austin, J.L. (1962), How to do Things with Words, Oxford University Press, Oxford.
Ayer, A.J. (1946), Language, Truth, and Logic, Dover Press, New York, NY.
Beall, J.C. and van Fraassen, B.C. (2003), Possibilities and Paradox, Oxford University Press, Oxford.
Blackburn, S. (1996), “Paradox”, The Oxford Dictionary of Philosophy, p. 276.
Bridgman, P.W. (1960), The Logic of Modern Physics, Macmillan, New York, NY.
Chomsky, N. (1995), The Minimalist Program, MIT Press, Cambridge, MA.
Edmonds, J.E. (1976), A Transformational Approach to English Syntax, Academic Press, New York, NY.
Empson, W. (1930), Seven Types of Ambiguity, Chatto & Windus, London.
Gostelow, M. (1976), The Fan, Gill and MacMillan, Dublin, p. 52.
Gribbin, J. (1998), Q is for Quantum, Weidenfeld and Nicolson, London.
Jackendoff, R.S. (1972), Semantic Interpretation in Generative Grammar, MIT Press, Cambridge, MA.
Łukasiewicz, J. (1970), Selected Works, North-Holland, New York, NY.
OED (1971), Oxford English Dictionary, Oxford University Press, Oxford.
PDP (2000), Penguin Dictionary of Philosophy, Penguin Books, Harmondsworth.
Ronan, C. (1983), The Cambridge Illustrated History of the World’s Science, Cambridge University Press, Cambridge.
Russell, B. (1903), Principles of Mathematics, Cambridge University Press, Cambridge.
St Quinton, J.G. (1982), “Zetetics”, PhD thesis, Department of Cybernetics, University of Reading, Reading.
St Quinton, J.G. (2003), “Understanding language: an analytical method for man and machine”, Proceedings: 2nd IEEE Systems, Cybernetic Intelligence, Challenges and Advances, University of Reading, Reading, 17 September 2003.
Wittgenstein, L. (1953), Philosophical Investigations, Blackwell, London, §§109.
Zee, A. (2001), Einstein’s Universe, Oxford University Press, Oxford, available at: www.thinkingtypes.co.uk


Bipolar logic and probabilistic interpretation


Mourad Oussalah
EECE, University of Birmingham, Edgbaston, Birmingham, UK


Abstract
Purpose – The paper aims to answer the question “can the bipolar Negative-Neutral-Positive logic be extended and motivated in some probabilistic framework?”
Design/methodology/approach – Using the context of the cognitive map interpretation of the conjunction and disjunction connectives in bipolar logic, three probabilistic causal reasoning schemes are put forward. The first one is based on the infinitesimal representation of material implication, while the second one relies on the qualitative representation developed by Suppes and Cartwright. In both cases special conditions for transitivity of inference and multiple-input scenarios are examined. The third approach implicitly omits the cognitive interpretation and rather relies on the idea that the causal independence structure can be substituted by some functional that combines independent inputs in such a way as to force the output to be in full agreement with the results expected through the conjunctive and disjunctive connectives.
Findings – The paper reports several theoretical findings regarding the different conditions ensuring the agreement and equivalence between the bipolar logic connectives and their probabilistic counterparts for each proposal. The paper also provides useful insights linking the findings to probabilistic argumentation systems where pro and con arguments are considered simultaneously.
Originality/value – The paper offers a theoretical basis for researchers investigating different categories of logics and contributes to the discussion linking logic to probability.
Keywords Cybernetics, Logic, Probability theory
Paper type Research paper

This paper is an extended and expanded version of the paper (“Bipolar logic for human-computer interaction”) presented by the author at the IEEE SMC – UK & Ireland Chapter meeting in Reading, September 2003. The author is grateful to the Nuffield Foundation for providing support for this work.
1. Introduction
In human reasoning, a wide variety of judgments is based on non-classical logic. In particular, negative as well as positive judgments are used simultaneously by the user when making decisions. Indeed, in many cases, human reasoning goes beyond the classical binary logic where each proposition is assigned exclusively a “true” or “false” predicate. For instance, in political science or business, any kind of agreement between two possible partners is concluded by taking into account both the positive and negative rationality parts in each argument. This is referred to as bipolar reasoning, which leads to a bipolar logic investigated by Zhang (2000) under the name Negative-Positive-Neutral (NPN) logic. In this framework, each proposition is assigned a pair (a, b) reflecting the belief and the disbelief about the truth of the underlying proposition. The motivation for such a representation goes back to the possibility of handling the causality of events in the same framework as the confidence or truth attached to these events when dealing with the decision-making process. Indeed, people do evaluate complex policy


alternatives in terms of the consequences as a particular choice would cause, and ultimately of what the sum of all these effects would be. Indeed, as stated by Simon (1969) an important characteristic of complex system is that – the whole is more than the sum of the parts, not in an ultimate, metaphysical sense, but in the important pragmatic sense that given the properties of the parts and the law of their interactions, it is not a trivial matter to infer the properties of whole. Usually, a rational representation of such environment is performed using cognitive map, which uses points to refer to the different concepts, and arrows for all causal links. It is designed to capture the structure of the causal assertions of a person with respect to a particular policy domain, and generate the consequences that follow from this structure (Zhang et al., 1989). In this respect, the pair (a, b), assigned to the relationship between two variables (causal variable and effect variable), reflects the causal assertion of how one concept variable affects another. Positiveness means an augmenting relationship in the sense that an increase (decrease) in a causal variable will result an increase (decrease) in effect variable. While negativity translates an inhebiting relationship; that is, increase (decrease) of cause variable would result in decrease (increase) of effect variable. On the other hand, the presence of positive and negative conveys information concerning the feedback of the effect variable on the causal variable, and subsumes the presence of some conflict. Regardless the interpretation ascribed to such logic, NPN logic allows the user to represent all interactions between the different agents where bipolar perceptions occur naturally. While, it is misleading to represent both aspects into a single unipolar representation. Consequently, conflict, conflict-resolution, argumentation (Dung, 1995) and multiagent co-ordination play a key role in such formalism. The latter sounds particularly useful when dealing with human-computer interactions. In particular, in situations where both the human and computer are competing to achieve a specific task, then the notion of conflict and cooperation arise naturally. Intuitively bipolar logic offers a nice setting to take into account both human belief and disbelief as he/she possesses arguments both in favour and against a given situation. While on the other hand, the machine only tackles positive arguments in the sense that it presents no regret when accomplishing a given task. Consequently, human attitude in maximizing his expected utility and machine behaviour in minimizing error finding can be contrasted together. On the other hand, the use of both human and machine in accomplishing the same task offers another framework for the use of dialogue logic where pros and cons arguments maybe contrasted. In this paper we will be mainly interested in building a bridge between the NPN logic and probabilistic setting. Especially the notion of causal independence and probabilistic inference will be re-examined in the light of the cognitive-based interpretation of NPN logic and sufficient conditions that lead to agreement between the two concepts will be driven. Crucial in this development is to provide a probabilistic interpretation of the rule-based system and then ensure the agreement with the cognitive interpretation, which conveys some causal dependency structure to be determined. From this perspective two probabilistic rule based interpretation are examined. 
The first one is based on the infinitesimal interpretation mentioned by Pearl (1988), which implicitly assumes that the conditional probability of the effect given the cause is sufficiently high or close to one. While the second one is based on the qualitative representation investigated by Suppes (1970) and Cartwright (1979), which

extends the positive dependence assumption. A third possibility omits the cognitive-map interpretation and relies on the assumption that the initial inputs are independent but combined through some functional that fully recovers the behaviour of the conjunctive and disjunctive connectives in NPN logic. Section 2 of this paper describes the motivation and main results of the NPN logic connectives. Section 3 investigates the cognitive-map interpretation. Section 4 examines the probabilistic interpretation, where both infinitesimal and qualitative representations are investigated and sufficient conditions that ensure agreement with the cognitive-map interpretation are pointed out. Section 5 emphasizes the functional that fully maps the conjunctive and disjunctive connectives.
2. Bipolar logic and motivation
Historically, such studies have been investigated in social and economic studies. A hot topic in distributed artificial intelligence is how to establish coherence and co-ordination among a set of autonomous agents. Coherence refers here to how the different agents behave as a unit, without conflicting information, in order to perform a given task, while co-ordination corresponds to the way by which different subtasks are assigned to the different units, which are thereby guided to the final goal. Pointing out the limits of classical binary logic for such purposes, Axelrod (1976) first proposed an eight-valued model as an extension of the binary {0, 1} model. It assumes that entities are represented by pairs (a, b), a ≤ b, where the universe of description S is given by S = {−1, 0, 1, (−1, 0), (0, 1), (−1, 1), (−1, 0, 1), {}}. To overcome the computational cost of Axelrod's model, De Kleer and Brown (1984) proposed a four-valued model whose universe of descriptions is S = {−, 0, +, ?}, where the undefined “?” encompasses all the cases which involve coexistence of negative and positive parts, negative-neutral, or positive-neutral parts. The pair (a, b) can either be interpreted as “a or b” or “a and b”. Bipolar reasoning assumes the co-existence of positive and negative sides. In this case, Zhang (2000) advocated that the second interpretation “a and b” fits the bipolar representation. Indeed, assuming the zero value may belong to either the positive or the negative pole, the singletons {−1, 0, 1} correspond rather to the pairs {(−1, 0), (0, 0), (0, 1)}. Consequently, the eight-valued model of Axelrod comes down to the following four-valued model S1 = {(−1, 0), (0, 0), (0, 1), (−1, 1)}. From S1, it is easy to see that in bipolar-based logic all the pairs (a, b) are such that (a, b) ∈ {−1, 0} × {0, 1}. In this framework, the tuple (S, =, %, ^) defines a strict crisp[1] bipolar algebra. A rationale for the logical conjunction and disjunction operations is provided in Zhang (2000) as

(a, b) % (c, d) = (min(a, c), max(b, d))    (1)

(a, b) ^ (c, d) = (min(ad, bc), max(ac, bd))    (2)
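For concreteness, here is a minimal Python sketch of the two connectives (1) and (2) on bipolar pairs, together with a brute-force closure check over S1; the tuple encoding is merely illustrative.

```python
# Minimal sketch of the NPN connectives (1) and (2) on bipolar pairs (a, b)
# with a in {-1, 0} and b in {0, 1}.
def disj(x, y):                      # the % connective of equation (1)
    (a, b), (c, d) = x, y
    return (min(a, c), max(b, d))

def conj(x, y):                      # the ^ connective of equation (2)
    (a, b), (c, d) = x, y
    return (min(a * d, b * c), max(a * c, b * d))

S1 = [(-1, 0), (0, 0), (0, 1), (-1, 1)]
assert conj((-1, 0), (-1, 0)) == (0, 1)     # negative composed with negative -> positive
assert disj((-1, 0), (0, 1)) == (-1, 1)     # conflicting causes -> negative-positive
for x in S1:
    for y in S1:
        assert conj(x, y) in S1 and disj(x, y) in S1   # closure over S1
```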


Now in order to provide more axiomatic justifications for Zhang’s proposal, we shall propose in the next section a set of rational assumptions. We will consider first the cognitive based interpretation where the notions of causal variable and effect variable are of central importance.


3. Cognitive map-based interpretation
A cognitive map is a representation of relationships that are perceived to exist among attributes and/or concepts of a given environment. These relationships might be characterized numerically or linguistically (Axelrod, 1976; Zhang et al., 1989). In the former case, one may assume that they take either positive or negative values. Given two elements in a cognitive map, the concern is whether the state of one is perceived to have an influence on the state of the other. From this perspective, let us consider a set of assumptions, which provide a theoretical basis for motivating bipolar logic:
(1) The universe is constituted by positive, neutral, negative and negative-positive (or equivalently negative-neutral-positive) elements.
(2) Positiveness means that an increase (decrease) in the cause variable will induce an increase (decrease) in the effect variable, while negativeness refers to the case where both variables behave in opposite directions.
(3) The operation ^ results from the fact that the effect variable for a given relationship is itself a cause variable for another relationship.
(4) The operation % accounts for several causes affecting the same effect variable, in the sense that the sum of positive effects is positive and the sum of negative effects is negative.

Assumption 1 arises straightforwardly from the definition of NPN logic and the universe S1, which contains negative, null, positive and negative-positive elements represented in bipolar pairs (a, b). Assumption 2 coincides with Axelrod’s postulates of structure of cognitive map where the sign describes rather the relationship between the cause and the effect instead of being the property of one independent entity (effect variable). Assumption 3 and 4 also agree with the graphical interpretation of the cognitive map where the operations ^ and % rather correspond, in some ways, to a serial and a parallel architecture. In other words, ^ corresponds to a graph composition in the sense that the effect variable in a given relationship is the case variable for another relationship. While in the operation %, we postulate that the total effect from A to C is the sum of all indirect effects pertaining to all paths from A to C. It is then natural that if all such indirect effects are positive (negative), the total effect is positive (negative). However, if some of the paths have positive effect and other negative, then the total effect may be positive, neutral or negative. Figure 1 illustrates this aspect. A valuation (0,1) for instance, materializes the rule “If A increases then B increases as well”, provided the cause and effect variables are A and B, respectively. Equivalently, it also induces the rule “if A decreases then B decreases as well”.

Similarly, the valuation (2 1, 0) corresponds to the rule “if A increases then B decreases” or “if A decreases then B increases”. While the valuation (0,0) can be interpreted as “whatever A, B remains unchanged”. Finally, (2 1, 1) corresponds to rule “if A increases, then B may either increase or decrease” or “if A decreases, then B may increase or decrease”. From this perspective, and in view of representation pointed out in Figure 1, ð0; 1Þ^ð0; 1Þ corresponds to rules “if A increases than B increases” and “if B increases then C increases” so by transitivity, one deduces “if A increases then C increases as well”, which corresponds to the valuation (0,1), i.e. ð0; 1Þ^ð0; 1Þ ¼ ð0; 1Þ: Similarly, ð21; 0Þ^ð21; 0Þ corresponds to rules “if A increases then B decreases” and “if B decreases then C increases”, and by transitivity, “if A increases then C increases as well”; that is, ð21; 0Þ^ð21; 0Þ ¼ ð0; 1Þ: The operation ð0; 1Þ^ð0; 0Þ corresponds to the rulebase “if A increases then B increases” and “if B increases, then C remains unchanged”. So, “if A increases then C remains unchanged”, which corresponds to valuation (0,0), i.e. ð0; 1Þ^ð0; 0Þ ¼ ð0; 0Þ: The operation ð21; 1Þ^ð21; 0Þ entails rules “if A increases then B may either decrease or increase” and “if B increases then C decreases”. Consequently, “If A increases then C may either increase or decrease”, which corresponds to valuation (21,1). On the other hand, in view of Assumption 4, the operation % corresponds to a conjunction of cause variables and one effect variable. For instance, ð0; 1Þ%ð0; 1Þ can be interpreted as “if A1 increases then B increases” and “if A2 increases then B increases”. So, “if both A1 and A2 increases then B increases as well”. Similarly ð21; 0Þ%ð0; 0Þ entails rules “if A1 increases then B decreases” and “if A2 increases then B remains unchanged”. Consequently, “if both A1 and A2 increase then B decreases”, which corresponds to valuation (21, 0). While ð21; 0Þ%ð0; 1Þ induces rules “if A1 increases then B decreases” and “if A2 increases then B increases”. Therefore, “if both A1 and A2 increase then B may either increase or decrease”, which entails valuation (2 1, 1), i.e. ð21; 0Þ%ð1; 0Þ ¼ ð0; 0Þ: Notice that the negative-positive element, mentioned in Assumption 1, permits the structure of operation % to deal with the dilemma of addition of positive and negative values where the output depends rather on the absolute values of the operands. In other words, the sign of the output can be known by the only knowledge of the sign of the operands. So, the introduction of negative-positive or, equivalently, negative-neutral-positive element enables the disjunction structure to put the outcome into the class of that element. While in Kleer and Brown model, such case entails the unknown “?”. Consequently, according to this viewpoint, bipolar NPN agrees more likely with Axerold’s view point.
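As a quick illustration of Assumptions 3 and 4 and of the graph reading just described, the following sketch composes edge valuations along paths with ^ and merges parallel paths with %; the small cognitive map used here is invented.

```python
# Illustrative sketch: serial composition of edge valuations with ^ along a path,
# and % across parallel paths.  The map (edges A->B, B->C, A->D, D->C) is hypothetical.
from functools import reduce

def conj(x, y):                                   # ^ : serial composition
    (a, b), (c, d) = x, y
    return (min(a * d, b * c), max(a * c, b * d))

def disj(x, y):                                   # % : parallel accumulation
    (a, b), (c, d) = x, y
    return (min(a, c), max(b, d))

edges = {("A", "B"): (0, 1), ("B", "C"): (-1, 0),   # A augments B, B inhibits C
         ("A", "D"): (-1, 0), ("D", "C"): (-1, 0)}  # A inhibits D, D inhibits C

paths = [[("A", "B"), ("B", "C")], [("A", "D"), ("D", "C")]]
path_effects = [reduce(conj, (edges[e] for e in p)) for p in paths]
total = reduce(disj, path_effects)
print(path_effects, total)   # [(-1, 0), (0, 1)] -> (-1, 1): conflicting total effect
```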


Figure 1. Graph interpretation of ^ and %


Now one may establish the following statements. Proposition 1. When the operands are either positive, negative or neutral the operation ^ behaves similarly to the multiplication of signed numbers. Proof. Assume that in the first relationship (path) A is a cause variable and B the effect variable, while in the second relationship B is a cause variable and C the effect variable. Now according to the sign of both relations, five cases may be distinguishable (1) Both relationships have positive effects. In this case, according to Assumptions 2 and 3, an increase in A leads to an increase in B which also induces an increase in C. In other words, increase in A induces increase in B. This can be rewritten more formally, ð0; 1Þ^ð0; 1Þ ¼ ð0; 1Þ (2) The first relationship induces positive effect and the second a negative effect. Similarly, an increase in A leads an increase in B which induces a decrease in C. That is, increase in A induces decrease in C, i.e. ð0; 1Þ^ð21; 0Þ ¼ ð21; 0Þ (3) Interchanging A and B in (ii) leads to ð21; 0Þ^ð0; 1Þ ¼ ð21; 0Þ: (4) If both relations have negative effects, then increase in A induces decrease in B which entails increase in C. More formally, ð21; 0Þ^ð21; 0Þ ¼ ð0; 1Þ: (5) Assume that one of the relationships, say the second one, induces null effect. That is, increase in A leads to an increase (decrease) in B which entails a null effect in C. So A entails a zero effect in C, i.e. ð0; 1Þ^ð0; 0Þ ¼ ð0; 0Þ and ð21; 0Þ^ð0; 0Þ ¼ ð0; 0Þ: Similarly, by interchanging relationships 1 and 2 leads to ð0; 0Þ^ð0; 1Þ ¼ ð0; 0Þ and ð0; 0Þ^ð21; 0Þ ¼ ð0; 0Þ: This clearly means that the structure of multiplication of signed numbers and ^ are in complete agreement. A The operation % also agrees with the structure of the addition of signed numbers by virtue of Assumption 4. So, the fact that zero is a neutral element for this operation is trivial. While the case of the coexistence of positive and negative effects is supported by the negative-positive element. A Proposition 2. The operations % and ^ are distributive with respect to the set union operator. Proof. From Assumption 4 and Assumption 1, it follows that ð21; 0Þ%ð0; 1Þ ¼ ð21; 1Þ and ð21; 0Þ%ð21; 1Þ ¼ ð21; 1Þ; ð21; 0Þ%ð0; 1Þ ¼ ð21; 1Þ; ð21; 0Þ%ð0; 0Þ ¼ ð21; 1Þ: The term (21, 1) in the right hand side of the conjunction can be decomposed as ð21; 0Þ < ð0; 1Þ: So, it holds ð21; 0Þ%ð21; 1Þ ¼ ð21; 0Þ%½ð21; 0Þ < ð0; 1Þ ¼ ½ð21; 0Þ%ð21; 0Þ < ½ð21; 0Þ%ð0; 1Þ

¼ ½ð21; 0Þ < ð21; 1Þ ¼ ð21; 1Þ:

Similarly, one may prove that ð21; 0Þ%ð21; 1Þ ¼ ð21; 0Þ%½ð21; 0Þ < ð0; 1Þ < ð0; 0Þ

The proof for the operation ^ starts from the observation that ð21; 1Þ^ð0; 1Þ ¼ ð21; 1Þ^ð21; 0Þ ¼ ð21; 1Þ; and ð21; 1Þ^ð0; 0Þ ¼ ð0; 0Þ: This can easily be established from the interpretations used in the proof of Proposition 1. It is then also easy to check, for instance,


ð21; 1Þ^ð21; 0Þ ¼ ½ð21; 0Þ < ð0; 1Þ^ð21; 0Þ ¼ ½ð21; 0Þ^ð21; 0Þ < ½ð0; 1Þ^ð21; 0Þ

¼ ð0; 1Þ < ð21; 0Þ ¼ ð21; 1Þ


The detail of the proof is omitted for similarity of the different cases. A Proposition 3. Operations % and ^ are distributive with respect to the set intersection operator. Proof. One may restrict the proof to % for its similarity to that of ^. Take, for instance, ð21; 1Þ > ð0; 1Þ ¼ ð0; 1Þ; we have ½ð21; 1Þ > ð0; 1Þ%ð21; 0Þ ¼ ð0; 1Þ%ð21; 0Þ ¼ ð21; 1Þ ¼ ½ð21; 1Þ%ð21; 0Þ > ½ð0; 1Þ%ð21; 0Þ

¼ ð21; 1Þ > ð21; 1Þ ¼ ð21; 1Þ

Another special case arises when considering, for instance, ½ð21; 0Þ > ð0; 1Þ%ð0; 1Þ ¼ ð0; 0Þ%ð0; 1Þ ¼ ð0; 1Þ ¼ ½ð21; 0Þ%ð0; 1Þ > ½ð0; 1Þ%ð0; 1Þ

¼ ð21; 1Þ > ð0; 1Þ ¼ ð0; 1Þ:

It is worth noticing that the basic results, using % and ^ within Boolean structure (restricting to positive and zero elements), still are held. According to this viewpoint, NPN can be viewed as an extension of Boolean logic to negative relationships. A 4. Probabilistic interpretation Clearly, assuming the graphical interpretation of the operations ^ and %, let A, B, B1, B2 and C be random variable. Then in the spirit of Figure 1, it holds that ^ and % corresponds to graphical representation shown in Figure 2(a) and (b) respectively. Strictly, speaking several interpretations can be ascribed to such graphical representation. The most common one, widely used in belief network literature (Pearl, 1988), consists of arguing in Figure 2(a), for instance, that B is a cause of C and A is a cause of B. This can be rewritten in terms of production rules as “If B then C” and “If A then B”. From this perspective, the probability value P(BjA) and P(CjB) constitute our confidences on the above entailments, or reliability values attached to the above rules.

Figure 2. Example of graphical interpretation


So, an interesting problem is to find out the confidence of the inferred rule “If A then C” from the composite rules “If A then B” and “If B then C”. The above reasoning represents the core of probabilistic belief network theory as initiated by Pearl (1988). Clearly, crucial in such reasoning is the notion of independence and causal independence; that is, in the case of Figure 2(a), to what extent the events C and A are independent given event B. Using conditional independence, we have

P(C|A, B) = P(C|B)    (3)

and

P(C|A, B1, B2) = P(C|B1, B2)    (4)

(3) can be rewritten, using Bayes' chain formula, as

P(C, B, A) = P(C|B) · P(B|A) · P(A)    (5)

Similarly, one obtains for (4)

P(C, B1, B2, A) = P(C|B1, B2) P(B1|A) P(A)    (6)
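As a small numerical illustration of factorisation (5), and of the inferred-rule question raised above, the following sketch propagates invented rule confidences along the serial chain A → B → C of Figure 2(a).

```python
# Numerical sketch of factorisation (5) on the serial chain A -> B -> C.
# The conditional probabilities (rule reliabilities) are invented.
p_a = 0.3
p_b_given_a = {True: 0.9, False: 0.1}            # P(B | A)
p_c_given_b = {True: 0.8, False: 0.2}            # P(C | B) = P(C | B, A)

def p_joint(a, b, c):                            # equation (5)
    pa = p_a if a else 1 - p_a
    pb = p_b_given_a[a] if b else 1 - p_b_given_a[a]
    pc = p_c_given_b[b] if c else 1 - p_c_given_b[b]
    return pa * pb * pc

# Confidence of the inferred rule "If A then C": P(C | A) by marginalising B.
p_c_and_a = sum(p_joint(True, b, True) for b in (True, False))
print(round(p_c_and_a / p_a, 3))                 # 0.74 = 0.9*0.8 + 0.1*0.2
```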

Now, in order to put forward a framework that allows us to represent the different valuations, we shall use the following events: A+ for “increase in A”, A− for “decrease in A”, where A stands for either the cause or the effect variable. Strictly speaking, linking rules like “if E then F” to probabilistic constraints has given rise to an emerging literature on probabilistic logic, starting from Adam's pioneering work (Adam, 1975). In particular, coherence with standard logical deduction inferences may be violated when they are translated into a probabilistic setting (Spirtes et al., 2000; Skyrms and Harper, 1988). We shall report here two interesting interpretations of the above rule. The former can be understood as a description of the behaviour of a non-standard probabilistic relationship (Pearl, 1988; Adam, 1975):

“if E then F” entails P(F|E) ≥ 1 − ε    (7)

where ε is infinitesimally greater than zero. This also corresponds to the prescription that “P(F|E) is high” or close to one. Conversely, if the rule “if E then F” does not hold, this can be expressed as P(E|F) ≤ ε. A second interpretation, in the light of the work of Suppes (1970) and Cartwright (1979), suggests that

“if E then F” entails P(F|E) > P(F)    (8)

We shall refer to representation (7) as infinitesimal representation and (8) as qualitative representation in the sense that it exhibits a qualitative relationship, which is a strict inequality, among probability values. Now let us look to the influence of both representations on the preceding causality description shown in Figures 1 and 2.
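As a toy contrast between the two readings, the following sketch tests a single rule “if E then F” under both (7) and (8); the probabilities are invented.

```python
# Toy contrast of the two readings of "if E then F": the infinitesimal test (7)
# and the qualitative test (8).  Numbers are illustrative.
def infinitesimal_holds(p_f_given_e, eps=0.01):
    return p_f_given_e >= 1 - eps            # representation (7)

def qualitative_holds(p_f_given_e, p_f):
    return p_f_given_e > p_f                 # representation (8)

p_f, p_f_given_e = 0.30, 0.70
print(infinitesimal_holds(p_f_given_e))      # False: 0.70 is not "close to one"
print(qualitative_holds(p_f_given_e, p_f))   # True: E still raises the probability of F
```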

4.1 Use of infinitesimal representation
In this setting, given for instance that a valuation (a, b) materializes a relationship between a cause variable A and an effect variable B, and given the previous notation for events, then, following representation (7), (0, 1) can be translated into P(B+|A+) ≥ 1 − ε and P(B−|A−) ≥ 1 − ε and P(B+|A−) ≤ ε and P(B−|A+) ≤ ε. To sum up,

(a, b) = (0, 1)  ⇔  P(B+|A+) ≥ 1 − ε and P(B−|A−) ≥ 1 − ε, P(B+|A−) ≤ ε and P(B−|A+) ≤ ε    (9)

Similarly,

(a, b) = (−1, 0)  ⇔  P(A+|B−) ≥ 1 − ε and P(A−|B+) ≥ 1 − ε, P(A+|B+) ≤ ε and P(A−|B−) ≤ ε    (10)

(a, b) = (−1, 1)  ⇔  P(B+|A+) = P(B−|A+) ≥ 1/2 − ε and P(B+|A−) = P(B−|A−) ≥ 1/2 − ε    (11)

Equation (11) expresses the fact that, where the valuation (−1, 1) is concerned, an increase (decrease) in the cause variable is likely to produce either an increase or a decrease in the effect variable, and these two outputs are equally probable. So, P(B+|A+) + P(B−|A+) ≥ 1 − ε (high probability) and P(B+|A−) + P(B−|A−) ≥ 1 − ε. Regarding the null element (0, 0), a plausible probabilistic interpretation of the null effect, which materializes the fact that an increase or a decrease in the cause variable has no effect on the effect variable, consists of arguing that the event in the effect variable is (statistically) independent of that in the cause variable; namely,

(a, b) = (0, 0)  ⇔  P(B^i|A^j) = P(B^i)    (12)

where i, j stand for either þ or 2 . Equation (12) is nothing else than the standard probabilistic independence; that is, B i is stochastically independent of the event A j. Clearly, this assertion is the most plausible probabilistic feature that models the null effect of effect variable on cause variable. Indeed, intuitively, if there is no effect between the two variables then they are (deterministically) independent, which, in turn, also induces a statistical independence. So, a relevant problem is to look whether one can infer from rules “if A i then B j” and “if B j then C k”, the rule “if A i then C k”, where i; j; k ¼ þ; 2 : This is referred to as transitivity rule. As shown in Eells (1991); Pearl (1988); Adam (1975) and Wellman (1988), the response to this question is negative in general. It is therefore appealing to find out appealing window where the transitivity rule applies. Recall that if events X and Y are stochastically independent then it also holds PðX; Y Þ ¼ PðX > Y Þ ¼ PðXÞ · PðY Þ: On the other hand, if PðX; Y Þ $ PðXÞ · PðY Þ holds X and Y are said to be positive dependent, otherwise X and Y are said to be negative dependent.
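The independence and positive/negative dependence notions just recalled can be checked mechanically from a joint table; the sketch below uses an invented joint distribution over two binary events.

```python
# Sketch: classify two events X and Y as positive dependent, negative dependent
# or independent from their joint probabilities (illustrative numbers).
p_xy = {(True, True): 0.40, (True, False): 0.10,
        (False, True): 0.20, (False, False): 0.30}

p_x = sum(v for (x, _), v in p_xy.items() if x)
p_y = sum(v for (_, y), v in p_xy.items() if y)
joint = p_xy[(True, True)]

if abs(joint - p_x * p_y) < 1e-12:
    verdict = "independent"
elif joint > p_x * p_y:
    verdict = "positive dependent"        # P(X and Y) >= P(X)·P(Y)
else:
    verdict = "negative dependent"
print(p_x, p_y, joint, verdict)           # 0.5 0.6 0.4 positive dependent
```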


Strictly speaking, the idea behind positive dependence of events X and Y is the property that large (respectively small) values of X (or functions of X) go together with large (respectively small) values of Y (or functions of Y), as compared to stochastic independence with the same corresponding marginal distributions.
Proposition 4. Given that A is a cause variable of an effect variable B, which is itself a cause variable of the effect variable C, and assuming that the events pertaining to A and C are positive dependent given B's event, as well as those pertaining to A and B given C's event, then, using the infinitesimal representation and provided the conditional probabilities do not vanish, the transitivity rule holds.
Proof. In the infinitesimal representation, the rule

“if A^i then B^j”  ⇔  P(B^j|A^i) ≥ 1 − ε and P(A^i|B^j) ≤ ε    (13)

Similarly,

“if B^j then C^k”  ⇔  P(C^k|B^j) ≥ 1 − ε and P(B^j|C^k) ≤ ε    (14)

Using the monotonicity of probability measures with respect to the set inclusion operation and Bayes' rule, it holds that

P(C^k|B^j, A^i) = P(C^k ∩ B^j ∩ A^i)/P(B^j ∩ A^i) ≤ P(C^k ∩ A^i)/P(B^j ∩ A^i) = P(C^k|A^i)/P(B^j|A^i)    (15)

Using the inequality in (13), (15) is equivalent to P(C^k|B^j, A^i) ≤ P(C^k|A^i)/(1 − ε). Therefore,

P(C^k|A^i) ≥ (1 − ε) P(C^k|B^j, A^i)    (16)

On the other hand, using Bayes' rule, it also holds that

P(C^k|B^j, A^i) = P(C^k ∩ B^j ∩ A^i)/P(B^j ∩ A^i) = P(C^k ∩ A^i|B^j)/P(A^i|B^j)    (17)

Using the positive dependence assumption, we have P(C^k ∩ A^i|B^j) ≥ P(C^k|B^j) · P(A^i|B^j). Substituting the latter into (17) and then into (16), and using the inequalities (13)-(14), it holds that P(C^k|A^i) ≥ (1 − ε)(1 − ε) · P(A^i|B^j)/P(A^i|B^j), i.e.

P(C^k|A^i) ≥ (1 − ε)²    (18)

i.e.

P(C^k|A^i) ≥ 1 − O(ε)    (19)

provided that P(A^i|B^j) is a non-zero element. This proves the first part of the transitivity rule, i.e. P(C^k|A^i) ≥ 1 − ε. To show that P(A^i|C^k) ≤ ε, using Bayes' rule and the monotonicity of probability measures, and since all conditional probabilities do not vanish, we have

P(A^i|C^k) = P(A^i ∩ C^k)/P(C^k) ≤ P(C^k)/P(A^i ∩ B^j ∩ C^k) = 1/P(A^i ∩ B^j|C^k)    (20)

Using the positive dependence of the events A^i and B^j, (20) entails

P(A^i|C^k) ≤ 1/(P(A^i|C^k) · P(B^j|C^k))    (21)

i.e.

[P(A^i|C^k)]² ≤ P(A^i|B^j) P(B^j)/(P(B^j|A^i) · P(B^j|C^k)) = P(A^i|B^j) P(C^k)/(P(B^j|A^i) · P(C^k|B^j)) ≤ P(A^i|B^j)/(P(B^j|A^i) · P(C^k|B^j))

Using (13)-(14), the latter inequality becomes equivalent to

[P(A^i|C^k)]² ≤ ε/(1 − ε)²

Consequently, P(A^i|C^k) ≤ O(ε). This completes the proof of the transitivity rule.
The preceding shows that one can also prove the negative transitivity rule, which holds in the absence of a relationship; namely, given that A is a cause variable of the effect variable B, which is itself a cause variable of the effect variable C, can we infer from the statements “A is not an effect variable of B” and “B is not an effect variable of C” the statement “A is not a cause variable of C”? As pointed out in the course of Proposition 4, the notion of positive dependence plays a central role in making the transitivity statement hold. Notice that if negative transitivity is not considered, then the requirement of positive dependence of the events pertaining to A and C is enough to validate the transitivity statement. The requirement of positive dependence of the events pertaining to A and B is mainly needed to ensure the negative transitivity statement. On the other hand, as can be seen from the proof of the above proposition, negative transitivity is mainly constrained by the existence of a relationship. In other words, without the constraints “A is a cause variable of event B” and “B is a cause of C”, one cannot infer the statement “C is not a cause of A” from the statements “B is not a cause of A” and “C is not a cause of B”. Strictly speaking, this is in complete agreement with intuition. Indeed, from “B is not a cause of A” and “C is not a cause of B”, nothing can be said regarding the relationship between A and C, since nothing prevents the existence of a path from A to C. The issue of non-zero conditional probabilities may sound problematic at first glance; however, one should stress that some of these probabilities are prohibited from being zero at the start. Indeed, the conditions P(B^j|A^i) ≥ 1 − ε and P(C^k|B^j) ≥ 1 − ε for an infinitesimally small ε > 0 make the two above probabilities very high, so they cannot vanish. However, if, for instance, P(A^i|B^j) = 0 holds, this makes the entity P(C^k|B^j, A^i) undefined, and so is the entity P(C^k|A^i). On the other


hand PðA i jB j Þ ¼ 0 contradicts with the fact that PðB j jA i Þ being a definite entity (since the latter expression subsumes PðA i Þ . 0, while the former entails PðA i Þ ¼ 0). Therefore the condition of nonzero conditional probabilities is rather a pre-requisite of coherence of initial statements. Also, notice that an equivalent condition of non-zero conditional probabilities is non-zero probability of single events, i.e. PðA i Þ, PðB j Þ and PðC k Þ are strictly positive. Now the forthcoming result links the result pointed out by the preceding and the properties of the laws of conjunction and disjunction. A Proposition 5. Given the assumptions of Proposition 4 and the cognitive map interpretation of connective ^, the latter fully agrees with the infinitesimal representation of the material implication. Proof. Assume that A is a cause variable of B, which is itself a cause variable of C. (1) To prove that ð0; 1Þ^ð0; 1Þ ¼ ð0; 1Þ; one needs to prove that from rules “if A increases then B increases” and “if B increases then C increases”, one infers “if A increases then C increases”. Also, from “if A decreases then B decreases” and “if B decreases then C decreases”, one infers “if A decreases then C decreases”. More formally to prove the first statement, it suffices to show that PðB þ jA þ Þ $ 1 2 1 and PðA þ jB þ Þ # 1 together with PðC þ jB þ Þ $ 1 2 1 and PðB þ jC þ Þ # 1 entails PðC þ jA þ Þ $ 1 2 1 and PðA þ jC þ Þ # 1: Clearly this is straightforward from the transitivity rule proven in Proposition 4. For the second statement, one needs to show that PðB 2 jA 2 Þ $ 1 2 1 and PðA 2 jB 2 Þ # 1 together with PðC 2 jB 2 Þ $ 1 2 1 and PðB 2 jC 2 Þ # 1 entails PðC 2 jA 2 Þ $ 1 2 1 and PðA 2 jC 2 Þ # 1: Again this is also straightforward from transitivity rule. (2) Similarly, the statement ð0; 1Þ^ð21; 0Þ ¼ ð21; 0Þ results straightforwardly from the transitivity rule. The detail is omitted. (3) To prove the statements ð0; 1Þ^ð0; 0Þ ¼ ð0; 0Þ and ð21; 0Þ^ð0; 0Þ ¼ ð0; 0Þ; notice that, for instance for the first statement that given that “A is a cause variable of B” and “B has no effect on C”, one deduces that B and C are physically independent, so, as there is no direct link from A to C, thereby, A and C are also physically independent, which, in turn, entails statistical independence, i.e. PðCjAÞ ¼ PðCÞ; which is equivalent to the valuation (0,0). Same reasoning applies for the second statement, as well as statements ð0; 0Þ^ð0; 0Þ ¼ ð0; 0Þ and ð21; 1Þ^ð0; 0Þ ¼ ð0; 0Þ: (4) To prove the statement ð0; 1Þ^ð21; 1Þ ¼ ð21; 1Þ: The cognitive interpretation of the first part of the equality is as follows “if A increases then B increases” and “if B increases then either C increases or decreases” and/or “if A decreases then B decreases” and “if B decreases then either C increases or decreases” Using the first interpretation, this translates into PðB þ jA þ Þ $ 1 2 1

and

PðC þ jB þ Þ $ 1=2 2 1; PðC 2 jB þ Þ $ 1=2 2 1

Then using the same reasoning as that carried out in Proposition 4, from PðB þ jA þ Þ $ 1 2 1 and PðC þ jB þ Þ $ 1=2 2 1; the counterpart of equation (18) would be PðC þ jA þ Þ $ ð1 2 1Þð1=2 2 1Þ

i.e. PðC þ jA þ Þ $ 1=2 2 Oð1Þ

ð22Þ


Similarly, from PðB þ jA þ Þ $ 1 2 1 and PðC 2 jB þ Þ $ 1=2 2 1; the counterpart of equation (18) would be


PðC 2 jA þ Þ $ ð1 2 1Þð1=2 2 1Þ; i.e. PðC 2 jA þ Þ $ 1=2 2 Oð1Þ:

ð23Þ

Equations (22) and (23) validate the rule “if A increases then either C increases or decreases”, which, in turn, ensures the valuation (2 1,1). Similar reasoning applies to prove the second statement “if A decreases then B decreases” and “if B decreases then either C increases or decreases”. The proofs of ð21; 0Þ^ð21; 1Þ ¼ ð21; 1Þ and ð21; 1Þ^ð21; 1Þ ¼ ð21; 1Þ can be conducted in similar way. The detail is omitted for its similarity. Clearly the preceding shows that a probabilistic interpretation together with reasonable independence assumption can lead to a comprehensive justification of connective ^ in the sense of expression (2). A Proposition 6. Given that both A1 and A2 are cause variables of the same effect variable B, then in the light of the cognitive map interpretation of connective % and given the statistical independence of events pertaining to A1 and A2 and the positive dependence of events pertaining to A1 and A2 given that of B, then (1) fully agrees with the infinitesimal representation. Proof. (1) To prove ð0; 1Þ%ð0; 1Þ ¼ ð0; 1Þ; consider the rules “if A1 increases then B increases” and “if A2 increases then B increases”. The infinitesimal representation of these rules consists of þ þ PðB þ jAþ 1 Þ $ 1 2 1 and PðB jA2 Þ $ 1 2 1

ð24Þ

Using Bayes' rule and the independence assumptions stated in the core of the proposition, we have

P(B+|A1+, A2+) = P(B+ ∩ A1+ ∩ A2+)/P(A1+ ∩ A2+) = P(A1+ ∩ A2+|B+) · P(B+)/(P(A1+) P(A2+))

P(B+|A1+, A2+) ≥ P(A1+|B+) · P(A2+|B+) · P(B+)/(P(A1+) P(A2+))
             = P(B+|A1+) · P(B+|A2+) · P(A1+) P(A2+) · P(B+)/(P(A1+) P(A2+) · P(B+) · P(B+))

i.e.

P(B+|A1+, A2+) ≥ P(B+|A1+) · P(B+|A2+)/P(B+) ≥ P(B+|A1+) · P(B+|A2+)    (25)

Using (24),


P(B+|A1+, A2+) ≥ (1 − ε)² = 1 − O(ε)

This proves the rule “if both A1 and A2 increase then B increases” (2) Similarly, to prove that ð0; 1Þ%ð0; 0Þ ¼ ð0; 1Þ; one considers rules “if A1 increases then B increases” and “if A2 increases then B remains unchanged”, which translate into þ þ þ PðB þ jAþ 1 Þ $ 1 2 1 and PðB jA2 Þ ¼ PðB Þ

Using similar development as in (i), we have þ PðB þ jAþ 1 ; A2 Þ $

þ þ PðB þ jAþ 1 Þ · PðB jA2 Þ ¼ PðB þ jAþ 1Þ þ PðB Þ

Therefore, þ PðB þ jAþ 1 ; A2 Þ $ 1 2 1

This validates the rule “if both A1 and A2 increases then B increases”, or equivalently, the valuation (0,1). The same reasoning can be applied to prove the statements ð21; 0Þ%ð0; 0Þ ¼ ð21; 0Þ andð0; 0Þ%ð0; 0Þ ¼ ð0; 0Þ: (3) To show the statement ð0; 1Þ%ð21; 0Þ ¼ ð21; 1Þ; one considers the rules “if A1 increases, then B increases” and “if A2 increases then B decreases”, which translate into 2 þ PðB þ jAþ 1 Þ $ 1 2 1 and PðB jA2 Þ $ 1 2 1

Equivalently, the rules “if A1 decreases, then B decreases” and “if A2 decreases then B increases” also hold, so þ 2 PðB 2 jA2 1 Þ $ 1 2 1 and PðB jA2 Þ $ 1 2 1

Using the first two inequalities, we have, using the same reasoning as in (i), þ þ þ þ þ PðB þ jAþ 1 ; A2 Þ $ PðB jA1 Þ · PðB jA2 Þ þ þ þ 2 Clearly, given that PðB þ jAþ 1 Þ [ ½1 2 1 1 and PðB jA1 Þ ¼ 12 PðB jA1 Þ [ þ þ þ ½0 1; therefore using (25), it holds that PðB jA1 ; A2 Þ [ ½0 1; i.e. there is no þ lower bound to quantity PðB þ jAþ 1 ; A2 Þ; except the trivial 0. Similarly it also holds that there is no specific lower bound to quantity þ þ þ 2 PðB 2 jAþ 1 ; A2 Þ except the trivial one 0, i.e. PðB jA1 ; A2 Þ [ ½0 1: Consequently, using the principle of insufficient reason, given that the events þ Bþ and B2 are complementary, the probabilities PðB 2 jAþ 1 ; A2 Þ and þ þ þ PðB jA1 ; A2 Þ should be equal, or in the language of infinitesimal

representation, very close to each other. This can be rewritten as þ þ þ þ PðB 2 jAþ 1 ; A2 Þ $ 1=22Oð1Þ and PðB jA1 ; A2 Þ $1=2 2 Oð1Þ: This means that when both A1 and A2 increase, then either B increases or decreases. Similar reasoning can be used to show that the rules “if A1 decreases, then B decreases” and “if A2 decreases then B increases”, would lead to 2 þ 2 2 PðB 2 jA2 1 ; A2 Þ $ 1=2 2 Oð1Þ and PðB jA1 ; A2 Þ $1=2 2 Oð1Þ: (4) To prove ð0; 1Þ%ð21; 1Þ ¼ ð21; 1Þ and ð21; 0Þ%ð21; 1Þ ¼ ð21; 1Þ: For the first statement, considers for instance the rules “if A1 increases, then B increases” and “if A2 increases then B either increases or decreases”, which translate into 1 þ þ PðB þ jAþ 1 Þ $ 1 2 1 and PðB jA2 Þ $ 2 1 2 Therefore, (25) can be rewritten as þ þ PðB þ jAþ þ þ 1 Þ · PðB jA2 Þ $ PðB þ jAþ 1 Þ · PðB jA2 Þ PðB þ Þ   1 $ ð1 2 1Þ 2 1 2

þ PðB þ jAþ 1 ; A2 Þ $

i.e. 1 þ PðB þ jAþ 1 ; A2 Þ $ 2 Oð1Þ 2 By complementarity, it also holds 1 þ PðB 2 jAþ 1 ; A2 Þ $ 2 Oð1Þ 2 Similarly, using equivalent representation of previous rules, i.e. “if A1 decreases, then B decreases” and “if A2 decreases then B either increases or decreases”, it holds 2 2 PðB 2 jA2 1 Þ $ 1 2 1 and PðB jA2 Þ $ 1=2 2 1;

which in the light of (25) yield 2 2 2 2 2 PðB 2 jA2 1 ; A2 Þ $ PðB jA1 Þ · PðB jA2 Þ $ ð1 2 1Þ

  1 1 2 1 ¼ 2 Oð1Þ 2 2

By analogy, we also have 1 2 PðB þ jA2 1 ; A2 Þ $ 2 Oð1Þ 2 Similar reasoning can be applied to prove the second statement ð21; 0Þ%ð21; 0Þ ¼ ð21; 1Þ:


(5) To show the statement ð21; 1Þ%ð21; 1Þ ¼ ð21; 1Þ; staring from rules “if A1 decreases, then either B decreases or increases” and “if A2 decreases then B either increases or decreases”, it holds 1 1 2 þ þ þ PðB þ jAþ 1 Þ $ 2 1 and PðB jA1 Þ $ 2 1; PðB jA2 Þ 2 2


1 1 $ 2 1 and PðB 2 jA2 2 Þ $ 2 1; 2 2 Again applying, (25) would lead to 1 1 þ 2 þ þ PðB þ jAþ 1 ; A2 Þ $ 2 Oð1Þ and PðB jA1 ; A2 Þ $ 2 Oð1Þ: 4 4 þ þ þ 2 Reconciling the above expression with PðB þ jAþ 1 ; A2 Þ ¼ 12PðB jA1 ; A2 Þ yield þ 2 þ þ PðB þ jAþ 1 ; A2 Þ $ 1=2 2 Oð1Þ and PðB jA1 ; A2 Þ $ 1=2 2 Oð1Þ 2 2 2 2 Similar reasoning entails PðB þ jA2 1 ; A2 Þ $1=2 2 Oð1Þ and PðB jA1 ; A2 Þ $ 1=2 2 Oð1Þ:

The proof of the statement (0,0)%(-1,1) = (-1,1) is omitted for its simplicity and similarity to the preceding.

Summing up the results pointed out in Propositions 5 and 6, one can notice the following:
• The infinitesimal representation of the material implication allows us to provide a full probabilistic description of the representations (1) and (2) underlying the connectives % and ^.
• The probabilistic representation relies on the cognitive representation exhibited in Figure 1. In the latter, each valuation X (given as a pair (x1, x2)) conveys some qualitative relationship between an effect variable and a cause variable, where (0,1), for instance, means that an increase in the cause variable induces an increase in the effect variable as well. The connective X^Y corresponds to a composition operation where the effect variable in X is itself a cause variable of Y, so the result of this operation corresponds to the result of such a composition relation. In the case of connective %, by contrast, the outcome relies on the effect of accumulating several cause variables on the same effect variable.
• Besides the preceding, the essence of the probabilistic reasoning consists in giving a probabilistic interpretation to the rule base underlying the cognitive interpretation, where, using the infinitesimal representation, a rule like "if A then B" is interpreted as $P(B\mid A)\ge 1-\varepsilon$, with $\varepsilon>0$ and close to zero.
• Using this infinitesimal representation, expressions (1) and (2) have been found to be in full agreement with the underlying probabilistic interpretations up to some dependency assumptions. For the connective ^, these consist of positive dependence of events pertaining to the cause and effect variables of the first entity, and between the cause variable of the first entity and the effect variable of the second entity. In the case of the connective %, the dependence assumption relies on the positive dependence of events pertaining to the causal variables given the effect variable, and on the statistical independence of the unconditional events pertaining to the causal variables.
• In the case of connective %, an additional, perhaps unforeseen, assumption is the principle of insufficient reason, which stipulates that, in the absence of further evidence, probabilities tend towards the uniform probability. Clearly, this is in full agreement with the cognitive viewpoint, in the sense that in the absence of evidence from the cause variables, the effect variable can either increase or decrease equally likely.
• The results of Propositions 4 to 6 are deemed valid only if the probabilities of single events are non-zero. Indeed, from the statement $P(B\mid A)\ge 1-\varepsilon$, it is clear that the condition $P(B)>0$ is a pre-requisite, in the sense that if $P(B)=0$ and $P(A)>0$, then, since $B\cap A\subseteq B$, the monotonicity of the probability measure gives $P(B\cap A)\le P(B)=0$ and thereby $P(B\mid A)=0$, which contradicts the hypothesis that $P(B\mid A)$ is high. The condition $P(A)>0$ is a pre-requisite for Bayes' rule. However, if the pseudo Bayes' formula advocated by several researchers, see for instance Suppes (1970), which reads
$$P(Y\mid X)=\begin{cases} P(Y\cap X)/P(X) & \text{if } P(X)>0\\ 1 & \text{if } P(X)=0 \end{cases}$$
has been employed, then Proposition 4, and thereby Proposition 5, would require only $P(B^j)$ and $P(C^k)$ to be non-zero. The case $P(A^i)=0$ would yield $P(C^k\mid A^i)=1$, which is obviously valid and in full agreement with the infinitesimal representation. Similarly, in the case of Proposition 6, if either $P(A_1^i)$ or $P(A_2^j)$ equals zero, then $P(B^k\mid A_1^i,A_2^j)=1$, which is also in full agreement with the infinitesimal representation.
• The results of Propositions 5 and 6 show that the algebraic properties of connectives % and ^ are preserved. That is, both connectives are commutative and have (0,0) and (0,1) as neutral elements, respectively. Clearly, some of these properties are not trivial from the cognitive interpretation. The graphical representation pertaining to connective ^ in Figure 1, for instance, at first glance suggests that the connective is not symmetric, as the composition operation is not; but when coming down to manipulating the different entities over the set of all possible events, this suggestion turns out to be wrong.
• The results of Propositions 5 and 6 have been developed for the case of only two operands. However, they can easily be extended to the case of several operands. The essence is to show that the associativity property holds. For this purpose, consider three cause/effect variables A1, A2, A3 and one effect variable B. The connectives % and ^ would have the cognitive representations shown in Figure 3(a) and (b), respectively.

Clearly, using the modularity of the graphical representation, combining the first two components in the case of connective ^ corresponds to mapping the three initial variables A1, A2, A3 into two variables, say A′1 and A′2, corresponding to a cognitive


representation of the previous result (two operands). This recovers another two-entity combination with variables A′1, A′2 and B. Similarly, in the case of connective %, combining the first two entities corresponds to turning the graph in Figure 2(b) into a graph with A′1 and A3 as cause variables and B as effect variable. Similar results would be obtained if the second and third entities were combined first. On the other hand, from the requirement perspective regarding the events attached to A1, A2, A3 and B, one notices that the independence/dependence assumptions mentioned in Propositions 4-5 should be fulfilled for each pair of entities, where the pair might be either a single event like Ai or a conjunction of events like Ai ∩ Aj. For instance, in the case of connective %, this comes down to assuming that:
• events pertaining to A1 and A2 are positively dependent given that of B;
• events pertaining to A1 and A3 are positively dependent given that of B;
• events pertaining to A3 and A2 are positively dependent given that of B;
• events pertaining to A1 ∩ A3 and A2 are positively dependent given that of B;
• events pertaining to A1 ∩ A2 and A3 are positively dependent given that of B;
• events pertaining to A2 ∩ A3 and A1 are positively dependent given that of B.
The remaining pairs, like A1 ∩ A2 and A3 ∩ A2 for instance, are redundant. On the other hand, the unconditional events pertaining to each pair among A1, A2, A3, as well as any combinations of pairs (in the sense of conjunctive combination) as previously, are statistically independent. The preceding ensures the associativity of connective % in the probabilistic setting, which allows some modularity in the combination of several entities. In the case of connective ^, the essence is to keep the assumptions pointed out in Proposition 4 holding for any three consecutive events in the sense of the representation of Figure 2(a). For instance, one needs to have in this case:
• events pertaining to A1 and A3 are positively dependent given that of A2;
• events pertaining to A1 and A2 are positively dependent given that of A3;
• events pertaining to A1 and A3 are positively dependent given that of B;
• events pertaining to A1 and B are positively dependent given that of A3;
• events pertaining to A2 and A3 are positively dependent given that of B;
• events pertaining to A2 and B are positively dependent given that of A3;
• events pertaining to A1 and A2 are positively dependent given that of B;
• events pertaining to A1 and B are positively dependent given that of A2.

Figure 3.

The first four allow us to use the transitivity rule to obtain the result of the first two entities, and then to use the transitivity rule again to combine the outcome with the third entity, while the last four requirements allow us to combine the second and third entities first and then combine the outcome with the first entity; this thereby ensures the associativity of connective ^ in the probabilistic setting. The preceding shows that the associativity property needs some further independence/dependence assumptions, as pointed out earlier, to hold for both connectives.

4.2 Use of qualitative representation
Similarly to Section 4.1, in the light of representation (8), a rationale for the four-pair valuation is the following:
$$(a,b)=(0,1)\ \Longleftrightarrow\ \begin{cases} P(B^+\mid A^+)>P(B^+)\ \&\ P(B^-\mid A^-)>P(B^-)\\ P(B^+\mid A^-)\le P(B^+)\ \&\ P(B^-\mid A^+)\le P(B^-)\end{cases}\qquad(26)$$
Similarly,
$$(a,b)=(-1,0)\ \Longleftrightarrow\ \begin{cases} P(B^+\mid A^-)>P(B^+)\ \&\ P(B^-\mid A^+)>P(B^-)\\ P(B^+\mid A^+)\le P(B^+)\ \&\ P(B^-\mid A^-)\le P(B^-)\end{cases}\qquad(27)$$
$$(a,b)=(-1,1)\ \Longleftrightarrow\ P(B^+\mid A^+)>P(B^+)\ \&\ P(B^-\mid A^+)>P(B^-)\ \&\ P(B^+\mid A^-)>P(B^+)\ \&\ P(B^-\mid A^-)>P(B^-)\qquad(28)$$
while the null evaluation (0,0) corresponds to statistical independence, as exhibited by expression (12). Similarly to the infinitesimal representation, transitivity is crucial for our reasoning. For this purpose, we shall say that event Y screens off event X with respect to event Z if condition (29) below holds:
$$P(Z\mid X\ \&\ Y)=P(Z\mid Y)\quad\text{and}\quad P(Z\mid X\ \&\ \neg Y)=P(Z\mid \neg Y)\qquad(29)$$
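Before turning to Proposition 7, the following small Python sketch (not from the paper; the joint distribution is an arbitrary illustrative choice) shows what the screening-off condition (29) and the resulting transitivity of positive qualitative influence look like numerically on a three-variable chain.

```python
# Minimal numerical sketch (illustrative only): a joint distribution for the
# chain A -> B -> C, in which B screens off A with respect to C by
# construction.  The probability values are assumed for the example.
from itertools import product

P_A = 0.4                                   # P(A+)
P_B_given_A = {True: 0.8, False: 0.3}       # P(B+ | A)
P_C_given_B = {True: 0.7, False: 0.2}       # P(C+ | B): C depends on A only through B

joint = {}
for a, b, c in product([True, False], repeat=3):
    pa = P_A if a else 1 - P_A
    pb = P_B_given_A[a] if b else 1 - P_B_given_A[a]
    pc = P_C_given_B[b] if c else 1 - P_C_given_B[b]
    joint[(a, b, c)] = pa * pb * pc

def prob(pred):
    return sum(p for outcome, p in joint.items() if pred(*outcome))

def cond(target, given):
    return prob(lambda a, b, c: target(a, b, c) and given(a, b, c)) / prob(given)

# Screening off (29): P(C+|A+ & B+) = P(C+|B+) and P(C+|A+ & not B+) = P(C+|not B+).
assert abs(cond(lambda a, b, c: c, lambda a, b, c: a and b) -
           cond(lambda a, b, c: c, lambda a, b, c: b)) < 1e-9
assert abs(cond(lambda a, b, c: c, lambda a, b, c: a and not b) -
           cond(lambda a, b, c: c, lambda a, b, c: not b)) < 1e-9

# Transitivity of positive qualitative influence (Proposition 7):
# P(B+|A+) > P(B+) and P(C+|B+) > P(C+) imply P(C+|A+) > P(C+).
assert cond(lambda a, b, c: b, lambda a, b, c: a) > prob(lambda a, b, c: b)
assert cond(lambda a, b, c: c, lambda a, b, c: b) > prob(lambda a, b, c: c)
assert cond(lambda a, b, c: c, lambda a, b, c: a) > prob(lambda a, b, c: c)
```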

From this perspective, provided that the probabilities of single events do not vanish, the following holds.

Proposition 7. Given that A is a cause variable of an effect variable B, which is itself a cause variable of the effect variable C, and assuming that the events pertaining to B screen off those of A with respect to those of C, the transitivity rule holds.

Proof. Using the qualitative representation, the following equivalences hold:
$$\text{"if } A^i \text{ then } B^j\text{"}\ \Longleftrightarrow\ P(B^j\mid A^i)>P(B^j)\qquad(30)$$
Similarly,
$$\text{"if } B^j \text{ then } C^k\text{"}\ \Longleftrightarrow\ P(C^k\mid B^j)>P(C^k)\qquad(31)$$
To prove transitivity, it suffices to show that $P(C^k\mid A^i)>P(C^k)$.


Using the total probability theorem, it holds that
$$P(C^k\mid A^i)=P(C^k\mid B^j,A^i)\,P(B^j\mid A^i)+P(C^k\mid \neg B^j,A^i)\,P(\neg B^j\mid A^i),$$
which, using the hypothesis that $B^j$ screens off $A^i$ with respect to $C^k$, i.e. $P(C^k\mid B^j,A^i)=P(C^k\mid B^j)$ and $P(C^k\mid \neg B^j,A^i)=P(C^k\mid \neg B^j)$, becomes
$$P(C^k\mid A^i)=P(C^k\mid B^j)\,P(B^j\mid A^i)+P(C^k\mid \neg B^j)\,P(\neg B^j\mid A^i).$$
Similarly,
$$P(C^k)=P(B^j)\,P(C^k\mid B^j)+P(C^k\mid \neg B^j)\,P(\neg B^j).$$
So,
$$\begin{aligned}
P(C^k\mid A^i)-P(C^k) &= P(C^k\mid B^j)\,P(B^j\mid A^i)+P(C^k\mid \neg B^j)\,P(\neg B^j\mid A^i)-P(B^j)\,P(C^k\mid B^j)-P(C^k\mid \neg B^j)\,P(\neg B^j)\\
&= P(C^k\mid B^j)\,[P(B^j\mid A^i)-P(B^j)]+P(C^k\mid \neg B^j)\,[P(\neg B^j\mid A^i)-P(\neg B^j)]\\
&= P(C^k\mid B^j)\,[P(B^j\mid A^i)-P(B^j)]+P(C^k\mid \neg B^j)\,[P(B^j)-P(B^j\mid A^i)]\\
&= [P(B^j\mid A^i)-P(B^j)]\,[P(C^k\mid B^j)-P(C^k\mid \neg B^j)]\\
&= [P(B^j\mid A^i)-P(B^j)]\left[P(C^k\mid B^j)-\frac{P(\neg B^j\mid C^k)\,P(C^k)}{1-P(B^j)}\right],
\end{aligned}$$
which, after some manipulations, becomes
$$P(C^k\mid A^i)-P(C^k)=\frac{[P(B^j\mid A^i)-P(B^j)]\,[P(C^k\mid B^j)-P(C^k)]}{1-P(B^j)}.\qquad(32)$$

The latter, under conditions (30) and (31), leads to a positive value, which completes the proof.

The relationship between the connective ^ and the qualitative representation is given by the following.

Proposition 8. Given the assumptions of Proposition 7 and the cognitive map interpretation of connective ^, the latter fully agrees with the qualitative probabilistic representation of the material implication.

Proof. Assume that A is a cause variable of B, which is itself a cause variable of C.

(1) To prove that (0,1)^(0,1) = (0,1): in view of definition (26) and the cause-effect variables, this comes down to showing that
$$P(B^+\mid A^+)>P(B^+)\ \&\ P(B^-\mid A^-)>P(B^-)\ \&\ P(B^+\mid A^-)\le P(B^+)\ \&\ P(B^-\mid A^+)\le P(B^-),$$
together with
$$P(C^+\mid B^+)>P(C^+)\ \&\ P(C^-\mid B^-)>P(C^-)\ \&\ P(C^+\mid B^-)\le P(C^+)\ \&\ P(C^-\mid B^+)\le P(C^-),$$
entail
$$P(C^+\mid A^+)>P(C^+)\ \&\ P(C^-\mid A^-)>P(C^-)\ \&\ P(C^+\mid A^-)\le P(C^+)\ \&\ P(C^-\mid A^+)\le P(C^-).$$
Clearly, using the transitivity rule pointed out in Proposition 7, it is straightforward that, under the conditions of Proposition 7, i.e. $P(C^k\mid B^j,A^i)=P(C^k\mid B^j)$ and $P(C^k\mid \neg B^j,A^i)=P(C^k\mid \neg B^j)$ (where k, j, i stand for either + or −), from $P(B^+\mid A^+)>P(B^+)$ and $P(C^+\mid B^+)>P(C^+)$ it follows that $P(C^+\mid A^+)>P(C^+)$. Similarly, $P(B^-\mid A^-)>P(B^-)$ and $P(C^-\mid B^-)>P(C^-)$ yield $P(C^-\mid A^-)>P(C^-)$. On the other hand, using (32), it holds that
$$P(C^-\mid A^+)-P(C^-)=\frac{[P(B^+\mid A^+)-P(B^+)]\,[P(C^-\mid B^+)-P(C^-)]}{1-P(B^+)}.$$
Consequently, from $P(B^+\mid A^+)>P(B^+)$ and $P(C^-\mid B^+)\le P(C^-)$, it follows that $P(C^-\mid A^+)\le P(C^-)$. Similarly, from $P(B^-\mid A^-)>P(B^-)$ and $P(C^+\mid B^-)\le P(C^+)$, it follows that $P(C^+\mid A^-)\le P(C^+)$.

(2) To prove (-1,0)^(-1,0) = (0,1), one needs to show that
$$P(B^+\mid A^-)>P(B^+)\ \&\ P(B^-\mid A^+)>P(B^-)\ \&\ P(B^+\mid A^+)\le P(B^+)\ \&\ P(B^-\mid A^-)\le P(B^-)$$
and
$$P(C^+\mid B^-)>P(C^+)\ \&\ P(C^-\mid B^+)>P(C^-)\ \&\ P(C^+\mid B^+)\le P(C^+)\ \&\ P(C^-\mid B^-)\le P(C^-)$$
entail
$$P(C^+\mid A^+)>P(C^+)\ \&\ P(C^-\mid A^-)>P(C^-)\ \&\ P(C^+\mid A^-)\le P(C^+)\ \&\ P(C^-\mid A^+)\le P(C^-).$$
Clearly, using (32), $P(B^+\mid A^-)>P(B^+)$ and $P(C^-\mid B^+)>P(C^-)$ entail $P(C^-\mid A^-)>P(C^-)$. Similarly, $P(B^-\mid A^+)>P(B^-)$ and $P(C^+\mid B^-)>P(C^+)$ entail $P(C^+\mid A^+)>P(C^+)$. On the other hand, $P(B^+\mid A^+)\le P(B^+)$ and $P(C^-\mid B^+)>P(C^-)$ entail $P(C^-\mid A^+)\le P(C^-)$, and $P(B^-\mid A^-)\le P(B^-)$ together with $P(C^+\mid B^-)>P(C^+)$ yield $P(C^+\mid A^-)\le P(C^+)$.


(3) To prove (-1,0)^(0,1) = (-1,0), one needs to show that
$$P(B^+\mid A^-)>P(B^+)\ \&\ P(B^-\mid A^+)>P(B^-)\ \&\ P(B^+\mid A^+)\le P(B^+)\ \&\ P(B^-\mid A^-)\le P(B^-)$$
and
$$P(C^+\mid B^+)>P(C^+)\ \&\ P(C^-\mid B^-)>P(C^-)\ \&\ P(C^+\mid B^-)\le P(C^+)\ \&\ P(C^-\mid B^+)\le P(C^-)$$
yield
$$P(C^+\mid A^-)>P(C^+)\ \&\ P(C^-\mid A^+)>P(C^-)\ \&\ P(C^+\mid A^+)\le P(C^+)\ \&\ P(C^-\mid A^-)\le P(C^-).$$
Again, the use of relation (32) shows that $P(B^+\mid A^-)>P(B^+)$ and $P(C^+\mid B^+)>P(C^+)$ entail $P(C^+\mid A^-)>P(C^+)$, while $P(B^-\mid A^+)>P(B^-)$ and $P(C^-\mid B^-)>P(C^-)$ entail $P(C^-\mid A^+)>P(C^-)$. On the other hand, from $P(C^-\mid B^-)>P(C^-)$ and $P(B^-\mid A^-)\le P(B^-)$ one gets $P(C^-\mid A^-)\le P(C^-)$, and from $P(C^+\mid B^+)>P(C^+)$ and $P(B^+\mid A^+)\le P(B^+)$ one gets $P(C^+\mid A^+)\le P(C^+)$.

(4) To prove (-1,0)^(0,0) = (0,0), (0,1)^(0,0) = (0,0) and (-1,1)^(0,0) = (0,0), it suffices to notice that, as far as expression (32) is concerned, the evaluation (0,0) entails a relation of the form $P(C^k\mid B^j)-P(C^k)=0$ (where k, j stand for either + or −), so it always occurs that $P(C^k\mid A^i)-P(C^k)=0$ (where k, i take either + or −).

(5) To prove (-1,1)^(0,1) = (-1,1), one needs to show that
$$P(B^+\mid A^+)>P(B^+)\ \&\ P(B^-\mid A^+)>P(B^-)\ \&\ P(B^+\mid A^-)>P(B^+)\ \&\ P(B^-\mid A^-)>P(B^-)$$
and
$$P(C^+\mid B^+)>P(C^+)\ \&\ P(C^-\mid B^-)>P(C^-)\ \&\ P(C^+\mid B^-)\le P(C^+)\ \&\ P(C^-\mid B^+)\le P(C^-)$$
entail
$$P(C^+\mid A^+)>P(C^+)\ \&\ P(C^-\mid A^+)>P(C^-)\ \&\ P(C^+\mid A^-)>P(C^+)\ \&\ P(C^-\mid A^-)>P(C^-).$$
Using (32), we have:
• $P(B^+\mid A^+)>P(B^+)$ and $P(C^+\mid B^+)>P(C^+)$ yield $P(C^+\mid A^+)>P(C^+)$;
• $P(B^-\mid A^-)>P(B^-)$ and $P(C^-\mid B^-)>P(C^-)$ yield $P(C^-\mid A^-)>P(C^-)$;
• $P(B^+\mid A^-)>P(B^+)$ and $P(C^+\mid B^+)>P(C^+)$ yield $P(C^+\mid A^-)>P(C^+)$;
• $P(B^-\mid A^+)>P(B^-)$ and $P(C^-\mid B^-)>P(C^-)$ yield $P(C^-\mid A^+)>P(C^-)$.
Notice that other combinations of the initial two sets of inequalities can also provide
$$P(C^+\mid A^+)>P(C^+)\ \&\ P(C^-\mid A^-)>P(C^-)\ \&\ P(C^+\mid A^-)\le P(C^+)\ \&\ P(C^-\mid A^+)\le P(C^-).$$
Another combination can also lead to
$$P(C^+\mid A^-)>P(C^+)\ \&\ P(C^-\mid A^+)>P(C^-)\ \&\ P(C^+\mid A^+)\le P(C^+)\ \&\ P(C^-\mid A^-)\le P(C^-).$$
This reinforces the conjecture that, in the case of the (-1,1) evaluation, both the (0,1) and (-1,0) evaluations hold at the same time. The same reasoning applies to prove (-1,1)^(-1,0) = (-1,1), as well as (-1,1)^(-1,1) = (-1,1). The detail is omitted.

From the results pointed out in Propositions 7 and 8, one notices the following:
• The condition pointed out in Proposition 7, i.e. that events pertaining to B screen off those of A with respect to those of C, is only a sufficient condition, in the sense that one may find several other cases in which transitivity holds without B screening off A with respect to C. Indeed, another case which entails transitivity is where the conditional probabilities are greater than the unconditional ones, in the sense that $P(C^k\mid B^j,A^i)>P(C^k\mid B^j)$, $P(B^j\mid A^i)>P(B^j)$, $P(C^k\mid \neg B^j,A^i)>P(C^k\mid \neg B^j)$ and $P(\neg B^j\mid A^i)>P(\neg B^j)$. Consequently, the results pointed out in Proposition 8 may also hold even if the condition of Proposition 7 is not valid.
• It should be noted that the proofs carried out in Propositions 7 and 8 are deemed valid only if the single probabilities of the events pertaining to A1, A2 and B do not vanish. Indeed, from the expression $P(B^i\mid A_1^j)>P(B^i)$, for instance, where i, j stand for either + or −, the condition $P(B^i)>0$ is a pre-requisite for the validity of the above expression. In other words, the condition $P(B^i)>0$ is necessary for the meaningfulness of the qualitative representation itself, while the condition $P(A_1^i)>0$ is a pre-requisite for the validity of Bayes' rule. In the case of $P(A_1^i)=0$, an alternative scenario is to use the pseudo Bayes' formula
$$P(Y\mid X)=\begin{cases} P(Y\cap X)/P(X) & \text{if } P(X)>0\\ 1 & \text{if } P(X)=0 \end{cases}\qquad(33)$$
In this course, it is easy to see that, in the case of Proposition 7, if the probability pertaining to A vanishes ($P(B^j)>0$ being a pre-requisite for the qualitative representation of "if B^j then C^k"), then a necessary condition for the transitivity rule is to keep $P(C^k)<1$. Similarly, in the case of Proposition 8, it is easy to see that if either $P(A_1^i)$ or $P(A_2^k)$ vanishes, then $P(B^j)<1$ becomes a necessary condition.
• Using Bayes' theorem, it is easy to see that if $P(C\mid B,A)=P(C\mid B)$, then it also holds that $P(A\mid B,C)=P(A\mid B)$, and similarly $P(C\mid \neg B,A)=P(C\mid \neg B)$ entails $P(A\mid \neg B,C)=P(A\mid \neg B)$. In other words, if B screens off A with respect to C, then B also screens off C with respect to A; that is, the screening-off relation is somehow symmetric with respect to the extreme variables A and C.
• The meaning of the condition pointed out in Proposition 7 is that the correlation between A and C disappears when B is held fixed, either positively or negatively. In other words, B is the only factor involved that has a causal influence on A and C. On the other hand, this condition also involves the Markov property, as pointed out by Eells (1991), in the sense that the state of the system depends only on what happens at the immediately preceding state and not on what happens before that, so that, as B is the previous state of C, $P(C\mid B,A)$ depends only on the event pertaining to B, i.e. $P(C\mid B)$. Furthermore, as the screening-off relation is symmetric, as pointed out previously, the Markov property is by abuse also understood for future events, in the sense that the current state depends only on its immediate successor and not on all successors, which explains the equality $P(A\mid B,C)=P(A\mid B)$. This condition has also been explored earlier by Suppes (1970) and Simon (1954), who called a correlation between two factors spurious when, roughly, neither causes the other and the correlation disappears when a third factor is introduced and held fixed. Simon (1954) pointed out that if the correlation between A and C disappears both in the presence and in the absence of a third factor, say B, then the explanation may be either that the correlation results from the joint causal effect of B on A and C, i.e. B is a common cause of A and C, or that B is an intermediate causal factor between A and C. In the light of Simon's view, the approach developed here corresponds rather to the second case, where B is an intermediate level between A and C, which agrees with the representation of Figure 2(a). However, if the first interpretation were considered, then the counterpart of Figure 2(a) would be Figure 4 shown below, in which B acts as a common cause for both A and C, and the result of connective ^ would be exhibited through the correlation between A and C.

Figure 4. Alternative representation of ^ connective

• Reinterpreting the pairs (0,1), (-1,0) and (0,0) as positive, negative and neutral causal significance in the probabilistic sense creates some analogy with Eells's work. From this perspective, it is worth noticing that if the composition connective ^ is considered, then the negative-positive element (-1,1) cannot be represented. However, all the remaining cases (0,0), (0,1) and (-1,0) can be appropriately interpreted in terms of neutral, positive and negative causation. That is, $P(B\mid A)=P(B\mid \neg A)$, $P(B\mid A)>P(B\mid \neg A)$ and $P(B\mid A)<P(B\mid \neg A)$ correspond to (0,0), (0,1) and (-1,0), respectively. Also, the result in terms of the A-C causation, given the A-B and B-C causations, coincides with that provided in Proposition 8.
• Another dilemma when contrasting Eells's causal approach with our approach consists in the fact that the relation of causation is by nature asymmetric, in the sense that if A is a cause of B, then B cannot cause A; that is, if A is a probability-increasing cause of B, then B would be a probability-increasing non-cause of A. However, when one sticks to the rule-based interpretation, nothing prevents the relation from being symmetric, in the sense that A may entail B as well as B may entail A. This observation shows some superiority of the rule-based interpretation when dealing with the negative-positive valuation, where both the positive and the negative valuation hold at the same time.
• Notice that in the interpretation of the material implication as "if A^i then B^j" ⟺ $P(B^j\mid A^i)>P(B^j)$, if the converse implication is ruled out, in the sense that one requires "if A^i then B^j" and not "if B^j then A^i", so that the new interpretation becomes "if A^i then B^j" ⟺ $P(B^j\mid A^i)>P(B^j)$ and $P(A^i\mid B^j)\le P(A^i)$, then the condition mentioned in Proposition 7 will not be sufficient to ensure the transitivity rule. Indeed, using the same reasoning as that carried out in the proof of Proposition 7, and using the symmetry of the screening-off relation pointed out earlier, it is straightforward that $P(B^j\mid A^i)>P(B^j)\ \&\ P(A^i\mid B^j)\le P(A^i)$ and $P(C^k\mid B^j)>P(C^k)\ \&\ P(B^j\mid C^k)\le P(B^j)$ would lead to $P(C^k\mid A^i)>P(C^k)\ \&\ P(A^i\mid C^k)\ge P(A^i)$. However, it is easy to see that if either of the initial implications is symmetric, in the sense that either $P(B^j\mid C^k)\le P(B^j)$ or $P(A^i\mid B^j)\le P(A^i)$ holds, then it yields $P(C^k\mid A^i)>P(C^k)\ \&\ P(A^i\mid C^k)\le P(A^i)$, which ensures the transitivity.
• From the preceding, when compared with the threshold-based representation, it should be noted that the latter implicitly accounts for the asymmetrical relation, in the sense that one explicitly mentions, for instance, that B → A does not hold, which shows some strength when compared to the qualitative representation.
• From the proof of Proposition 8, notice that when the valuation (-1,1) is concerned while the other operand is not a null evaluation, the result is not unique; rather, all of (0,1), (-1,0) and (-1,1) are possible. This also reinforces the conjecture pointed out in the definition of the bipolarity (-1,1), where both polarities hold at the same time.
• From Proposition 8, it also turns out that the properties of connective ^ are still preserved even when using the qualitative probabilistic interpretation; that is, ^ is commutative and has (0,0) as an absorption element.
• If more than two elements are involved, it is also easy to see that the condition of Proposition 7 can easily be extended. Indeed, assume for instance that three elements are involved. This would entail four variables A, B, C and D represented as the chain
$$A \to B \to C \to D.$$
So it is easy to see that an implicit condition for associativity would be that, when reasoning in terms of events pertaining to A, B, C, D:
• B screens off A with respect to C;
• B screens off A with respect to D;
• C screens off B with respect to D;
• C screens off A with respect to D.

Alternatively, using the symmetry pointed out earlier, this is equivalent to:
• B screens off C with respect to A;
• B screens off D with respect to A;
• C screens off D with respect to B;
• C screens off D with respect to A.
In other words, when a chain is concerned, each node between the extreme nodes of the chain screens off each of its predecessors with respect to each of its successors.

4.3 Interpretation of the disjunctive connective
In order to deal with the disjunctive connective % using the qualitative representation, sticking to the cognitive map interpretation ascribed to %, let us first point out the following assertion. Given that A1 and A2 are cause variables of the same effect variable B, then, reasoning in terms of events pertaining to A1, A2 and B, one denotes by H0 the following hypothesis:

H0: If $\min(P(B\mid A_1),P(B\mid A_2))\le P(B)$ and $\max(P(B\mid A_1),P(B\mid A_2))>P(B)$, then $P(B\mid A_1)>[P(B)]^2/P(B\mid A_2)$.

Notice that hypothesis H0 also entails that $\min(P(B\mid A_1),P(B\mid A_2))>[P(B)]^2$.

Proposition 9. Given that both A1 and A2 are cause variables of the same effect variable B, and given the statistical independence of the events pertaining to A1 and A2 and the conditional independence of the events of A1 and A2 given the event ascribed to B, then, provided the hypothesis H0 holds and the probabilities of single events do not vanish, expression (1) fully agrees with the qualitative representation.

Proof.
(1) To prove that (0,1)%(0,1) = (0,1): from (26) and the cognitive map interpretation of %, one needs to prove that
$$\begin{cases}P(B^+\mid A_1^+)>P(B^+)\ \&\ P(B^-\mid A_1^-)>P(B^-)\\ P(B^+\mid A_1^-)\le P(B^+)\ \&\ P(B^-\mid A_1^+)\le P(B^-)\end{cases}
\quad\text{and}\quad
\begin{cases}P(B^+\mid A_2^+)>P(B^+)\ \&\ P(B^-\mid A_2^-)>P(B^-)\\ P(B^+\mid A_2^-)\le P(B^+)\ \&\ P(B^-\mid A_2^+)\le P(B^-)\end{cases}$$
entail
$$\begin{cases}P(B^+\mid A_1^+,A_2^+)>P(B^+)\ \&\ P(B^-\mid A_1^-,A_2^-)>P(B^-)\\ P(B^+\mid A_1^-,A_2^-)\le P(B^+)\ \&\ P(B^-\mid A_1^+,A_2^+)\le P(B^-)\end{cases}\qquad(34)$$
Clearly, using Bayes' theorem and the statistical independence of the events pertaining to A1 and A2, as well as their conditional independence given B, we have
$$P(B^+\mid A_1^+,A_2^+)=\frac{P(B^+\cap A_1^+\cap A_2^+)}{P(A_1^+\cap A_2^+)}
=\frac{P(A_1^+\cap A_2^+\mid B^+)\,P(B^+)}{P(A_1^+)\,P(A_2^+)}
=\frac{P(A_1^+\mid B^+)\,P(A_2^+\mid B^+)\,P(B^+)}{P(A_1^+)\,P(A_2^+)}
=\frac{P(B^+\mid A_1^+)\,P(B^+\mid A_2^+)}{P(B^+)}.$$
More generally, it also holds that
$$P(B^i\mid A_1^k,A_2^k)=\frac{P(B^i\mid A_1^k)\,P(B^i\mid A_2^k)}{P(B^i)},\qquad(35)$$
where i, k stand for either + or −.
Consequently, from $P(B^+\mid A_1^+)>P(B^+)$ and $P(B^+\mid A_2^+)>P(B^+)$ it is straightforward that $P(B^+\mid A_1^+,A_2^+)>P(B^+)$. Similarly, as $P(B^-\mid A_1^-,A_2^-)=P(B^-\mid A_1^-)P(B^-\mid A_2^-)/P(B^-)$, from $P(B^-\mid A_1^-)>P(B^-)$ and $P(B^-\mid A_2^-)>P(B^-)$ it follows that $P(B^-\mid A_1^-,A_2^-)>P(B^-)$. From $P(B^+\mid A_1^-,A_2^-)=P(B^+\mid A_1^-)P(B^+\mid A_2^-)/P(B^+)$, $P(B^+\mid A_1^-)\le P(B^+)$ and $P(B^+\mid A_2^-)\le P(B^+)$, it follows that $P(B^+\mid A_1^-,A_2^-)\le P(B^+)$. Using similar reasoning, it follows that $P(B^-\mid A_1^+,A_2^+)\le P(B^-)$.

(2) The same reasoning applies to prove (-1,0)%(-1,0) = (-1,0); it suffices to use equation (35) and the inequalities supplied by the two operands.

(3) To prove (0,1)%(0,0) = (0,1) or (-1,0)%(0,0) = (-1,0): for the first case, one needs to show that
$$\begin{cases}P(B^+\mid A_1^+)>P(B^+)\ \&\ P(B^-\mid A_1^-)>P(B^-)\\ P(B^+\mid A_1^-)\le P(B^+)\ \&\ P(B^-\mid A_1^+)\le P(B^-)\end{cases}
\quad\text{and}\quad P(B^i\mid A_2^k)=P(B^i),\ i,k=+,-,$$
entails
$$\begin{cases}P(B^+\mid A_1^+,A_2^+)>P(B^+)\ \&\ P(B^-\mid A_1^-,A_2^-)>P(B^-)\\ P(B^+\mid A_1^-,A_2^-)\le P(B^+)\ \&\ P(B^-\mid A_1^+,A_2^+)\le P(B^-)\end{cases}$$
Clearly, again the result is straightforward from (35) and the above inequalities. For instance, from $P(B^+\mid A_1^+,A_2^+)=P(B^+\mid A_1^+)P(B^+\mid A_2^+)/P(B^+)$, $P(B^+\mid A_1^+)>P(B^+)$ and $P(B^+\mid A_2^+)=P(B^+)$, it follows that $P(B^+\mid A_1^+,A_2^+)>P(B^+)$. And from $P(B^+\mid A_1^-,A_2^-)=P(B^+\mid A_1^-)P(B^+\mid A_2^-)/P(B^+)$, $P(B^+\mid A_1^-)\le P(B^+)$ and $P(B^+\mid A_2^-)=P(B^+)$, we have $P(B^+\mid A_1^-,A_2^-)\le P(B^+)$. Similar reasoning applies for the other cases.

(4) Similar reasoning applies to show (-1,1)%(-1,1) = (-1,1). The detail is omitted.


(5) To show (0,0)%(0,0) = (0,0), notice that from (35) and $P(B^i\mid A_1^k)=P(B^i)$, $P(B^i\mid A_2^k)=P(B^i)$, it follows that $P(B^i\mid A_1^k,A_2^k)=P(B^i)$.

(6) To prove (-1,0)%(0,1) = (-1,1), one needs to make use of hypothesis H0. Indeed, for instance, $P(B^+\mid A_1^-,A_2^-)>P(B^+)$ is entailed from $P(B^+\mid A_1^-)>P(B^+)$ and $P(B^+\mid A_2^-)\le P(B^+)$: as the condition in H0 holds, $P(B^+\mid A_1^-)\,P(B^+\mid A_2^-)>[P(B^+)]^2$, and consequently $P(B^+\mid A_1^-,A_2^-)=P(B^+\mid A_1^-)\,P(B^+\mid A_2^-)/P(B^+)>P(B^+)$. Similarly, using H0, it is straightforward that $P(B^+\mid A_1^+,A_2^+)>P(B^+)$, $P(B^-\mid A_1^+,A_2^+)>P(B^-)$ and $P(B^-\mid A_1^-,A_2^-)>P(B^-)$.

(7) The same reasoning as the preceding can be used to prove (0,1)%(-1,1) = (-1,1). The detail is omitted.

Finally, we shall notice that the condition of non-zero $P(B^i)$ is a necessary condition for the meaningfulness of the qualitative representation, while the conditions $P(A_1^i)>0$ and $P(A_2^j)>0$ are necessary conditions for Bayes' rule, where i, j stand for either + or −.

• The preceding shows that, when the valuation (-1,1) is not involved, the conditional and statistical independence conditions are enough to ensure the agreement between the cognitive map interpretation and expression (2) in the light of the qualitative representation. However, when the valuation (-1,1) matters, the hypothesis H0 is required to recover such agreement.
• As compared to the infinitesimal representation, notice that the counterpart of that hypothesis is the principle of insufficient reason, which forces the result towards the uniform probability. Nevertheless, the condition H0 sounds much more constrained than the principle of insufficient reason, which is rather natural in the absence of further evidence and well known and common in statistics.
• The results pointed out in Proposition 9 show that the binary properties of % are still preserved: commutativity, (0,0) is a neutral element and (-1,1) is an absorption element.
• The result pointed out in Proposition 9 is only for two operands; however, such a result can easily be expanded. Indeed, in the case of three operands, for instance, which involves three cause variables A1, A2, A3 and one common effect variable B, one requires that all pairs of events pertaining to A1, A2 and A3 be statistically independent and conditionally independent given the event pertaining to B. On the other hand, the hypothesis H0 needs to hold for each pair of cause variables:
H0′: If $\min(P(B\mid A_i),P(B\mid A_j))\le P(B)$ and $\max(P(B\mid A_i),P(B\mid A_j))>P(B)$, then
$$P(B\mid A_i)>\frac{[P(B)]^2}{P(B\mid A_j)},$$
with i, j = 1, 2, 3 and i ≠ j.
The preceding allows us to recover the associativity property, and provides modularity in handling a large number of operands sequentially.
• In the case where $P(A_1^i)=0$ and $P(A_2^j)=0$, using the pseudo-Bayes formula (33), it is easy to see that a necessary condition to keep the result of Proposition 9 valid is to ensure that the condition $P(C^k)<1$ holds, where i, j, k stand for either + or −. Clearly, the above condition contrasts with the infinitesimal representation, where no condition regarding the range of $P(C^k)$ is required.
• It should also be noted that Eells and Sober (1983) and Eells (1991) have pointed out that, in cases where several causes act on the same effect variable, a sufficient condition that ensures transitivity, supplementing the usual condition of connectedness or the Markovian property, is the requirement that the events pertaining to the different Ai be mutually independent given the event pertaining to B, and given the event pertaining to the complement of B. From this perspective, the result pointed out in Proposition 9 is rather less demanding, in the sense that mutual conditional independence with respect to the complement of the event ascribed to B is not a pre-requisite for our result.
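As a concrete illustration of the factorisation (35) that drives Proposition 9, the following sketch (not from the paper) builds a toy joint distribution in which A1 and A2 are marginally independent and conditionally independent given B+, and checks the factorisation numerically for the B+ cases; all numerical values are assumptions of the example.

```python
# Illustrative check of (35): P(B+|A1,A2) = P(B+|A1) * P(B+|A2) / P(B+),
# under marginal independence of A1, A2 and conditional independence given B+.
from itertools import product

P_B = 0.5
p1_pos, p2_pos = 0.8, 0.7        # given B+, A1 and A2 are independent with these marginals
# Given B-, a (dependent) joint chosen so that A1 and A2 stay marginally independent.
given_Bneg = {(True, True): 0.16, (True, False): 0.24,
              (False, True): 0.34, (False, False): 0.26}

joint = {}
for a1, a2 in product([True, False], repeat=2):
    q1 = p1_pos if a1 else 1 - p1_pos
    q2 = p2_pos if a2 else 1 - p2_pos
    joint[(a1, a2, True)] = P_B * q1 * q2
    joint[(a1, a2, False)] = (1 - P_B) * given_Bneg[(a1, a2)]

def prob(pred):
    return sum(p for o, p in joint.items() if pred(*o))

def cond(target, given):
    return prob(lambda a1, a2, b: target(a1, a2, b) and given(a1, a2, b)) / prob(given)

# Marginal independence of A1 and A2.
assert abs(prob(lambda a1, a2, b: a1 and a2) -
           prob(lambda a1, a2, b: a1) * prob(lambda a1, a2, b: a2)) < 1e-9

# Factorisation (35) for B+, with both causes increasing and both decreasing.
for v in (True, False):
    lhs = cond(lambda a1, a2, b: b, lambda a1, a2, b, v=v: a1 == v and a2 == v)
    rhs = (cond(lambda a1, a2, b: b, lambda a1, a2, b, v=v: a1 == v) *
           cond(lambda a1, a2, b: b, lambda a1, a2, b, v=v: a2 == v) /
           prob(lambda a1, a2, b: b))
    assert abs(lhs - rhs) < 1e-9
```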


5. Use of functional dependency
Instead of using a probabilistic interpretation of the logical rule-based system, and instead of sticking to the cognitive-map based interpretation, an alternative idea to capture the properties of connectives % and ^ is to assume that the events A, B are independent, but that the paths leading to C are combined via some functional g that best describes the properties of % and ^, as exhibited in Figure 5. In this course, A, B and C would represent the random variables pertaining to the two operands (in the case of a binary operation) and the outcome, respectively. Therefore A, B and C take values in S = {(-1,0), (0,0), (0,1), (-1,1)}. Let the values of A, B, C be ai, bi and ci (i = 1 to 4), respectively, where a1 = (-1,0), a2 = (0,0), a3 = (0,1) and a4 = (-1,1) (similar reasoning applies to bi and ci). It should be noted that if the above pairs are mapped to their respective integer values, then it holds that a1 < a2 < a3, ai = bi = ci for i = 1 to 3 (since -1 < 0 < 1, while (-1,1) cannot be decomposed into a single entity). The preceding defines a linear ordering among the values of the stochastic variables A, B, C, which is not complete, as the value (-1,1) is not comparable. An alternative ordering can be obtained by considering the presence of effect/counter-effect; from this perspective it holds that a2 < a1 < a4 and a2 < a3 < a4. Now from expressions (1) and (2), one can derive the following expressions for the functionals g1 and g2 that quantify the outcomes of connectives % and ^, respectively:

Figure 5. New graphical interpretation with causal dependence described via function g


$$g_1(A,B)=\begin{cases}
c_4 & \text{if } [A=a_4 \text{ or } B=b_4] \text{ or } [A=a_3 \text{ and } B=b_1] \text{ or } [A=a_1 \text{ and } B=b_3]\\
c_3 & \text{if } [A=a_3 \text{ and } (B=b_3 \text{ or } B=b_2)] \text{ or } [B=b_3 \text{ and } (A=a_3 \text{ or } A=a_2)]\\
c_2 & \text{if } A=a_2 \text{ and } B=b_2\\
c_1 & \text{if } [A=a_1 \text{ and } (B=b_1 \text{ or } B=b_2)] \text{ or } [B=b_1 \text{ and } (A=a_2 \text{ or } A=a_1)]
\end{cases}\qquad(36)$$

and

$$g_2(A,B)=\begin{cases}
c_4 & \text{if } [A=a_4 \text{ and } (B=b_1 \text{ or } B=b_3 \text{ or } B=b_4)] \text{ or } [B=b_4 \text{ and } (A=a_1 \text{ or } A=a_3 \text{ or } A=a_4)]\\
c_3 & \text{if } [A=a_3 \text{ and } B=b_3] \text{ or } [A=a_1 \text{ and } B=b_1]\\
c_2 & \text{if } A=a_2 \text{ or } B=b_2\\
c_1 & \text{if } [A=a_1 \text{ and } B=b_3] \text{ or } [A=a_3 \text{ and } B=b_1]
\end{cases}\qquad(37)$$
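To make the case analysis in (36) and (37) concrete, the following small Python sketch (illustrative, not part of the original paper) transcribes the two lookup tables and checks a few of the algebraic properties discussed earlier; the pair encoding and the function names are assumptions of the example.

```python
# Lookup transcription of the functionals g1 (for %) and g2 (for ^) in (36)-(37).
# Encoding: a1 = (-1, 0), a2 = (0, 0), a3 = (0, 1), a4 = (-1, 1).
A1, A2, A3, A4 = (-1, 0), (0, 0), (0, 1), (-1, 1)
VALUES = [A1, A2, A3, A4]

def g1(x, y):
    """Disjunctive connective % as defined by (36)."""
    if x == A4 or y == A4 or (x, y) in [(A3, A1), (A1, A3)]:
        return A4
    if (x == A3 and y in (A3, A2)) or (y == A3 and x in (A3, A2)):
        return A3
    if x == A2 and y == A2:
        return A2
    return A1   # remaining cases: x = a1 with y in {a1, a2}, or y = a1 with x in {a1, a2}

def g2(x, y):
    """Conjunctive (composition) connective ^ as defined by (37)."""
    if x == A2 or y == A2:
        return A2
    if x == A4 or y == A4:
        return A4
    if (x, y) in [(A3, A3), (A1, A1)]:
        return A3
    return A1   # remaining cases: (a1, a3) and (a3, a1)

# A few of the combinations discussed in the text:
assert g1(A3, A1) == A4                        # (0,1) % (-1,0) = (-1,1)
assert g2(A1, A1) == A3                        # (-1,0) ^ (-1,0) = (0,1)
assert all(g1(v, A2) == v for v in VALUES)     # (0,0) is neutral for %
assert all(g2(v, A2) == A2 for v in VALUES)    # (0,0) absorbs under ^
```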

Using the total probability theorem, it holds that
$$P_g(c)=\sum_{A,B}P(c\mid A,B)\,P(A,B),\qquad(38)$$
Strictly speaking, the functions g1 and g2 characterize the causal independence that determines the structure pertaining to connectives % and ^, respectively. In other words, instead of assuming the initial events causally dependent with some dependence structure to be defined, one assumes that the initial events are rather independent, but that their outputs are combined via some deterministic function that fully characterizes the previous dependency. In the same spirit as the noisy-or and noisy-and models developed in the literature of probabilistic causal reasoning, see for instance Srinivas (1993) and Lucas (2001), incorporating the above functional into the probabilistic model, and assuming A and B independent, P(c|A, B) can be defined as
$$P(c\mid A,B)=\begin{cases}1 & \text{if } g(A,B)=c\\ 0 & \text{otherwise}\end{cases}\qquad(39)$$
Substituting equation (39) in equation (38) leads to
$$P_g(c)=\sum_{g(A,B)=c}P(A)\,P(B)\qquad(40)$$

for g = g1 or g2. Therefore, using (38) and (40), one obtains for the output c1, after some manipulations that cope with redundant events,
$$P_{g_1}(c_1)=P(a_1)\,(P(b_1)+P(b_2))+P(b_1)\,P(a_2).$$
Notice that
$$\sum_{i=1}^{4}P(a_i)=\sum_{i=1}^{4}P(b_i)=1.$$
Similarly, one obtains
$$P_{g_1}(c_2)=P(a_2)\,P(b_2)$$
$$P_{g_1}(c_3)=P(a_3)\,(P(b_3)+P(b_2))+P(b_3)\,P(a_2)$$
$$P_{g_1}(c_4)=P(a_4)+P(b_4)-P(a_4)P(b_4)+P(a_3)P(b_1)+P(a_1)P(b_3).$$
Now using the functional g2, we have similarly
$$P_{g_2}(c_1)=P(a_1)P(b_3)+P(a_3)P(b_1)$$
$$P_{g_2}(c_2)=P(a_2)+P(b_2)-P(a_2)P(b_2)$$
$$P_{g_2}(c_3)=P(a_3)P(b_3)+P(a_1)P(b_1)$$
$$P_{g_2}(c_4)=P(a_4)(1-P(b_2))+P(b_4)(1-P(a_2))-P(a_4)P(b_4).$$
• Notice that one can easily check that
$$\sum_{i=1}^{4}P_{g_1}(c_i)=1\quad\text{and}\quad\sum_{i=1}^{4}P_{g_2}(c_i)=1,$$
which is trivial, as the ci form a complete partition of the space.
• Clearly, if one assumes that all individuals are equally probable, i.e. P(ai) = P(bi) = 1/4, then it is easy to see that
$$P_{g_1}(c_2)<P_{g_1}(c_1)=P_{g_1}(c_3)<P_{g_1}(c_4)\quad\text{and}\quad P_{g_2}(c_1)=P_{g_2}(c_3)<P_{g_2}(c_4)<P_{g_2}(c_2).$$
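The following short sketch (illustrative, not from the paper) plugs the uniform probabilities P(ai) = P(bi) = 1/4 into the closed-form expressions above and confirms the normalisation and the two orderings just stated.

```python
# Numerical check of the closed-form two-operand probabilities under the
# uniform assumption P(a_i) = P(b_i) = 1/4.
Pa = Pb = [0.25, 0.25, 0.25, 0.25]
a1, a2, a3, a4 = Pa
b1, b2, b3, b4 = Pb

Pg1 = [a1 * (b1 + b2) + b1 * a2,                 # P_g1(c1)
       a2 * b2,                                  # P_g1(c2)
       a3 * (b3 + b2) + b3 * a2,                 # P_g1(c3)
       a4 + b4 - a4 * b4 + a3 * b1 + a1 * b3]    # P_g1(c4)

Pg2 = [a1 * b3 + a3 * b1,                        # P_g2(c1)
       a2 + b2 - a2 * b2,                        # P_g2(c2)
       a3 * b3 + a1 * b1,                        # P_g2(c3)
       a4 * (1 - b2) + b4 * (1 - a2) - a4 * b4]  # P_g2(c4)

assert abs(sum(Pg1) - 1) < 1e-12 and abs(sum(Pg2) - 1) < 1e-12
# Orderings stated in the text for the uniform case:
assert Pg1[1] < Pg1[0] == Pg1[2] < Pg1[3]
assert Pg2[0] == Pg2[2] < Pg2[3] < Pg2[1]
print(Pg1)   # [0.1875, 0.0625, 0.1875, 0.5625]
print(Pg2)   # [0.125, 0.4375, 0.125, 0.3125]
```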


5.1 Generalization and sensitivity analysis
Generalizing the above result to more than two operands: given operands $a^i$ (i = 1 to n) taking values in S, say $a^i_j$ with j = 1 to 4, i.e. $a^i_1=(-1,0)$, $a^i_2=(0,0)$, $a^i_3=(0,1)$, $a^i_4=(-1,1)$, let c be the outcome of the combination of these operands via the underlying connective. We also denote by T a set of indices whose cardinality is n and which takes values in {1,2,3,4}. For instance, for n = 5, T = {1,1,3,2,3} would refer to the combination of operands $a^1_1, a^2_1, a^3_3, a^4_2, a^5_3$. Therefore, it is easy to check that:
(1) $P_{g_1}(c_1)=\sum_{j\in T}\prod_{i=1}^{n}P(a^i_j)+\prod_{i=1}^{n}P(a^i_1)$, with $|T|=n$, T taking values in {1,2} and {1,2} ⊂ T.
The above follows from the fact that, where the connective % is concerned, (-1,0) is obtained when at least one of the operands is (-1,0) and the remainder are the null element (0,0), or when all the operands are (-1,0) valued.
(2) $P_{g_1}(c_2)=\prod_{i=1}^{n}P(a^i_2)$.
The preceding follows from the fact that (0,0) is obtained as the result of the combination of all operands via connective % only if all these operands are (0,0) valued. In other words, $P_{g_1}(c_2)$ acts as a product rule over all operands $a^i_2$.
(3) $P_{g_1}(c_3)=\sum_{j\in T}\prod_{i=1}^{n}P(a^i_j)+\prod_{i=1}^{n}P(a^i_3)$, with $|T|=n$, T taking values in {2,3} and {2,3} ⊂ T.
(4) $P_{g_1}(c_4)=\sum_{i=1}^{n}P(a^i_4)-(n-1)\prod_{i=1}^{n}P(a^i_4)+\sum_{j\in T}\prod_{i=1}^{n}P(a^i_j)$, with $|T|=n$, T taking values in {1,2,3} and {1,3} ⊂ T.
This follows as the valuation (-1,1) is the result of the combination of the operands $a^i_j$ via connective % if at least one of these operands is valued (-1,1), or if there is a coexistence of two operands taking (-1,0) and (0,1), respectively. The subtraction in the above expression is due to the fact that the events pertaining to the co-occurrence of events pertaining to $a^i_4$ (i = 1 to n) are redundant in the events involved in the definition of the marginal probabilities $P(a^i_4)$. Notice that
$$P_{g_1}(c_1)=\sum_{j=1,2}\prod_{i=1}^{n}P(a^i_j)-P_{g_1}(c_2)\quad\text{and}\quad P_{g_1}(c_3)=\sum_{j=2,3}\prod_{i=1}^{n}P(a^i_j)-P_{g_1}(c_2).$$
Regarding the sensitivity of the operation with respect to the number of operands, notice that if a new operand $a^{n+1}_j$ is added, which makes the number of arguments n′ = n + 1, then:
• The probability of the outcome does not follow a systematic move. For instance, it is easy to see from the expression in (2) that $P_{g_1}(c_2)$ always decreases with respect to the number of operands: $\prod_{i=1}^{n+1}P(a^i_2)\le\prod_{i=1}^{n}P(a^i_2)$.
• However, in the case of $P_{g_1}(c_1)$, $P_{g_1}(c_3)$ and $P_{g_1}(c_4)$, the values of the initial probabilities affect the behaviour of the outcome. For instance, in the case of $P_{g_1}(c_1)$, it is easy to see that if $P(a^{n+1}_1)\ge 1-P(a^{n+1}_2)$, then $P_{g_1}(c_1)$ increases with the number of operands. The detail is omitted. But in the general case the behaviour cannot easily be predicted. In the case where the individual probabilities are uniform, in the sense that $P(a^i_1)=P(a^i_2)=P(a^i_3)=P(a^i_4)=1/4$ for each i, it is easy to check that, in this specific situation, $P_{g_1}(c_1)$, $P_{g_1}(c_3)$ and $P_{g_1}(c_4)$ decrease with the number of arguments; $P_{g_1}(c_2)$ does so in all situations.
Similarly, one obtains for g2 the following:
(1) $P_{g_2}(c_1)=\sum_{j\in T}\prod_{i=1}^{n}P(a^i_j)$, with $|T|=n$, T taking values in {1,3} and $|T_1|=2k+1$ for some integer $k\ge 0$, where $|T_i|$ represents the number of "i" in the sequence T.
The preceding translates the fact that if the number of negative elements (-1,0) is odd in any multiplication of operands $a^i_j$ which can only take the (-1,0) or (0,1) values, then the result is always (-1,0), since the multiplication of 2k elements valued (-1,0) leads to (0,1), and (0,1) is the neutral element for the multiplication, or connective ^, which yields the final result (-1,0).
(2) $P_{g_2}(c_2)=\sum_{i=1}^{n}P(a^i_2)-(n-1)\prod_{i=1}^{n}P(a^i_2)$.
(3) $P_{g_2}(c_3)=\sum_{j\in T}\prod_{i=1}^{n}P(a^i_j)$, with $|T|=n$, T taking values in {1,3} and $|T_1|=2k$ for some integer $k\ge 0$, where $|T_i|$ represents the number of "i" in the sequence T.
(4) $P_{g_2}(c_4)=\sum_{j\in T}\prod_{i=1}^{n}P(a^i_j)$, with $|T|=n$, T taking values in {1,3,4} and {4} ⊂ T.
This follows from the fact that as soon as one of the operands is (-1,1) valued and none of the other operands is zero valued, the result of the combination via connective ^ is (-1,1) valued.
In terms of the sensitivity of the above probabilities to the number of operands, one should notice that, similarly to the function g1, there is no generic behaviour which holds true for all situations, as this is highly dependent on the individual probabilities. However, in the case of $P_{g_2}(c_2)$, one can easily check that if $P(a^{n+1}_2)\le 1/2$, then $P_{g_2}(c_2)$ is increasing with the number of operands.
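As a quick numerical illustration of this last sensitivity claim, the following sketch (not from the paper) evaluates the closed-form expression for P_g2(c2) given above; the particular probability values are assumptions of the example.

```python
# Closed-form P_g2(c2) = sum_i P(a^i_2) - (n-1) * prod_i P(a^i_2); adding an
# operand whose null-value probability is at most 1/2 does not decrease it.
from math import prod

def pg2_c2(p2s):
    n = len(p2s)
    return sum(p2s) - (n - 1) * prod(p2s)

base = [0.4, 0.35]                      # assumed P(a^1_2), P(a^2_2) for two operands
for new_p2 in (0.0, 0.1, 0.3, 0.5):
    assert pg2_c2(base + [new_p2]) >= pg2_c2(base)
```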


On the other hand, if the individual probabilities are uniform, then it is easy to see that $P_{g_2}(c_1)$, $P_{g_2}(c_3)$ and $P_{g_2}(c_4)$ are non-increasing with respect to the number of arguments.

6. Conclusion
This paper provides a new look at NPN bipolar logic motivated by cognitive map representation. In particular, a set of rational assumptions has been put forward in order to provide a mathematical framework that justifies the disjunction and conjunction operations % and ^ pointed out in NPN bipolar logic. Furthermore, an extension of this framework to a probabilistic setting has been provided. For this purpose, a cognitive map model has been put forward. The probabilistic setting extends the probabilistic interpretation of the material implication in such a way that the cognitive interpretation is preserved. Two different probabilistic interpretations have been investigated. The former is based on the infinitesimal representation, which assumes that the conditional probability of the effect variable given the cause variable is high, or close to one, while the latter is based on the qualitative representation, which basically extends the positive dependency behaviour. In both cases, sufficient conditions that ensure the transitivity of the inference and the coherence with respect to the conjunctive and disjunctive connectives in NPN logic are pointed out. Finally, instead of finding the causal dependency structure, an alternative idea is rather to assume that there is no causal dependence, but that the individuals are independent and combined through some functional that fully reproduces the behaviour of the % and ^ connectives in NPN logic. The latter alternative implicitly omits the cognitive-map based interpretation. The methodology developed in this paper presents an appealing basis to be extended to probabilistic argumentation systems in which the pro and con arguments are critical (Dung, 1995; Haenni, 1998).

Note
1. The word crisp is used here to distinguish the case where a and b take their values in {-1,0,1} from the case where they may take any value in [-1,1].

References
Adams, E. (1975), The Logic of Conditionals, Reidel, Dordrecht.
Axelrod, R. (1976), Structure of Decision, Princeton University Press, Princeton, NJ.
Cartwright, N. (1979), "Causal laws and effective strategies", Nous, Vol. 13, pp. 419-37.
Eells, E. and Sober, E. (1983), "Probabilistic causality and the question of transitivity", Philosophy of Science, Vol. 50 No. 1, pp. 5-57.
Eells, E. (1991), Probabilistic Causality, Cambridge University Press, Cambridge.
De Kleer, J. and Brown, J.S. (1984), "Qualitative physics based on confluence", Artificial Intelligence, Vol. 24, pp. 7-83.
Dung, P.M. (1995), "On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games", Artificial Intelligence, Vol. 77, pp. 321-57.
Haenni, R. (1998), "Modeling uncertainty in propositional assumption-based systems", in Parsons, S. and Hunter, A. (Eds), Applications of Uncertainty Formalisms, Springer, Berlin, pp. 446-70.
Lucas, P.J. (2001), "Certainty-factor-like structures in Bayesian belief networks", Knowledge-Based Systems, Vol. 14, pp. 327-35.
Pearl, J. (1988), Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann, Los Altos, CA.
Simon, H.A. (1954), "Spurious correlation", Journal of the American Statistical Association, Vol. 49 No. 267, pp. 467-79.
Simon, H.A. (1969), "The architecture of complexity", The Sciences of the Artificial, MIT Press, pp. 192-229.
Skyrms, B. and Harper, W.L. (Eds) (1988), Causation, Chance and Credence, Vol. 1, Kluwer Academic Publishers, Dordrecht.
Spirtes, P., Glymour, C. and Scheines, R. (2000), Causation, Prediction, and Search, 2nd ed., MIT Press.
Srinivas, S. (1993), "A generalization of the noisy-OR model", Proceedings of Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Mateo, CA, pp. 208-15.
Suppes, P. (1970), Probabilistic Theory of Causality, North Holland Publishing Company, Amsterdam.
Wellman, M.P. (1988), "Qualitative probabilistic networks for planning under uncertainty", Uncertainty in Artificial Intelligence, Vol. 2, pp. 197-208.
Zhang, W. (2000), "Xinyang bipolar set and bipolar relations", Proceedings of International Conference of Artificial Intelligence, Las Vegas, June.
Zhang, W., Chen, S. and Bezdek, J.C. (1989), "POOL2 - a generic system for cognitive map development and decision analysis", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19 No. 1, pp. 31-9.


Cooperative clans
Nathan Griffiths
Department of Computer Science, University of Warwick, Coventry, UK

Abstract
Purpose – To provide a mechanism for agents to form, maintain, and reason within medium-term coalitions, called clans, based upon the notions of trust and motivation.
Design/methodology/approach – The model is based upon the notions of trust (representing an agent's assessment of another's honesty and reliability) and motivations (which represent an agent's high-level goals). The paper describes the motivational factors that can lead to clan formation, a mechanism for agents to form a clan or join an existing clan, and subsequently how clan membership influences behaviour (in particular through sharing information and acting on behalf of other members). Finally, it describes the conditions under which agents leave a clan.
Findings – The proposed mechanism shows how agents can form medium-term clans with trusted agents based on motivations that are essentially self-interested. It is shown how this mechanism can be used to reduce missed opportunities for cooperation, improve scalability, reduce the failure rate and allow sharing of trust information (i.e. establish a notion of reputation).
Originality/value – Proposes a new approach to coalition formation based on the notions of trust and motivation, which allows self-interested agents to form medium-term coalitions (called clans) to increase their own (motivational) returns.
Keywords Modelling, Cybernetics, Control systems, Intelligent agents, Trust, Motivation (psychology)
Paper type Research paper

Kybernetes, Vol. 34 No. 9/10, 2005, pp. 1384-1403, © Emerald Group Publishing Limited, 0368-492X, DOI 10.1108/03684920510614722

1. Introduction
Agents in a multi-agent system typically must cooperate to achieve their goals, due to differences in their capabilities, knowledge and resources. In general, agents are not benevolent and to cooperate they must receive some individual benefit. Previous work has utilised the notions of motivation and trust to provide a framework for cooperation that accounts for the individual benefit received from cooperating with others (Griffiths, 2000). Motivation and trust are the fundamental components on which cooperation is built. Motivations represent an agent's high-level desires and determine the desirability of cooperating with respect to a particular situation. Trust embodies an agent's assessment of the risk involved in cooperating with another, and enables the uncertainty involved in cooperating to be managed. In this paper we extend the notion of using motivations and trust to achieve cooperation, by providing a mechanism for agents to form medium-term coalitions, called clans, to enhance their future interactions. Clan formation was previously described in Griffiths (2003). In this paper, we describe the process of clan formation in more detail, and describe how agents reason in, and maintain, clans.

Previous approaches to cooperation can be broadly divided into two groups: task-based and coalition-based. Task-based approaches, such as Tambe (1997), are concerned with attaining short-term cooperation to achieve specific tasks. Unless agents have common goals, or similar motivations, at the time of establishing cooperation they will not cooperate. If agents' goals are similar in the long-term (but out of step in terms of time), they may have benefited overall from cooperation even if there

was no immediate benefit. In the long-term, task-based approaches can cause opportunities for beneficial cooperation to be missed. Furthermore, task-based approaches tend to require a group of agents to be re-established for subsequent tasks, even if the tasks and group members are similar. A long-term approach, where cooperative groups persist and are re-used where appropriate, can significantly reduce the computation required to achieve cooperation. Thus, we must take a long-term view to avoid missed opportunities and reduce the computation in establishing cooperation. Coalition-based approaches take a long-term view, although often directed toward a specific goal, where the benefit of joining a coalition tends to be assessed according to the utility gained by the group if a coalition is formed (Breban and Vassileva, 2001; Klusch and Shehory, 1996; Shehory and Kraus, 1995). Calculation of such utility often requires agents to reveal their individual utilities, and does not account for any motivational reasons an agent might have. Motivations represent agents’ high-level desires, and motivational value cannot be compared across agents since these desires differ. Consequently, utility-based coalition formation approaches cannot be directly applied to motivated agents, and a motivation-based approach is required. Existing approaches are also limited in terms of their scalability. In particular, all known agents must typically be considered when establishing cooperation. As the number of agents increases, the search space and communication cost also increases. Congregations aim to reduce this cost, such that rather than searching the whole population, agents congregate into interest groups and search within the congregation (Brooks and Durfee, 2002; 2003). Since a single goal connecting members of a congregation is not required, some of the limitations of task-based approaches are avoided. Agents are divided into labellers and congregators and the former label their congregations so as to attract similar congregator agents. Clans take the essence of congregations and allow agents to consider the long-term benefits of cooperation; clans enable self-organisation of the space of agents to increase scalability (Griffiths and Luck, 2003). Clans are loosely coupled composite entities, and are similar to congregations in representing groups of agents. The key distinction is that similarity is defined for clans in motivational terms, and the notion of trust binds the group together. In forming clans agents are not explicitly divided into labellers and congregators, instead these roles are implicitly incorporated into the cooperative process. In particular, when an agent wishes to form a clan, or to increase the membership of an existing clan, it can act analogously to a labeller by requesting others join the clan. Other agents act similarly to congregators by evaluating the initiator’s stated interests to assess whether to join the clan. Numerous factors are involved in governing how an individual cooperates. In this paper, we indicate the most significant of these factors, namely trust, motivation and reputation, and use them to provide a flexible framework for cooperation. The relationship between these factors is summarised in Figure 1. An agent’s trust models,


Figure 1. Overview of the factors in cooperation and clan formation


i.e. the level of trust it places in others, are determined by its own individual experience and disposition. Reputation is determined both by individual trust, and information given by others about their own experiences (and the trustworthiness of those providing information). An agent's decision to cooperate and its subsequent commitment is a function of its motivations, trust of others, and the reputation it perceives potential cooperative partners to have.

The remainder of this paper is structured as follows. In Section 2, we introduce the nature of the agents that we are concerned with. Section 3 describes the set of criteria that agents use to determine when to form a clan. Sections 4 and 5 describe how clans are formed, and how they influence cooperation, respectively. The conditions under which agents should leave a clan are discussed in Section 6. In Section 7, we describe how agents can join existing clans. Finally, in Section 8 we identify areas of future work, and conclude the paper.

2. Cooperative agents
We adopt a BDI-based approach and take an agent to comprise: beliefs about itself, others and the environment; a set of desires (or goals) representing the states it wants to achieve; and intentions corresponding to plans adopted in pursuit of these desires (Bratman et al., 1988). In addition to the traditional BDI model, however, we concur with the views of some that motivation is an extra component required for autonomy (Castelfranchi, 1995; Luck and d'Inverno, 1995; Norman, 1996) and we refer to the resulting architecture as motivated BDI, or mBDI. In accordance with the standard BDI model, agents also have a library of partial plans from which to select the most appropriate to achieve their goals. Actions within a partial plan can be individual, joint or concurrent. Individual actions are performed by a single agent alone, joint actions require simultaneous contributions from two or more agents, and concurrent actions comprise a set of individual or joint actions to be performed concurrently with synchronisation at the start and end of a concurrent block. Joint and concurrent actions correspond to the notions of strong and weak parallelism introduced by Kinny et al. (1992). Partial plans can also contain subgoals, for which subplans must be selected at execution time.

Motivations facilitate autonomy, and are high-level desires characterising an agent, guiding behaviour and controlling reasoning; they cause the generation and subsequent adoption of goals, and guide reasoning and action at both individual and cooperative levels. Differences between agents are characterised by their motivations, which can lead to both differences in goals, and in social behaviour. An agent has a fixed set of motivations, each having an intensity that varies according to the current situation. For example, suppose an agent's motivations include "hunger" and "survival". If the agent's energy is low then the intensity of the "hunger" motivation will be high, causing the generation of a goal to eat food. However, while the intensity of a given motivation fluctuates, motivations themselves are not transient and the set of motivations belonging to a particular agent does not change. Thus, although the agent's "survival" motivation may have a low intensity and not be contributing to the current behaviour, the motivation is always present and may become active in certain situations.
Motivations provide an agent with autonomy and allow it to generate goals in response to changes in its environment and select appropriate actions to perform. Furthermore, as we discuss in the remainder of this

paper, motivations guide decisions with respect to cooperation. As with individual actions, cooperative activity must be motivated. A single motivation is represented by a tuple (m, i, l, fi, fg, fm), where m is the name of the motivation, i is its current intensity, l is a threshold, fi is the intensity update function, fg the goal generation function, and fm the mitigation function. As in the standard BDI approach, mBDI agents perceive their environment using their sensors and update their beliefs according to these perceptions. The intensities of an mBDI agent's motivations change in accordance with its beliefs (through the application of fi). Thus, perceptions determine both the beliefs that agents hold, and the intensities of their motivations. Motivations provide a way for agents to have goals appropriate to the situation. If the current situation causes the intensity of a motivation to exceed its threshold, l, then a set of goals is generated using the function fg. These goals are evaluated according to their motivational value (i.e. the amount by which their achievement would reduce the motivational intensity, as determined by fm), and the most important are adopted as intentions by selecting an appropriate plan and committing to its execution. The agent then selects an intention to pursue and acts toward its achievement, again using motivational value to guide its choice. This mechanism is embodied by the mBDI reasoning cycle:
(1) Perceive the environment and update beliefs.
(2) For each motivation apply fi to update its intensity based on the current perceptions.
(3) For each motivation whose intensity i is greater than the threshold l apply fg to generate a set of new goals.
(4) Select an appropriate plan for the most motivationally valuable of these generated goals, and adopt it as an intention.
(5) Select the most motivationally valuable intention and act toward it by performing the next step in the plan.
(6) On completion of an intention, apply the mitigation function fm to each motivation to reduce its intensity according to the motivational value of achieving the goal.
(7) Finally, return to the beginning of the cycle.
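The following Python skeleton is an illustrative sketch of the reasoning cycle just listed; it is not the authors' implementation, and the class names, data structures and the way goals and plans are represented are assumptions made for the example.

```python
# Schematic sketch of the mBDI reasoning cycle (illustrative only).
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Motivation:
    name: str
    intensity: float
    threshold: float
    f_i: Callable      # intensity update: f_i(beliefs) -> new intensity
    f_g: Callable      # goal generation:  f_g(beliefs) -> list of goals
    f_m: Callable      # mitigation:       f_m(goal) -> reduction in intensity

@dataclass
class Agent:
    motivations: List[Motivation]
    beliefs: dict = field(default_factory=dict)
    intentions: list = field(default_factory=list)   # (goal, plan-step iterator) pairs

    def motivational_value(self, goal):
        # Amount by which achieving the goal would reduce motivational intensity.
        return sum(m.f_m(goal) for m in self.motivations)

    def cycle(self, percepts, plan_library):
        self.beliefs.update(percepts)                            # (1) perceive
        for m in self.motivations:
            m.intensity = m.f_i(self.beliefs)                    # (2) update intensities
        goals = [g for m in self.motivations if m.intensity > m.threshold
                 for g in m.f_g(self.beliefs)]                   # (3) generate goals
        if goals:
            best = max(goals, key=self.motivational_value)       # (4) adopt an intention
            self.intentions.append((best, iter(plan_library[best])))
        if self.intentions:
            goal, plan = max(self.intentions,
                             key=lambda gi: self.motivational_value(gi[0]))
            step = next(plan, None)                              # (5) act toward it
            if step is None:                                     # (6) intention complete
                self.intentions.remove((goal, plan))
                for m in self.motivations:
                    m.intensity = max(0.0, m.intensity - m.f_m(goal))
            else:
                step()                                           # execute the next action
        # (7) the caller repeats the cycle
```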


We represent a complete mBDI agent as a tuple (M, B, D, I, PL) where M signifies the agent’s set of motivations, B represents its beliefs, D corresponds to the agent’s desires (or goals) as generated from its motivations, I are the intentions that it is committed to, and PL is the plan library. The resulting architecture is illustrated in Figure 2, in which solid arrows represent the flow of information, and dotted arrows the control structure.

Figure 2. The mBDI agent architecture
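A minimal container for the (M, B, D, I, PL) tuple, consistent with the sketch above, might look as follows; the field types are again illustrative assumptions, and the helper methods assumed by the reasoning-cycle sketch are omitted.

    from dataclasses import dataclass, field

    @dataclass
    class MBDIAgent:
        """Illustrative container for the (M, B, D, I, PL) tuple."""
        motivations: list = field(default_factory=list)   # M: Motivation instances (see sketch above)
        beliefs: dict = field(default_factory=dict)       # B: beliefs about self, others and environment
        desires: set = field(default_factory=set)         # D: goals generated from motivations
        intentions: list = field(default_factory=list)    # I: plans the agent is committed to
        plan_library: list = field(default_factory=list)  # PL: partial plans to select from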

2.1 Trust models
The notion of trust is widely recognised as a means of assessing the perceived risk in interactions arising from uncertainty about how others will behave (Castelfranchi and Falcone, 1998; Marsh, 1994a). Trust represents an agent's subjective estimate of how likely another agent is to fulfil its cooperative commitments. We base our model of trust upon the formalism proposed by Marsh (1994a) and the work of Gambetta (1988), and define the trust Ta in an agent a to be a value from the interval between 0 and 1. Values approaching 0 represent complete distrust and those approaching 1 represent blind trust. Trust values are determined by an agent's previous experience, and are updated after each interaction. The numbers themselves are comparative values internal to an agent's individual representation, and are meaningful only in the context of that agent's experience and disposition. Because trust values are subjective, different agents may ascribe different values to the same third party, a given value may mean different things to different agents, and values are not directly numerically comparable across agents. Trust is an individual assessment of another based on experience, and should not be confused with the notion of reputation, which represents an assessment of another based on both individual experience and information obtained from others. We discuss the notion of reputation in Section 5. Trust values are associated with a measure of confidence, and as experience is gained confidence increases. However, it is important to consider the recency of experience. In particular, we assume that trust decays over time, and that given a sufficient period of time an agent's trust of another will tend toward the default value. This means that the positive effect of successful interactions on trust will reduce over time, as will the negative effect of unsuccessful interactions. The rate at which trust decays is individual to a particular agent, and is a function of that agent's memory length. An agent has a trust model of each other agent with whom it has previously interacted or about whom it has acquired knowledge. If there have been no previous interactions and there is no acquired (or pre-built) knowledge then there is no corresponding agent model, and the default trust value Tinitial is used when assessing that agent's trustworthiness. Otherwise, for each other agent an individual will have a trust model representing the capabilities that it is believed to possess, and the trust ascribed to it. We represent a trust model as a tuple (id, C, t) where id is the identifier of the agent being modelled, C is a set corresponding to its believed capabilities, and t is the ascribed trust.
2.2 Inferring trust
Trust values are inferred according to an agent's disposition: optimists infer high values, while pessimists infer low values (Marsh, 1994b). After a successful interaction, optimists increase their trust more than pessimists, and conversely, after unsuccessful interactions pessimists decrease their trust more than optimists. The magnitude of change in trust is a function of several factors depending on the agent concerned, including the current trust and the agent's disposition.
The range of this disposition is a continuum between blind optimism and blind pessimism, where a blind optimist only

ever increases its trust of others, and a blind pessimist only ever decreases its trust. At the extremes of this continuum trust ceases to be a useful concept, since eventually blind optimists will place complete trust in all agents and blind pessimists will have complete distrust of all others. The trust disposition of an agent is described by three characteristics: (1) the initial trust it ascribes given a lack of other information, Tinitial, (2) the function used to update trust after a successful interaction, updatesuccess, and (3) the function used to update trust after an unsuccessful interaction, updatefail.
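As an illustration, the following Python sketch encodes a trust model, a trust disposition, and decay toward the default value; the exponential decay rate, the confidence bookkeeping and the particular update heuristics (chosen to match the update functions defined in the next paragraph) are our own assumptions rather than part of the model.

    from dataclasses import dataclass, field

    @dataclass
    class Disposition:
        """Trust disposition (Section 2.2); default values are purely illustrative."""
        t_initial: float = 0.5   # trust ascribed when no other information is available
        d_success: float = 0.1   # larger values increase trust more after success (optimism)
        d_fail: float = 0.5      # fraction of trust retained after failure (smaller = more pessimistic)

    @dataclass
    class TrustModel:
        """Trust model (id, C, t) of one other agent (Section 2.1)."""
        agent_id: str
        capabilities: set = field(default_factory=set)   # C: believed capabilities
        trust: float = 0.5                                # t: ascribed trust in [0, 1]
        confidence: float = 0.0                           # grows with experience, fades with time

    def update_after_interaction(model: TrustModel, disp: Disposition, success: bool) -> None:
        """Apply the disposition-based update heuristics given in the next paragraph."""
        if success:
            model.trust = model.trust + (1.0 - model.trust) * disp.d_success
        else:
            model.trust = model.trust * disp.d_fail
        model.confidence = min(1.0, model.confidence + 0.1)   # assumed simple confidence growth

    def decay(model: TrustModel, disp: Disposition, rate: float = 0.05) -> None:
        """Assumed exponential decay of trust toward the default value as experience ages."""
        model.trust += (disp.t_initial - model.trust) * rate
        model.confidence = max(0.0, model.confidence - rate)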

The functions for updating trust are simple heuristics, and there is no standard definition; rather, it is the responsibility of the system designer to choose appropriate heuristics. We take a simple approach by defining the update function for a successful interaction as

    update_success(T) = T + ((1 - T) × d_success)

and the update function for an unsuccessful interaction as

    update_fail(T) = T × d_fail

where d_success and d_fail represent the agent's disposition for increasing and decreasing trust, respectively, such that d_success, d_fail ∈ [0, 1].
3. Clans
Motivation and trust are the fundamental components that lead to cooperation: motivations give rise to the wish to cooperate, and trust guides the decision about who to cooperate with. Cooperation is more than simultaneous actions and individual intentions; agents must be committed to the activity of cooperation itself (Bratman, 1992; Levesque et al., 1990) and must abide by an appropriate set of conventions (Wooldridge and Jennings, 1999) specifying when and how cooperation can be abandoned. Where such commitments and conventions are adopted, we say that agents have formed a cooperative intention. Three basic stages (Figure 3) are involved in the formation and execution of a cooperative intention: plan selection, intention adoption, and group action, as follows.

Figure 3. Stages of cooperation

(1) Plan selection: Motivations cause the generation of goals, which must be adopted as intentions by selecting plans and committing to their execution. When selecting plans agents consider both the likelihood of finding other agents to assist and their trustworthiness. By combining plan cost with an estimate of the risk associated with the potential cooperative partners, agents can balance their desire to minimise cost and risk (Griffiths and Luck, 1999).
(2) Intention adoption: If the selected plan requires cooperation then the agent must solicit assistance. The initiating agent annotates each contribution in the plan

with the identifier of the agent considered best able to perform it, based on their believed capabilities and trustworthiness, and requests their assistance (Griffiths et al., 2003). Requests are evaluated, and responses sent, according to agents' motivations and intentions. A cooperative intention is formed if sufficient agents agree to assist.
(3) Group action: Once a cooperative intention has been formed the plan is executed. On successful completion, commitments are dissolved and cooperation is finished. If execution fails, the agent that detects the failure informs the others in accordance with the conventions, and again commitments are dissolved. In both cases, agents update the trust values ascribed to others involved in cooperation according to the trust update functions described above.
Since agents cooperate in pursuit of a specific goal this is a short-term approach to cooperation, and there are four primary problems that arise. First, since agents are autonomous and driven by their individual motivations, missed opportunities for cooperation can occur. Secondly, where the environment contains a large number of agents, the overhead of establishing and maintaining cooperation can lead to scalability problems. Thirdly, the approach to cooperation described above assumes that agents have sufficient information about others' capabilities (and trustworthiness) to establish cooperation. Finally, in a dynamic environment the intensities of motivations can fluctuate, which can give a lack of robustness to cooperation. Clans provide a means to address these problems, along with some of the limitations of existing approaches to coalition formation. In particular, existing approaches to coalition formation tend to focus on specific tasks rather than taking a long-term view, and do not consider the trustworthiness of participants (Shehory and Kraus, 1998; Tambe, 1997). The congregations model avoids taking a short-term view, but again does not consider the trustworthiness of the agents involved (Brooks et al., 2000; Brooks and Durfee, 2003). We view clan formation as a self-interested activity – an agent attempts to form a clan for its own benefit, and not in response to any external influence. Clan formation is driven by an agent's motivations and guided by its trust of others. To determine when to form a clan, an agent must assess the extent to which the issues identified above are affecting its performance. The following gives a skeletal algorithm outlining the decision process:

function ASSESS-WHEN-TO-FORM-CLAN returns boolean
    local: missed-opportunities ← false
           scalability ← false
           lack-of-information ← false
           high-failure-rate ← false
    if ((request-failure-rate > request-failure-threshold) and
        (MOTIVATIONAL-VALUE(FILTER(previous-rejected-requests)) > rejection-threshold))
        then missed-opportunities ← true
    if (PROPORTION-COOPERATIVE(recently-applicable-plans) > scalability-threshold)
        then scalability ← true
    if (((AVERAGE-TRUST(agent-models) < trusted-threshold) or
         (AVERAGE-CONFIDENCE(agent-models) < confidence-threshold)) and
        (exists agents such-that (AVERAGE-TRUSTWORTHINESS(agents) > trusted-threshold) and
                                 (AVERAGE-CONFIDENCE(agents) > confidence-threshold)))
        then lack-of-information ← true
    if (failure-rate > failure-threshold)
        then high-failure-rate ← true
    if (missed-opportunities or scalability or lack-of-information or high-failure-rate)
        then return true
        else return false
end

In the remainder of this section we describe its component steps.
3.1 Missed opportunities
Motivations guide all aspects of an agent's behaviour, including its response to requests for cooperation. As described in the previous section, the intensities of motivations fluctuate in response to changes in the environment, and it is the current intensities that determine whether an agent desires to cooperate. A response to a request is determined by a combination of this desire to cooperate, whether it can cooperate (determined by its capabilities and intentions), and whether the risk from cooperation is acceptable (determined by trust). Where the intensities of agents' motivations are out of step in time, missed opportunities for cooperation can occur, since agents' desires to cooperate may also be out of step. These short-term fluctuations can lead to failures to establish cooperation that would be beneficial long-term. Therefore, an agent needs some means of assessing the extent to which it is missing such opportunities. Since motivations are private and internal to individuals, an agent cannot inspect others' motivations. Instead it must consider the requests for assistance received from others that it has declined. If few requests have been declined then there are, at most, few missed opportunities (from this agent's perspective). Alternatively, if there are many declined requests then there may be many missed opportunities; each missed opportunity leads to a declined request, although clearly requests may be declined for other reasons. When an agent experiences a high failure rate in establishing cooperation (above the request-failure-threshold) due to other agents declining its requests, it should inspect all previous incoming requests within its memory limit. These previously rejected requests are filtered so that only those that are similar to the current plan remain. If the current motivational value of these filtered requests exceeds the rejection-threshold, then we take the agent to be at risk of missed opportunities. This heuristic represents a simple approach for assessing missed opportunities and, as with other aspects of this framework, more sophisticated approaches are possible. For example, to reduce the likelihood of detecting simple one-off failures, an agent might consider the extent of fluctuations in motivation intensities, or the motivational value of the request over previous iterations. The best heuristic to use is dependent on

the characteristics of the domain in which agents are situated, and this can only be verified empirically.
3.2 Scalability
To initiate cooperation, an agent typically must consider the suitability and trustworthiness of all other agents that it knows about. There is a cost to identifying and communicating with these agents, and the process of finding cooperative partners reduces the time spent in pursuit of goals. Furthermore, no direct motivational benefit is gained from identifying and communicating with others, only from the cooperative process itself. Thus, not only is there a computational and time cost in establishing cooperation, but the time the agent can spend in pursuit of its motivations is reduced. The number of agents modelled by an individual and the frequency of cooperation give an indication of the scale of the problem. If cooperation is rare or there are few other agents modelled, then the impact is much less than if each plan requires cooperation and there are many agents to be considered. The proportion of an agent's plans that are cooperative influences the frequency with which it cooperates. However, since agents may not utilise all of their plans, we filter out those that are unlikely to be relevant. In particular, we can measure the proportion of applicable plans that are cooperative in the last n reasoning cycles, where n is the agent's memory length. If the proportion that are cooperative exceeds the scalability-threshold then the agent should attempt to form a clan.
3.3 Lack of information
For an agent to successfully establish cooperation, it must know of trusted agents that have suitable capabilities. If there is insufficient knowledge of others' trustworthiness or capabilities then it may not be possible to establish cooperation. Recall from Section 2 that agents maintain a measure of confidence in their trust models, depending on the extent of the experiences that have formed them. If an agent does not have sufficient confidence in its models then clan membership may be beneficial, since clan members share information. Furthermore, if many agents are distrusted, then again clan membership may be beneficial. However, there is a lower bound on confidence and trust below which it is not appropriate to form a clan, due to a lack of confidence and trust in the potential members. In particular, it is only practical to form (or join) a clan with agents who are trusted to a reasonable degree of confidence. Therefore, an agent should inspect its models of others and, if there are many untrusted agents (the average trust is below the trust-threshold) or agents whose models have low confidence (the average confidence is below the confidence-threshold), it should attempt to form a clan, provided there is a subset of confidently trusted agents with whom to do so.
3.4 High failure rate
With changes in the environment, motivation intensities fluctuate and can lead to failure in cooperation, since agents' commitments may change. Clan membership strengthens commitments to cooperation and may help reduce high failure rates. As we describe in Section 5, membership of a clan provides a mechanism for an agent to obtain motivational value through acting in what may otherwise appear to be a semi-benevolent manner, i.e. clans provide a means for an agent to gain individual benefit from assisting others. Thus, if an agent is experiencing execution failures above the failure-threshold, then it should attempt to form a clan.

4. Forming a clan
Based on its assessment of missed opportunities, scalability, lack of information, and failure rate, an agent determines whether it should form a clan. If clan formation is considered necessary then it should try to form a clan with an appropriate set of agents, namely, those who are trusted and likely to be relevant to any future cooperative activity. Trust determines whether it is practical to form a clan, since if an agent has a low trust of others or low confidence in its trust models, then it should not form a clan. If, however, it has adequate trust in others (above the trust-threshold with confidence greater than the confidence-threshold), then it can attempt to form a clan. In order to target its requests toward appropriate agents it should estimate its future goals and attempt to form a clan with those agents whose assistance is likely to be required. In a dynamic environment this cannot be assessed directly. The set of active motivations, however, tends to be relatively static in the medium-term, and can be used to identify future goals that might be generated. The set of actions for which assistance may be required is obtained from these goals by considering the possible plans for them. Based on these actions the most trusted agents who are believed to have suitable capabilities are selected. Ideally, all agents whose assistance is requested would be clan members, since this increases the likelihood of them agreeing to cooperate and keeping their commitments. However, large clan sizes have a disadvantage in terms of computational overhead, and because there are more agents to whom the agent is inclined to offer assistance (at a potential cost of restricting its other activities). Consequently, there is a preferred clan size which balances the conflicting desires for all future requested agents to be clan members, against the computational cost and the restrictiveness of acting on behalf of other clan members. We take a simple approach to assessing this preferred clan size, based on the current situation. In particular, we consider the plans that are likely to be adopted in the future (as described above) and extract the average number of actions for which assistance is required, using this to estimate the preferred size. As not all agents to whom requests are sent will join the clan, we add a degree of redundancy to this preferred size. Based on this preferred clan size, the set of most trusted agents who are believed to have relevant capabilities are sent a request to form a clan. These agents must then determine whether to accept the request. Typically, although clan membership may be beneficial, an agent's assessment of whether to form a clan (based on the factors described in Section 3) may not indicate this, i.e. although the clan formation factors may be relevant, they may not be sufficient for the agent to accept the request. The requesting agent must, therefore, give some additional incentive for joining the clan. Since we do not assume that agents have negotiation or persuasion capabilities, we take a simple mechanistic approach. Specifically, the request to join a clan should include an indication of the expected future activities of the clan. This is determined by considering the most active motivations, and extracting the most frequently generated goals. This set of goals represents the "general purpose" of the clan and corresponds to the activities clan members are likely to be asked to perform by the initiating agent.
Although this involves revealing essentially private information, we argue that the motivational benefit that the agent will (hopefully) gain from forming the clan justifies giving this information.

If sufficient agents agree to form a clan (i.e. more than the minimum clan size) then the initiator sends acknowledgements and a clan is formed with those who accepted. Alternatively, if insufficient agents accept, then those that did accept are informed of the failure to obtain sufficient positive responses and clan formation is abandoned. The following outlines the clan formation process:

function FORM-CLAN returns boolean
    input: redundancy, timeout
    local: current-plans ← { }
           min-size ← 0
           preferred-size ← 0
    target-agents ← SELECT-MOST-TRUSTED(agent-models, confidence-threshold)
    current-plans ← SELECT-PLANS(EXTRACT-GOALS(active-motivations), plan-library)
    for agent in target-agents do
        if ((BELIEVED-CAPABLE(agent, EXTRACT-ACTIONS(current-plans)) = false) or
            (TRUST(agent, agent-models) < trust-threshold))
            then target-agents ← target-agents - agent
    end
    min-size ← #(EXTRACT-ACTIONS(current-plans)) / #(current-plans)
    preferred-size ← min-size × redundancy
    goals-to-communicate ← EXTRACT-GOALS(active-motivations)
    if (#(target-agents) > preferred-size)
        then target-agents ← HEAD(target-agents, preferred-size)
    for agent in target-agents do
        REQUEST-FORM-CLAN(goals-to-communicate)
    end
    responses ← GET-RESPONSES(timeout)
    if (#(responses) > min-size) then
        for agent in ACCEPT(responses) do CONFIRM(agent) end
        return true
    else
        for agent in ACCEPT(responses) do DECLINE(agent) end
        return false
end

The first part of the algorithm is concerned with determining who to invite to join a clan, and requesting that they join. The latter part of the algorithm shows how the responses are processed. To determine whether to agree to join a clan, those agents that receive requests must consider both the trustworthiness of the requesting agent and the motivational value of joining. If the trust of the requesting agent is below the minimum trust threshold, or it is not confidently trusted, then the request is simply declined. If the requesting agent is sufficiently trusted then the criteria described in Section 3 are considered to give an indication of how beneficial clan membership would be in general. If this general assessment indicates that the agent desires to form a clan, then the request is accepted.

Otherwise, the goals contained in the request are used to estimate how useful it would be to join the clan in particular. The motivational value of each goal is considered in a situation-independent manner, i.e. the general motivational value is considered without reference to the current motivation intensities. If this value exceeds a threshold then the agent agrees to form a clan. The outline of the process of assessing requests for clan formation is:

function PROCESS-FORMATION-REQUEST returns response
    input: requester, request-goals
    local: motivational-value ← 0
    if (TRUST(agent-models, requester) < trust-threshold) then return decline
    if (ATTEMPT-TO-FORM-CLAN) then return accept
    for goal in request-goals do
        motivational-value ← motivational-value + MOTIVATIONAL-VALUE(goal)
    end
    if (motivational-value > threshold)
        then return accept
        else return decline
end

5. Reasoning in a clan
Clan membership influences three main aspects of behaviour: information sharing, commitment to cooperation, and scalability. In the first case, a clan member that requires information on others' capabilities or trustworthiness can request information from other clan members. In the second case, clan members are more likely both to cooperate and to fulfil their commitments, due to the increased motivational value of cooperation. In order to ascribe motivational value to clan membership, and to ensure that agents remain self-interested, we introduce a kinship motivation to all agents. Kinship is mitigated by offering assistance to other clan members, and its intensity is determined by the proportion of goals that require cooperation, and the extent and quality of an agent's trust models. In the final case, scalability is increased by reducing the search cost of finding cooperative partners by simply searching through the members of the clan.
5.1 Sharing information: reputation
When an agent requires information about the capabilities and trustworthiness of others, it can ask other clan members. In particular, when faced with a plan containing actions for which no confidently trusted and capable agents are known, it can ask trusted members of its clan. Note that although clan members will have been trusted at the time of clan formation, some may have come to be distrusted over time, but not so much as to justify leaving the clan. Therefore, the trustworthiness of clan members must still be checked when interacting with them. Clan members gain motivational value, via the kinship motivation, from sharing information about other agents' capabilities and trustworthiness. The motivational value received from giving such information is determined by the intensity of the kinship motivation. If the intensity is above its associated threshold, then an agent should share information, otherwise insufficient benefit is received to justify offering information. Additionally, information should only be shared with trusted agents, and before giving

information an agent should check that the requester is trusted. This mechanism allows agents to discover other trusted agents outside the clan to assist them. The process of requesting information from other clan members is outlined thus:

function REQUEST-INFORMATION
    input: plan, timeout
    local: problem-actions ← { }
           trusted-agents ← { }
           responses ← { }
    for action in plan do
        trusted-agents ← { }
        for agent in CAPABLE(agent-models, action) do
            if (TRUSTED(agent, trust-threshold)) then trusted-agents ← trusted-agents ∪ {agent}
        end
        if (trusted-agents = { }) then problem-actions ← problem-actions ∪ {action}
    end
    if (problem-actions ≠ { }) then
        target-agents ← SELECT-TRUSTED(CLAN-MEMBERS(agent-models), confidence-threshold)
        for agent in target-agents do REQUEST-INFORMATION(problem-actions) end
        responses ← GET-RESPONSES(timeout)
        for action in problem-actions do
            agent-models ← agent-models ∪ REPUTATION(FILTER-CAPABLE(responses, action))
        end
    end

Subjectivity is the primary problem in sharing trust information, since trust values are internal to agents and depend on disposition and experience; they are not directly comparable numerically across agents. Some researchers take the approach of eliminating small subjective differences between agents by using a stratification of trust, dividing the numerical range into subranges (Marsh, 1994a; Abdul-Rahman and Hailes, 2000). Stratification removes subjective differences between agents provided those differences are within the same subrange. However, if trust values differ across subranges then stratification is counter-productive and accentuates differences. Furthermore, stratification of the numerical range leads to a loss of sensitivity and accuracy; it becomes impossible to distinguish between values that are in the same subrange. Stratification only addresses subjectivity if the differences in trust values between agents are small. Agents' dispositions and experiences must be such that if two agents ascribe a trust value in the "highly-trusted" subrange, they infer the same meaning for this value. However, as discussed in Section 2, a consequence of agents having individual dispositions is that, in general, two different agents will not infer the same meaning from a given numerical value. In our view, the loss of sensitivity and accuracy resulting from stratification, coupled with its relatively limited applicability,

mean that its use is not appropriate. We take a more straightforward approach in which agents simply communicate numerical values, knowing that these values are not directly comparable across agents. The following outlines the process of sharing information with another clan member:

function PROVIDE-INFORMATION
    input: problem-actions, requester
    local: response ← { }
           agent ← null
    if ((INTENSITY(kinship) > THRESHOLD(kinship)) and
        (TRUST(requester, agent-models) > trust-threshold) and
        (requester ∈ CLAN-MEMBERS(agent-models))) then
        for action in problem-actions do
            agent ← SELECT-MOST-TRUSTED(agent-models, confidence-threshold, action)
            response ← response ∪ {(agent, TRUST(agent, agent-models), action)}
        end
        SEND-RESPONSE(response)
    end

When sharing trust information, we adopt two key constraints, as proposed by Marsh (1994a): if agent a1 obtains information about a3 from a2 then
(1) a1 does not trust a3 more than a2 trusts a3, and
(2) a1 does not trust a3 more than it trusts a2.
Thus, any trust information obtained is moderated by the trust ascribed to the provider. Since the resulting information about a3 incorporates another agent's subjective view, the result is an assessment of a3's reputation. Recall that trust refers to an individual's assessment of another, while reputation refers to an assessment based on others' trust values, i.e. trust is an individual notion and reputation is a social notion. Since reputation also includes subjective elements in terms of the trust of the information providers, different agents within a clan are likely to arrive at different reputation assessments for a given agent, although in practice these differences are typically small. To determine the reputation of an agent, based on information provided by a set of clan members, we take the average value where each component is moderated by the trust ascribed to the provider. Thus, the reputation from the perspective of agent a_x of agent a_y, based on information provided by clan members a_1, a_2, ..., a_n, is determined as follows:

    R_xy = (1/n) Σ_{i=1..n} ( T_{x,i} × T_{i,y} )

where T_{i,j} denotes the trust a_i ascribes to a_j, and R_{i,j} denotes the reputation a_i has determined for a_j. The latter part of the REQUEST-INFORMATION algorithm above indicates how an agent determines the reputation of another, where the function REPUTATION is assumed to implement the above equation.
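A small Python sketch of this moderated average is given below; the representation of trust values as dictionaries keyed by agent identifier is our own illustrative assumption.

    def reputation(own_trust: dict, reported_trust: dict) -> float:
        """Moderated-average reputation of a target agent a_y, as in the equation above.

        own_trust:      trust this agent (a_x) ascribes to each provider a_i
        reported_trust: trust each provider a_i reports for the target a_y
        """
        providers = [p for p in reported_trust if p in own_trust]
        if not providers:
            raise ValueError("no usable reports")
        return sum(own_trust[p] * reported_trust[p] for p in providers) / len(providers)

    # Example with two clan members reporting on a third agent:
    # reputation({"a1": 0.8, "a2": 0.6}, {"a1": 0.9, "a2": 0.5})  ->  (0.72 + 0.30) / 2 = 0.51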

It should be noted that other notions of reputation have been proposed elsewhere. However, our model differs from other approaches in that reputation is determined directly from individual trust and agents' dispositions. The REGRET system, for example, considers reputation in an online marketplace scenario, where agents record "impressions" of others after interactions (Sabater and Sierra, 2001). Reputation is determined by combining impressions and individual experience. This approach is similar to ours in that reputation is a combination of an agent's own experience and that of others. However, there is no explicit representation of individual trust and, although related to trust, the impressions used in REGRET do not represent an individual's assessment of risk. Rather, they are a subjective evaluation made by an agent on the outcome of an interaction. Mui et al. (2002, 2003) propose a mechanism for assessing trust and reputation by statistical estimates of cooperation in a Prisoner's Dilemma interaction, where agents either cooperate or defect. Reputation measures the likelihood that an agent reciprocates another's actions, and will cooperate in the Prisoner's Dilemma game. Information is propagated via embedded social networks in which agents are assumed to reveal the trust and reputation information they ascribe to others. Our approach differs since reputation and trust are subjective estimates based on experience, not on probabilities. Furthermore, we do not assume that the propagation of information is automatic, since we require there to be motivational justification for information sharing.
5.2 Cooperation through kinship
The kinship motivation serves to increase the likelihood of clan members cooperating, and of fulfilling cooperative commitments, by providing motivational value from cooperation. Kinship functions like any other motivation in guiding behaviour; its influence is taken into account when deciding whether to cooperate, and in determining when to rescind commitments. Thus, no additions are required to the agent reasoning cycle (as described in the mBDI reasoning cycle above) to incorporate this inclination to assist clan members. At a philosophical level, the kinship motivation can perhaps be seen to undermine the self-interested nature of agents. However, recall that agents choose to join a clan for specific reasons that are undeniably self-interested. Furthermore, kinship is just one of a set of motivations, and does not override the others; if it did then the agent would certainly cease to be self-interested. Decisions about cooperation continue to be driven by all of an agent's motivations, and kinship is just one factor that contributes to a decision.
5.3 Improving scalability
In general, when determining which agents to ask for assistance, all other known agents are considered. However, as described in Section 3 this can lead to scalability problems in systems that contain large numbers of agents. Where an agent joins a clan to address scalability problems, i.e. to reduce the search cost of finding cooperative partners, it can simply search through the members of the clan rather than considering all known agents. However, the cost of this is that the agent will overlook the most appropriate agents if they are not in the clan, even if they are trusted and are known to

have appropriate capabilities. Given this disadvantage, agents should only restrict their search to clan members when it is necessary to do so. In particular, if the scalability criterion for clan formation, described in Section 3, is applicable then the agent should initially only consider other clan members. When faced with a plan that requires cooperation the agent should use the standard process of attempting to form a cooperative intention but be restricted to considering agent models corresponding to clan members, meaning that only clan members are asked for assistance. If this fails, then the standard cooperative intention formation procedure is undertaken, where all known agents are considered.
6. Leaving a clan
Clan membership has a cost, in terms of computational overhead and because kinship may lead an agent to assist another clan member, rather than act as it would otherwise. It is not possible to directly assess the costs and benefits of clan membership, since there is no way to interrogate what others would do without kinship motivations. If there are many goals achieved through cooperation and/or there is a high cooperation rate then clan membership is likely to be worthwhile. Since others' motivations cannot be inspected it is not possible for an agent to assess whether it is getting something in return for its clan membership, i.e. the extent to which others' kinship motivations are affecting their behaviour. From the agent's viewpoint, however, this does not matter – provided that it is successful in gaining cooperation then clan membership is considered beneficial. (Note that even from an external viewpoint there are many subtle benefits to clan membership that are difficult to assess, such as becoming more trusted by potential partners due to being seemingly "exploited".) Provided that the clan is operating effectively there will be sufficient reciprocal action for agents to receive net motivational benefit overall. Indeed, this is one of the reasons for forming a clan: to address short-term fluctuations in motivations leading to missed cooperation opportunities. However, over time agents' active motivations may change and the motivational benefit gained from membership of a specific clan will decrease, and eventually agents may receive insufficient benefit to justify continued membership. If an agent's active motivations change such that it no longer receives sufficient benefit from the clan, then it should withdraw its membership by notifying the other members. Agents should also withdraw their membership if they come to distrust the other members. Since it is not possible to accurately consider whether the benefits of clan membership outweigh the costs, we take a simple approach to assessing whether to leave a clan by assessing its relevance and its influence on cooperative success. In particular, we consider the proportion of recently adopted plans for which cooperation was required, and if this proportion is below a minimum relevance-threshold then the clan is considered no longer relevant and the agent should leave. To assess the effect of clan membership on cooperative success we consider the proportion of successful and unsuccessful interactions that involved clan members. If the proportion of successful interactions that involved clan members is less than the success-threshold then the agent should leave the clan. Conversely, if the proportion of unsuccessful interactions involving clan members is greater than the failure-threshold then the agent should leave.
This decision process is outlined thus:

function LEAVE-CLAN returns boolean
    input: recent-applicable-plans
    if (#(COOPERATIVE(recent-applicable-plans)) / #(recent-applicable-plans) < relevance-threshold)
        then return true
    clan-interactions ← CLAN(recent-interactions)
    non-clan-interactions ← recent-interactions - clan-interactions
    if (#(SUCCESSFUL(clan-interactions)) / #(SUCCESSFUL(non-clan-interactions)) < success-threshold)
        then return true
    if (#(UNSUCCESSFUL(clan-interactions)) / #(UNSUCCESSFUL(non-clan-interactions)) > failure-threshold)
        then return true
    return false
end

Each individual makes its own decision about whether to stay in a clan or leave, and there is no formal clan dissolution process. As the number of agents that remain in a clan decreases, the benefits obtained from clan membership by the members are also likely to decrease. Eventually, a clan will contain a single agent, at which point the clan ceases to exist.
7. Joining existing clans
Although we have described how agents can create clans, to be flexible agents must be able to join existing clans as well as create new ones. The primary problem in enabling agents to join existing clans is providing a mechanism for agents to discover a suitable clan to join. In our scenario there is no centralised control or repository, and so a directory of existing clans is inappropriate. Indeed, if such a directory existed there would be no clear motivational value for agents to provide information about their clan membership to be interrogated by other, potentially distrusted, agents. Our approach is to provide two means for an agent to discover existing clans: by invitation from a clan member, or in response to a request for a member of an existing clan to join a new clan. The first case is a straightforward extension of the criteria for determining when to form a clan. Suppose an agent believes that it should form a clan (using the skeletal algorithm outlining the decision process), but on assessment of who to request discovers that many of the desired members of the new clan are already members of an existing clan. In this case, rather than forming a new clan, the agent can instead invite those agents who are not already members to join the existing clan. Since agents are self-interested, the inviting agent does not ask "permission" from the existing clan members; rather, it simply informs them about any positive responses from newly invited agents, and those agents update their knowledge of the clan accordingly (and continue to monitor the relevance and effectiveness of the new clan). Our second alternative occurs when an agent sends a request to form a clan to agents who are already members of an existing clan. In this case, each member that

receives a request can either respond in the standard manner, or can invite the proposed members of the new clan to join the existing clan. If the goals communicated by the requester are similar to the goals that caused the formation of the existing clan, then an invitation to join the existing clan is appropriate, provided that all of the agents concerned are suitably trusted. Such invitations to join an existing clan are processed in the same manner as a standard clan formation request.

8. Conclusion
In this paper, we have described how clans can be used to address some of the limitations of existing approaches to cooperation; in particular, the problems of missed opportunities for cooperation, scalability, a lack of information, and high failure rates. We have described how agents can assess when to form a clan, how they should act within a clan, and the conditions under which they should leave. Clans are a mechanism for agents to improve their individual performance through cooperation without compromising their autonomy. A clan can be thought of as a loosely coupled entity, and a clan's actions, and indeed its continued existence, depend solely on the self-interested decisions of its members. Any notion of collective intelligence is a transient quality dependent on the current state of the clan's members. A clan's capabilities and knowledge can be viewed as the union of its members' capabilities and knowledge. However, there is no corresponding notion of a clan's motivations, and members remain autonomous self-interested entities. The continued robustness and flexibility benefits that result from this individual autonomy are a key advantage of our approach. Our model of clans has been validated by an initial simulation of a distributed computing scenario comprising a set of agents, each with individually defined capabilities and motivations, situated in a dynamic and unpredictable environment. The capabilities define what an agent can achieve alone, and the motivations give rise to agents' goals according to the current state of the environment. We undertook several simulations, varying the significance agents placed on their kinship motivations by changing the intensity and mitigation functions. In comparison with a control configuration where agents did not form clans, the introduction of clans significantly reduced the number of failed interactions (where agents rescinded commitments to cooperation due to changes in motivation intensities). As more importance was placed on the kinship motivation, fewer failures were experienced. In general, the number of successful interactions was increased with the introduction of clans. Owing to the computational overheads associated with clans, the benefits obtained with low kinship importance were negligible. An increase in kinship importance brought a corresponding increase in successful interactions. However, as kinship became more important, other motivations were overridden and, although there were increased successful interactions, agents tended to focus on assisting in achieving others' goals rather than achieving their own goals. There are three key areas of ongoing work. First, we are investigating more sophisticated mechanisms for managing the membership of multiple clans. Currently, agents do not explicitly reason about multiple clans, and they manage multiple clan memberships implicitly by simply acting according to their motivations. Secondly, we are developing an ontology for sharing trust information. This can be seen as an alternative to the stratification approach

that we rejected in Section 5, by allowing agents to agree on an ascribed meaning to the particular trust notions. For example, agents may agree that "highly-trusted" implies a certain degree of previous success given a particular degree of experience. This would allow us to have the benefits of stratification in terms of simplicity, while avoiding the associated problems. Finally, although we have undertaken limited experimentation of our approach, with favourable initial results, ongoing work involves performing more extensive evaluation. In particular, we intend to investigate the cost of clan membership on autonomy in terms of agents assisting others rather than achieving their own goals.

References
Abdul-Rahman, A. and Hailes, S. (2000), "Supporting trust in virtual communities", Proceedings of the Hawaii International Conference on System Sciences, p. 33.
Bratman, M.E. (1992), "Shared cooperative activity", Philosophical Review, Vol. 101 No. 2, pp. 327-41.
Bratman, M.E., Israel, D. and Pollack, M. (1988), "Plans and resource-bounded practical reasoning", Computational Intelligence, Vol. 4, pp. 349-55.
Breban, S. and Vassileva, J. (2001), "Long-term coalitions for the electronic marketplace", in Spencer, B. (Ed.), Proceedings of the E-Commerce Applications Workshop, Canadian AI Conference.
Brooks, C. and Durfee, E. (2002), "Congregating and market formation", Proceedings of the First International Joint Conference on Autonomous Agents in Multi-Agent Systems (AAMAS-02), ACM Press, Bologna, pp. 96-103.
Brooks, C. and Durfee, E. (2003), "Congregation formation in multiagent systems", Journal of Autonomous Agents and Multi-Agent Systems, Vol. 7 Nos 1/2, pp. 145-70.
Brooks, C., Durfee, E. and Armstrong, A. (2000), "An introduction to congregating in multiagent systems", in Durfee, E. (Ed.), Proceedings of the Fourth International Conference on Multi-Agent Systems (ICMAS-2000), pp. 79-86.
Castelfranchi, C. (1995), "Guarantees for autonomy in cognitive agent architecture", in Wooldridge, M.J. and Jennings, N.R. (Eds), Intelligent Agents: Proceedings of the First International Workshop on Agent Theories, Architectures and Languages (ATAL-94), Springer, Berlin, pp. 56-70.
Castelfranchi, C. and Falcone, R. (1998), "Principles of trust for MAS: cognitive anatomy, social importance, and quantification", Proceedings of the Third International Conference on Multi-Agent Systems (ICMAS-98), Paris, pp. 72-9.
Gambetta, D. (1988), "Can we trust trust?", in Gambetta, D. (Ed.), Trust: Making and Breaking Cooperative Relations, Basil Blackwell, Oxford, pp. 213-37.
Griffiths, N. (2000), "Motivated cooperation in autonomous agents", PhD thesis, University of Warwick.
Griffiths, N. (2003), "Supporting cooperation through clans", Cybernetic Intelligence – Challenges and Advances, Proceedings IEEE Systems, Man and Cybernetics, 2nd UK&RI Chapter Conference.
Griffiths, N. and Luck, M. (1999), "Cooperative plan selection through trust", in Garijo, F.J. and Boman, M. (Eds), Multi-Agent System Engineering: Proceedings of the Ninth European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW'99), Springer, Berlin.

Griffiths, N. and Luck, M. (2003), "Coalition formation through motivation and trust", Proceedings of the Second International Conference on Autonomous Agents and Multi-agent Systems (AAMAS-03), pp. 17-24.
Griffiths, N., Luck, M. and d'Inverno, M. (2003), "Annotating cooperative plans with trusted agents", in Falcone, R., Barber, S., Korba, L. and Singh, M. (Eds), Trust, Reputation, and Security: Theory and Practise, Springer, Berlin, pp. 87-107.
Kinny, D., Ljungberg, M., Rao, A., Sonenberg, E., Tidhar, G. and Werner, E. (1992), "Planned team activity", Proceedings of the Fourth European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW-92), pp. 227-56.
Klusch, M. and Shehory, O. (1996), "Coalition formation among rational information agents", in Van de Velde, W. and Perram, J.W. (Eds), Agents Breaking Away: Proceedings of the Seventh European Workshop on Modelling Autonomous Agents in a Multi-Agent World (MAAMAW-96), pp. 204-17.
Levesque, H.J., Cohen, P.R. and Nunes, J.H.T. (1990), "On acting together", Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI-90), Boston, MA, pp. 94-9.
Luck, M. and d'Inverno, M. (1995), "A formal framework for agency and autonomy", Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), AAAI Press/The MIT Press, Menlo Park, CA/Cambridge, MA, pp. 254-60.
Marsh, S. (1994a), "Formalising trust as a computational concept", PhD thesis, University of Stirling.
Marsh, S. (1994b), "Optimism and pessimism in trust", Proceedings of the Ibero-American Conference on Artificial Intelligence (IBERAMIA-94).
Mui, L., Mohtashemi, M. and Halberstadt, A. (2002), "A computational model of trust and reputation", Proceedings of the 35th Hawaii International Conference on System Science.
Mui, L., Mohtashemi, M. and Halberstadt, A. (2003), "Evaluating reputation in multi-agent systems", in Falcone, R., Barber, S., Korba, L. and Singh, M. (Eds), Trust, Reputation, and Security: Theory and Practise, Springer, Berlin, pp. 87-107.
Norman, T.J. (1996), "Motivation-based direction of planning attention in agents with goal autonomy", PhD thesis, University of London.
Sabater, J. and Sierra, C. (2001), "REGRET: A reputation model for gregarious societies", paper presented at the Fourth Workshop on Deception, Fraud and Trust in Agent Societies, pp. 61-70.
Shehory, O. and Kraus, S. (1995), "Task allocation via coalition formation among autonomous agents", Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), Montréal, Québec, pp. 655-61.
Shehory, O. and Kraus, S. (1998), "Methods for task allocation via agent coalition formation", Artificial Intelligence, Vol. 101, pp. 165-200.
Tambe, M. (1997), "Towards flexible teamwork", Journal of Artificial Intelligence Research, Vol. 7, pp. 83-124.
Wooldridge, M. and Jennings, N.R. (1999), "Cooperative problem-solving", Journal of Logic and Computation, Vol. 9 No. 4, pp. 563-92.

Future reasoning machines: mind and body

Brian R. Duffy
Media Lab Europe, Bellevue, Dublin, Ireland, and
Gregory M.P. O'Hare, John F. Bradley, Alan N. Martin and Bianca Schoen
Department of Computer Science, University College Dublin, Belfield, Dublin, Ireland

Abstract
Purpose – In investing energy in developing reasoning machines of the future, one must abstract away from the specific solutions to specific problems and ask what are the fundamental research questions that should be addressed. This paper aims to revisit some fundamental perspectives and promote new approaches to reasoning machines and their associated form and function.
Design/methodology/approach – Core aspects are discussed, namely the one-mind-many-bodies metaphor as introduced in the agent Chameleon work. Within this metaphor the agent's embodiment form may take many guises, with the artificial mind or agent potentially exhibiting a nomadic existence, opportunistically migrating between a myriad of instantiated embodiments. The paper animates these concepts with reference to two case studies.
Findings – The two case studies illustrate how a machine can have fundamentally different capabilities than a human, which allows us to exploit, rather than be constrained by, these important differences.
Originality/value – Aids in understanding some of the fundamental research questions of reasoning machines that should be addressed.
Keywords Cybernetics, Robotics, Philosophical concepts
Paper type Conceptual paper

Acknowledgements: Sincere thanks to John Bourke, who ran the experiments presented in Bourke and Duffy (2003) and discussed in Section 5.1. The authors also gratefully acknowledge the financial support of the Higher Education Authority (HEA) Ireland and the Irish Research Council for Science, Engineering and Technology, funded by the National Development Plan.

1. Introduction
Intelligent systems research has undertaken an arduous and evolving path over many decades, all the while delivering, in approximately equal numbers, solutions to many problems whilst also identifying further as yet unsolved problems. Core principles from many disciplines have influenced our perspectives on a system's function and form, none more so than the one-mind-one-body debate found in biological entities. The age-old notion of embodiment (the strong provision of context within the system), whether physical (Brooks, 1991; Steels, 2000) or social (Duffy, 2000; Duffy and Joue, 2001), has been an important development in artificial intelligence research and robotics. It has focused the control strategies employed on the robot's environmental contexts. While fundamentally important and necessary, the continuing focus on the

narrow frame of reference of one-mind-one-body should be developed further and new paradigms investigated, which this work aims to address. Inevitably, our sources of inspiration come from what exists around us. Consequently, significant research energies have been invested in such projects as trying to realise a human-like robot, a system that clearly encapsulates the one-mind-one-body concept. But to what extent should a machine's reference be sourced from such biological references as ourselves? Is the human-based approach to a singular mind-body paradigm the only tangible option? Indeed, how ought we to manage our perceptions and interpretations of artificial entities that extend beyond this paradigm? Descartes (1637), referred to as the father of cybernetics due to his study of the human body as a machine, popularized the age-old thesis that mind and body are distinct from each other. He argues that even though he may have a body, his true identity is that of a thinking thing alone and, indeed, his mind could exist without his body. He argues that humans are spirits, which occupy a mechanical body, and that the essential attributes of humans are exclusively attributes of the spirit (such as thinking, willing and conceiving), which do not involve the body at all. While this has been considerably debated in the field of AI and robotics over recent decades, it has become generally accepted that embodiment is key to the development of AI in machines. But what if we wanted to build artificial systems that extend beyond this traditional paradigm? This paper draws on Descartes' notion of "spirit" and extends this to a one-mind-many-bodies paradigm. The following sections provide, first, a background discussion of the artificial mind and the issues surrounding artificial intelligence and robotics with regard to the traditional paradigm of one-mind-one-body. Following this, Section 4 takes a step away from existing approaches in the context of the reasoning machine and looks at how a "spirit" can be embraced in artificial systems with the added dimension of being able to change and "possess" different bodies. Section 5 looks at some of the core features of artificial systems and argues that the function and form of the machine are inextricably linked, but are still subject to observer-based dependencies. Finally, Section 6 presents some fundamental tenets that underpin the next generation of reasoning machines.
2. The artificial mind
The relationship of the mind and body has been a psychological and philosophical problem for many years. From both philosophical and scientific theories, the mind-body relationship can be divided into two main categories: monistic and dualistic. First, monistic theories suggest that mind and body are not independent of one another. Behaviourists (including the likes of Aristotle, Hobbes and Hegel) hypothesised that mind was nothing more than a function of the body. Idealists, like Berkeley, Leibniz and Schopenhauer, suggested that the body was just a mental representation. Spinoza proposed a theory of double-aspectism, which postulates that mind and body are distinguishable but not separable. Second, dualistic theories are of the view that mind is seen as distinct from the body and not made up of any physical substance. Some popular dualists include Descartes, Locke and James, who belonged to a branch of dualism known as interactionism. Descartes, as an interactive dualist, believed that there was a distinction between the

human mind (or soul) and the physical body, describing the mind as "[a] thing or substance whose whole essence or nature was only to think . . . has no need of space nor of any material thing or body. . . . This mind . . . is entirely distinct from the body" (Descartes, 1993). He claims that a body without a soul would be an automaton, responding to external stimuli, while a soul without a body would have consciousness but only of innate ideas, lacking any sensory impressions (Francher, 1979). Interactionists believed that, although mind and body were of a very different nature, it was the interaction between the two that produced many aspects of human nature. The prevailing view in cognitive science today is that the human mind consists of distinct faculties dedicated to a range of cognitive tasks: linguistic, social, practical, theoretical, abstract, spatial and emotional. Mental processes in humans are generally viewed as not being solely internally represented symbol-manipulating algorithms, and thus the notion of a robot having a mind, with the human mind as the frame of reference, becomes an issue. Arguing against an artificial system having a mind is similar to discussing whether the system simply operates at a level of syntax without semantics (i.e. a computer acting the role). Programs operating on a machine can be seen to be semantically blind, merely mimicking the grasp of meaning according to both the rule set employed and the data received. The program does not understand the information; it merely has a methodology capable of dealing with it, a form of mapping between input and output. Searle (1980) animates this stance in his Chinese Room Argument. This paper argues against applying the term mind with respect to machines and makes the distinction between mind and the artificial mind. The term artificial mind refers to an artificial entity's reasoning mechanism, independent of particular implementation technologies, which has been developed for its interaction with both its physical and social environments. This undoubtedly includes our worlds and therefore its interaction with us. Machines with minds may arguably not exist, but the importance of AI is that machines with artificial minds can exist, because as humans we tend to interpret the artificial entity according to our frame of reference. That is, we basically anthropomorphise and adopt the intentional stance (Dennett, 1987)[1].
3. Robotics and AI
A predominant theme within AI research is to focus on the development of functional components and solutions to narrow problems, with limited abstraction and consideration of the broader objectives of AI. Artificial intelligence was initially interpreted as an attempt to prove the Physical-Symbol System hypothesis, where "formal symbol manipulation is both a necessary and sufficient mechanism for general intelligent behaviour" (Simon, 1957). Efforts to solve the AI problem that follow this hypothesis are now termed the classical AI approach. Simon maintained that the human cognitive system is basically a serial device. When results were subjected to human interpretation, classical AI provided a rich source of control ideas. Problems arose when these control paradigms were applied to robotics, and in particular the control of mobile robots. The original theory, that robots would simply provide the sensors and actuators for an artificial brain once it was constructed, proved seriously flawed.

The robot Shakey (Nilsson, 1984) provided a useful calibration for classical AI and its original idea of developing some form of artificial mind (effectively an artificial reasoning mechanism). While focused strategies for specific solutions are essential, they generally merely provide the mechanisms upon which more complete systems can be constructed. By reviewing achievements and failures to date within robotics and AI, an insight is acquired into the continued relevance and attainability of the grand challenge. Problems arose with real-time performance and stability through, for example, sensor noise and the demands of maintaining representational model validity. More elaborate models necessitated ever increasing computational effort that often proved too cumbersome and not sufficiently responsive for real world applications. It became apparent that understanding system-environment interaction was fundamental in achieving robust control for autonomous robots existing within a physical world.

This classical approach viewed mind as distinct from body and took a non-interactive dualistic approach. Early research in the field of artificial intelligence worked on developing artificial minds that were effectively disembodied, with minimal interaction with any world (real or otherwise). However, this has a fundamental flaw, in that "a program integrated in a computer with no visible appearance nor autonomous physical interaction with the real world has a more difficult time to be viewed as intelligent, whatever the power of its problem solving and the sophistication of its knowledge" (Steels, 2000). The inability of such classical AI systems to handle unconstrained interaction with the real world led to a search for new control architectures for autonomous robots. Recent research into embodiment, sociality and emotions is now approaching the problem from a new angle. This new AI has assumed a stance similar to double-aspectism. While mind and body are viewed by some as distinguished, separate components, they are not necessarily inseparable.

A series of provocative papers by Brooks (1986, 1990, 1991) argued that real world autonomous systems, or embodied systems, must be studied in order to deal with the problems posed by classical approaches. While not a new concept, Brooks' popularisation of the reactive approach served as a useful catalyst in the search for more embodied approaches to artificial cognition. Issues in real-time processing became very real, for example when the robot could not cope and crashed into something. Only by direct interaction could the robot gain an understanding of the environment. For either the deliberative or the reactive approach, a robot requires a control architecture. This architecture determines how behaviour is generated based on signals from sensors and how motor responses are invoked. Reactive approaches which aggregate large numbers of simplistic, non-representational reasoners have led to emergent "intelligent" behaviour (Braitenburg, 1984; Fukuda et al., 1989; Kube and Zhang, 1993; Lucarini et al., 1993). While founded in embodied robotics, these have not proved sufficient for achieving complex goals and suffer from issues of repeatability and the absence of a strong theoretical model. In contrast, deliberative architectures have displayed reflective reasoning capabilities but may lack the responsiveness and robustness demanded by real world applications.
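To make the contrast between the two families of control architecture concrete, the sketch below shows a minimal behaviour-based (reactive) controller of the kind popularised by Brooks, in which a small set of prioritised behaviours couples sensor readings directly to motor commands without an intervening symbolic world model. This is an illustrative sketch only: the behaviour names, sensor fields and numerical thresholds are invented for the example and do not correspond to any particular system discussed in this paper.

```python
# Minimal sketch of a layered reactive controller (illustrative only).
# Higher-priority behaviours subsume (override) lower-priority ones, so the
# overall behaviour emerges from direct sensor-to-actuator couplings rather
# than from a central symbolic world model, as a deliberative planner would use.

def avoid_obstacle(sensors):
    """Highest priority: spin away when something is close in front."""
    if sensors["front_distance"] < 0.2:                  # metres (assumed units)
        return {"left_speed": -0.2, "right_speed": 0.2}  # turn in place
    return None                                          # no opinion: defer

def follow_light(sensors):
    """Middle priority: steer towards the brighter side."""
    bias = sensors["light_right"] - sensors["light_left"]
    if abs(bias) > 0.1:
        return {"left_speed": 0.3 + 0.2 * bias, "right_speed": 0.3 - 0.2 * bias}
    return None

def wander(sensors):
    """Lowest priority: default forward motion."""
    return {"left_speed": 0.3, "right_speed": 0.3}

BEHAVIOURS = [avoid_obstacle, follow_light, wander]      # priority order

def control_step(sensors):
    """One pass of the control loop: the first behaviour with an output wins."""
    for behaviour in BEHAVIOURS:
        command = behaviour(sensors)
        if command is not None:
            return command

# One tick of the loop with fabricated sensor readings.
print(control_step({"front_distance": 0.5, "light_left": 0.2, "light_right": 0.6}))
```

A deliberative architecture would instead insert a modelling and planning stage between the sensor readings and the motor commands, which is precisely where the responsiveness problems discussed above tend to arise.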
Thagard (1996) defines a current central hypothesis of cognitive science, the computational-representational understanding of mind (CRUM): “Thinking can best be understood in terms of representational structures in the mind and computational


procedures that operate on those structures". While there is much speculation regarding the validity of this statement, he continues by stating that this central hypothesis is general enough to encompass the current theories in cognitive science, including connectionism. Interestingly, while strong embodiment, as discussed in Brooks (1986), Thagard (1996) and Duffy (2000), continues to prevail as a necessary criterion for achieving stronger notions of intelligence in artificial systems, the fundamental mechanisms used in trying to build strongly embodied systems today are inherently symbolic in nature. The process control is achieved via symbolic computers. So, can embodied artificial cognitive processes really be achieved to the extent required to realise a strong notion of intelligence when our references for intelligence are based on natural systems? It is like trying to make machines into natural entities or, inversely, to reduce natural systems to machines, an issue regularly discussed in AI. Two diametrically contrasting issues arise:
(1) If the reference for intelligence and the barometer for gauging degrees of intelligence is that of the human, then anything less than a human in all capacities is unsuccessful.
(2) If the qualifier artificial is emphasised, then the process of comparison becomes more of an analogy, with limitations.
Proponents of strong AI believe that it is possible to duplicate human intelligence in artificial systems, where the brain is seen as a kind of biological machine that can be explained and duplicated in an artificial form. This mechanistic view of the human mind argues that how people think could be revealed through an understanding of the computational processes that govern brain characteristics and function. This would also provide an insight into how one may realise an artificially created intelligent system with emotions and consciousness. In contrast, advocates of weak AI believe that human intelligence can only be simulated. An artificial system may only give the illusion of intelligence (i.e. the system exhibits those properties that are associated with being intelligent). In adopting this weak AI stance, artificial intelligence is an oxymoron.

Having briefly reviewed the AI and robotics journey to date, we wish to pause and consider the function of the reasoning machine, and the associated possibilities in terms of its divergent forms. In clearly distinguishing between artificial and biological systems at a control level, this inherently draws a distinction between their capabilities. If the system is a machine, this effectively centres its functionality on its mechanistic construction, which can be to a designer's advantage.

4. Function of the machine
What is the function of the ultimate reasoning machine? With few references, our ability to invent something beyond the capabilities of what we see around us can become difficult. Could we even understand something so different, let alone invent it? Robot success to date is based upon an ability to determine those tasks for which the robot is particularly apt. Examples include assembly, repetitive pick and place, hazardous substance manipulation, welding and spray painting. Their role as a tool is clear and their function relies on and exploits the properties of machines. Given that machines have a fundamentally different capability set, to constrain such a machine to our capabilities is simply inappropriate. The issue becomes what it could do that exploits

its inherent functionality? At a very basic level, are the human frames of reference of one-mind-one-body still valid in developing a reasoning machine’s functional capabilities? The following subsections challenge the prevailing concept in the field of artificial intelligence of one-mind-one-body and exemplify the principle of this paper which is to not become limited in one’s design and development of artificial systems.


4.1 Free the mind Like the rationalist tradition in philosophy (Descartes, Leibniz, Kant, Husserl) AI research holds that the mind is fundamentally rational, representational and rule-governed. Because of this, modern philosophers like Dreyfus (1972) argue that AI research will fail because it falls prey to precisely the same issues that were directed against the rationalist tradition in philosophy. Furthermore, an animal mind is an aggregation of a vast number of highly parallel, asynchronous, analogue processes. In contrast, artificial intelligence to date is based on digital devices that in most cases can only give the illusion of parallel, asynchronous behaviour. By their very nature, such devices can only ever give an approximate illusion of (artificial) mind. So, maybe traditional philosophies of mind cannot be applied directly to digital entities but rather should only be used as analogy. Such entities, whether a real robot or a virtual avatar for example, can be viewed as virtual entities where it is impossible for that entity to inherently know for certain whether their instantiated platform is a robot or an avatar. It is a control program run on a CPU. As current AI technology is based on digital devices, and as such all input/output and processing in an AI mind is through digital means, they do not necessarily require a fixed platform, just as long as the platforms on which they are instantiated support these computational entities. It is important to note that this does not necessarily undermine the embodiment debate but rather embraces the context of the artificial entity existing through its body instantiations in a physical and social environment. While being physically and socially grounded is fundamental, the added dimension presented here is that the form of its body can change. It would be incorrect to base arguments against the one-mind-many-bodies idea on embodiment arguments that are grounded in natural systems, i.e. that in order to artificially develop a system that displays intelligence, one must achieve the degree of system-environment integration found in natural systems. An autopoetic system is very much distinct from an allopoietic system (Sharkey and Zeimke, 2000). The aim here is more to develop reasoning machines rather than humanlike intelligent machines. Consequently, in using the prefix artificial, this new form of intelligence should exploit its inherent differences to humans rather than be artificially constrained by it. While strong embodiment prevails as a grounding in aiming to achieve an artificial notion of intelligence in its true form, Descartes’ original proposal that mind is distinct from body rises again. By embracing functionality provided by mechanistic solutions (“telepathy” through wireless communication), the long standing notion of one-mind-one-body is challenged. When applied to artificial intelligence, the general Cartesian model suggests that an agent has a distinct mind and body, but yet the mind relies upon the body and the body upon the mind in order for them both to operate successfully. It should be noted that distinctions between the physical instantiation of a body rather than using a virtual representation are often highlighted. It is argued that such


physicality is viewed as a requirement for intelligent behaviour where its reference is our real and complex world. This is a different debate and the view adopted here is that the physical context is always present, whether the interaction is through robot actuators or VR interaction devices with users in the physical environment. Within the context of the one-mind-many-bodies metaphor, an agent's actions can be significantly enhanced:
. First, there is no restriction upon the form the agent's embodiment should take. Numerous guises may be adopted, for example that of a physical robot, an avatar in virtual reality, or a small 2D animated character if suitable for display on a PDA.
. Second, the artificial mind or agent could migrate between a myriad of instantiated embodiments, akin to a ghost moving between and possessing different bodies.
The behaviour of the agent will be dictated not only by the agent's goals but also by the embodiment that the agent has adopted. The choice of embodiment must not only empower the agent, but maintain the agent's identity in the eyes of the user. This can be achieved in a number of different ways, including:
. preserving key referential characteristics across the different instantiations, for example the agent's colour scheme or eyes; and
. using transitions that maintain the presence of the agent throughout the transformation process.
The agent is thus not constrained to any particular environment, physical or virtual. Opportunistic migration both between and within different environments, physical and virtual, should exploit the functional capabilities within each. One important platform demonstrating this de-restriction from our frame of reference is an agent within a 3D virtual environment. Virtual environments (VE) or virtual reality (VR) have a number of distinct advantages over other forms of real world instantiations (such as robotics). Within VR, the rules of the real world, for instance gravity, need not necessarily apply. The agent form is likewise unconstrained: it is capable of mutation in order to suit the task at hand, and it may even choose to abandon one form of embodiment and adopt an entirely new one. The objective is therefore to augment the artificial entity's functionality beyond our own frame of reference. The following case study demonstrates the one-mind-many-bodies metaphor.

4.2 Case study: Agent Chameleons
The Agent Chameleons Project (Duffy et al., 2003; O'Hare et al., 2003) strives to develop digital artificial minds that can seamlessly travel between and within physical and digital information spaces. Three key attributes of migration, mutation and evolution underpin this concept, and can be invoked in response to environmental change, ensuring the survival and longevity of the agent. The traditional concepts of agent environment and its constraints are expanded through the use of agent migration. Agents are capable of mobility between embodiments in virtual environments (e.g. virtual avatar), embodiments in physical environments (e.g. robot), and software environments (e.g. OS desktops, PDAs)

(Figure 1). Once instantiated in the world, the agent has knowledge of that world, and of its capabilities therein. Key technologies underpin such nomadic characteristics. These include white and yellow pages services for the location of people and services, respectively. Agents would thus have access to directory services that permit access to publicly available resources and other agents. The agents would typically have a private directory for accessing resources specific to its owner and not publicly available. These agents act as a proxy for the user in the real and virtual worlds, as well as allowing the user remote access to their devices and public resources. Furthermore, the chosen embodiment of the agent must be capable of change, of agent mutation. This is particularly true in VR, where the agent is free of any constraints that exist in the real world. The agents must be capable of modifying their embodiment instantiation in response to the environmental and task specific events. For instance, in an outer space-like VR environment, the agent could adopt the persona of a rocket to allow it to fly, thereby facilitating human interpretation. The agent must exhibit the ability to dynamically select an appropriate form with associated functional portfolios. Additionally, the system must be extensible; it should be possible for the easy addition of new types of embodiment instantiations for different situations. Alas, all platforms are not created equal (e.g. varying memory, processing power, bandwidth, display characteristics). Consequently, these agents have to be able to adapt to different conditions. Agents need to be able to evolve their very form. As they move from device to device they may necessarily have to shed some of their characteristics. This is analogous to exfoliation. Upon platforms that may not be able to handle an agent in its complete form, the agent is able to reduce elements that are non-essential to its task at hand and scale down its capabilities. Alternatively, on platforms with minimal resources, the agent merely sends a minimally sufficient component of itself, which is required only for its current task, while the bulk of the agent can remain dormant and dismembered on the source device. Upon task completion, both parts reintegrate and the agent continues on its way. Furthermore, an agent can, in certain circumstances, clone itself. Such circumstances may include those where it feels under threat, or under heavy resource demands. This elastic evolution


Figure 1. The Agent Chameleon spirit and its body instantiations: mobile devices (PDA), robot, PC, web, and virtual reality, respectively


would empower the agent with unforeseen versatility. The form of artificial evolution and adaptivity discussed is currently being developed within the Agent Chameleons framework (http://chameleon.ucd.ie). These concepts resonate with initiatives such as IBM’s autonomic computing (Horn, 2001) and that of Intel’s proactive computing (Tennenhouse, 2000). Central to such initiatives is software comprised of confederations of autonomous and social agents which are capable of such facilities as self-healing, self-protection, self-configuration and self-optimisation. With regard to embodiment issues, the body is always present and the reasoning of the agent is dependent upon that body as it provides the system with its actuator and preceptor functionality. However, the form of that body is not constrained; the agent is capable of adjusting it or adopting an entirely new one to suit the task at hand. Each different embodiment instantiation fundamentally changes the viewing metaphor for that agent and its associated functional portfolio. A key result of this work is that the issues of identity and association between the user and the agent chameleon are maintained through behavioural and visual cues as it migrates and mutates across platforms. This work demonstrates how the traditional paradigm of one-mind-one-body can be extended beyond such a human-based reference to one-mind-many-bodies, thereby providing new core functionality for an artificial entity. Results to date have demonstrated the flexibility of an agent capable of migrating between platforms. Demonstrations at Media Lab Europe to visitors were found to successfully maintain the identity of the agent across platforms whilst employing the fundamentally different features of each as shown in Plate 1 (physical mobility: Khepera robot, speech and facial gestures: Anthropos robot head; mutation and cloning: as facilitated through the virtual reality instantiations). 5. Form of the machine The previous sections have discussed the changing function of the reasoning machine. With the permeation of computational devices in our society, the flexibility of artificial systems is changing. The next step is to look at how we will interact with these systems, how we will interact and understand these machines. Conflicting arguments exist for and against the human form as a frame of reference for reasoning machines and these will continue to haunt robotics (see Duffy and Joue, 2004 for a discussion). The fundamental issue is how to achieve a balance between the function and form of the reasoning machine. Is the entity so strongly humanoid to the extent that we have the replicant problem as found in Dick’s (1968) famous novel: “Do Androids dream of electric sheep?”. But, the functionality of the robot is then

Plate 1. The current Agent Chameleon body instantiations as robots, VR avatars, entities on PC’s and mobile devices

constrained by the human function and form. If aspects of the human form are used judiciously to facilitate human-robot social interaction (Plate 2) (Duffy, 2003), its capability set can diversify from our own and embrace inherently mechanistic capabilities and possibilities (e.g. vision beyond human visible spectrum, wireless communication, multi-actuator derived degrees of freedom, auditory enhancement). The influence of the appearance and the voice/speech of an entity on people’s judgements of another’s intelligence have been demonstrated in experimentation. The more attractive a person, the more it facilitates others to rate the person as having higher intelligence (Alicke et al., 1986; Borkenau, 1993). However, when given the chance to hear the person speak, people appear to rate their intelligence more on verbal cues than on their attractiveness (Borkenau, 1993). Exploring the impact of such hypotheses to HCI, Kiesler and Goetz (2002) undertook experimentation with a number of robots to ascertain if participants interacting with robots drew similar assessments of “intelligence”. The experiments were based on visual, audio and audiovisual interactions. Interestingly the results showed strong correlations with Alicke et al.’s and Borkenau’s experiments with people-people judgements. When we start to engage robots at a more complex level than our current interactions with washing machines, our propensity to anthropomorphise becomes inevitable. The important criterion is to seek a balance between people’s expectations and the machines capabilities (Duffy, 2003). The following case study explores our propensity to ascribe such notions as intelligence and emotions to machines.


5.1 Case study: emotion machines The work presented in Bourke and Duffy (2003) demonstrates the ease with which people are willing to ascribe human-like characteristics such as emotion and intelligence to small robots performing computationally simple behaviours. This effectively highlights how much “mind” one is willing to ascribe to an artificial entity with little or no explicit design decisions involved. The first stage of the experimentation involved the design and implementation of seven independent behaviours on standard Khepera I robots as shown in Figure 2

Plate 2. Media Lab Europe’s “JoeRobot” at the Flutterfugue performance with SmartLab and NYU CATLab in London 2002 (Photo courtesy of Brent Jones)


Figure 2. Experiments with Khepera I robots

(with some equipped with the wireless communication module). These were videotaped and a questionnaire was designed asking the observer to explain what the robots were doing, to pick three characteristics they would associate with the robots, and to grade these characteristics. Access to this questionnaire was distributed among a widely varied audience through internet mailing lists. In the second stage, the same robots were dressed using coloured felt and given aesthetic “eyes” and the same behaviours were implemented. The same questionnaire was repeated. The results indicated that in the first set of experiments, people who took part in this experiment concentrated their efforts on describing exactly what moves the robot was taking. Efforts were made to explain the behaviours from a purely technical aspect, with “searching” and “learning” very common words used. However, it is useful to note that people seemed to easily see past the mechanics of the robots, and began to describe them as if they possessed some human-like qualities such as social interaction capabilities where no such explicit behaviours existed. The antenna on one of the robots was also interpreted as corresponding to a tail a number of times and some parallels were drawn with dog behaviour. In the second experiment, even stronger human-like features such as being “alive” and “playful” were reported. One interesting example, where one robot approaches an immobile second robot, moves around it, performs a shaking behaviour as if vying for attention, whereupon the other suddenly moves away, was explained in the context of the observer’s interaction with their husband; on attempting to talk to him, he ignores her for a while, then just walks away. This work raises the question of whether a system is required to be inherently intelligent or emotional in order for it to be interpreted as such. It is an orthogonal view of the pursuit of a system that one views as intelligent. An interesting aspect then arises. If the system can create the illusion of being intelligent and emotional, can it be maintained over time? Will its failing become apparent through our interaction with it? Similar to whether it appears intelligent or not, the issue of resolution will prevail. If the fake is good enough, we won’t know the difference. It is important to recognise Shneiderman’s (1988) arguments against anthropomorphism, which state that people employing anthropomorphism compromise in the design, leading to issues of unpredictability and vagueness. The argument effectively distils down to a distinction as to whether one can maintain the function of the robot as a tool or not. When actuation and perception mechanisms are employed on the system to engage with its physical and social environment, the notion of the system remaining purely a tool becomes less manageable and consequently

anthropomorphism is unavoidable. It is how we manage the form and consequently the anthropomorphism that becomes the important issue (Duffy, 2003). If we are so willing to ascribe standard social interaction frames of reference to clearly artificial systems, as demonstrated in these and other similar experiments, we should not fear developing technologies that clearly extend beyond our own capabilities as discussed in Section 4. It is then the task of the designer to facilitate and maintain our "bond" with new future machines.

6. The future machine
It can be more tangible and manageable to use ourselves and similar standard paradigms as frames of reference in designing artificial systems. But, as Einstein is reputed to have said, "as far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality". There are many fundamental distinctions between artificial and natural systems. From time to time we need to stop, think laterally, try to free the artificial system and allow it to exploit these differences. In the future, digital personal assistants will emerge to take advantage of their abstraction from a particular environment or platform and migrate, mutate, even clone as presented in the Agent Chameleon work discussed earlier. Our perception of these systems may also change. Interestingly, when experiencing "Ada – l'espace intelligent" (Delbruck et al., 2003), a room where the user interacts with the room as much as the room interacts with the user, the aspect of the unknown and the elemental communication with this room helped create a strong sense of intelligence. Complexity is not necessarily the solution to creating an impression of a system being "intelligent" and this will influence the pursuit of a system that we will view as artificially intelligent. A number of fundamental tenets that underpin this next generation of reasoning machines are considered in the following paragraphs. Key to these perspectives, as reinforced in the previous two case studies, are (1) the embracing of those features and capabilities inherent to artificial systems, and (2) the management of our willingness to anthropomorphise in our interactions with these systems.

6.1 Nomadic agents
The Agent Chameleon work outlined previously regarded the agent as an entity empowered with autonomy, human-computer interaction facilities, and a fundamental mobility. The embodiment instantiation thus merely becomes the container for a digital mind, which opportunistically migrates between devices. The specifics of the hardware now reflect the capability set of the autonomous mind. The presence of the agent moving through cyberspace, as the user moves through physical space, allows the associated user to be available at any time through the agent and vice versa. Such nomadic agents also have the capacity to exploit an elastic cloning. This involves the cloning of an agent into two or more agents for a particular task and then the "offspring" agents returning to the parent and all fusing together as one when the task is completed. Similar to platform migration, such temporary cloning is a concept that is facilitated through the technological advantages of software-based virtual systems, functionality with little to no basis in biology.
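The one-mind-many-bodies and elastic-cloning ideas outlined above can be summarised as a small software interface. The sketch below is purely illustrative: the class and method names are invented for this example and are not the Agent Chameleons implementation; it simply shows a platform-independent "mind" that migrates between embodiments, clones itself for a subtask and later re-absorbs the offspring.

```python
# Illustrative sketch of a "one mind, many bodies" agent interface.
# All names here are invented for the example; they do not describe the
# actual Agent Chameleons software.
from dataclasses import dataclass

@dataclass
class Embodiment:
    """A body the mind can occupy: robot, avatar, PDA widget, desktop agent."""
    name: str
    capabilities: set          # e.g. {"wheels", "speech"} or {"fly", "morph"}

@dataclass
class AgentMind:
    """The platform-independent reasoning state (the 'spirit')."""
    identity: dict             # referential cues preserved across bodies
    goals: list
    body: Embodiment = None

    def migrate(self, new_body: Embodiment):
        """Leave the current body and 'possess' a new one, keeping identity cues."""
        previous = self.body.name if self.body else "nowhere"
        print(f"{self.identity['name']}: leaving {previous} for {new_body.name}")
        self.body = new_body

    def clone_for(self, subtask) -> "AgentMind":
        """Elastic cloning: spawn a lightweight offspring for one task."""
        return AgentMind(identity=dict(self.identity), goals=[subtask], body=None)

    def absorb(self, offspring: "AgentMind"):
        """Re-fuse with an offspring once its task is finished."""
        self.goals.extend(g for g in offspring.goals if g not in self.goals)

# Example: the same mind possesses a robot, then an avatar, then clones itself.
mind = AgentMind(identity={"name": "Chameleon-1", "colour": "green"},
                 goals=["assist user"])
mind.migrate(Embodiment("Khepera robot", {"wheels"}))
mind.migrate(Embodiment("VR avatar", {"fly", "morph"}))
helper = mind.clone_for("fetch calendar from PDA")
mind.absorb(helper)
```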


A clear application of such nomadic evolvable agents is that of an autonomous “intelligent” digital assistant that is independent of any one physical device. These entities will effectively give any user their own personal assistant that will help with the information overload in daily life, assisting with personal communications and offer a generic interface to any number of devices. They will have the ability to react to the current needs of their user, and beyond this, grow and learn to anticipate future needs and requirements. Perhaps our vision can be best summed up by Luc Steels’ metaphor for what the robots of the future will be like: “[it] is related to the age-old mythological concept of angels. Almost every culture has imagined persistent beings, which help humans through their life. These beings are ascribed cognitive powers, often beyond those of humans, and are supposed to be able to perceive and act in the real world by materialising themselves in a bodily form at will ”. He goes on to detail how Angels may “project the idea of someone protecting you, preventing you from making bad decisions or actions, empowering you, and defending you in places of influence” (Steels, 2000). 6.2 A real fake When the fake is so good, we won’t be able to tell the difference between whether it is real or not. The Machiavellian Intelligence hypothesis (see Kummer et al., 1997 for a recent discussion) proposes that intelligence as we understand it evolved from the social domain where social interaction between entities is key to the development of intelligence. It is because of our developing social interaction with machines, which are becoming more and more autonomous, that our perceptions of whether they are intelligent or not, or even how intelligent, becomes an issue. Relative to our capability set, the idiosyncrasies of a robot embedded in our physical and social spaces equipped with such existing systems as flawed vision, annoying speech and woefully inadequate sensor systems could be the physical equivalent of a spam assault through chronic annoyances. They have their tragic flaws and may therefore become as alive as we are. This also raises an interesting point about machines as “constant companions”; what are the health and environmental drawbacks of having machines embedded in our physical and social space as autonomous entities? The solution to these problems is to define the task, the function, for the machine. This will dictate the form, which, in conjunction with the function, should embrace the fact that it is a machine, not confuse it. 7. Conclusion The demands on machines like robots have dramatically increased during the last decades. No doubt, the film industry has contributed greatly in moulding the imagery with which we associate the reasoning machine. Often it is presented as a friendly, hard-working and droll creature, such as R2D2 in Star Wars (Lucas Films, 1977) and the more human-like character of Data in the Star Trek (1987) series. Creating an artificial being based on the blueprint of humans seems to be particularly compelling due, in part, to the basic effort of mankind to reproduce itself or even for the desire to be immortal. However, the film industry is not satisfied with presenting the bright side of machines. It also projects people’s fears of machines into characters like The Terminator (MGM/UA, 1984), a nearly indestructible cyborg assassin.

This paper has sought to review and assess our perceptions of reasoning machines. Within this paper we have reflected upon the mind-body debate and the monistic versus dualistic standpoints. We have sought to extend the one-mind-one-body approach to accommodate a one-mind-many-bodies metaphor. Within this metaphor the agent’s embodiment form may take many guises with the artificial mind or agent potentially exhibiting a nomadic existence opportunistically migrating between a myriad of instantiated embodiments. The choice of embodiment must, not only empower the agent, but maintain the agent’s identity in the eyes of the user. Central to this is the need to preserve key referential characteristics across the different instantiations. The title of this paper is intentionally not called “future intelligent machines”. It is not the aim of the ideas proposed to argue against the necessity of embodiment in the pursuit of the artificially intelligent system, but rather to seek to take an orthogonal perspective and, whilst employing those technologies developed in the field of AI research, realise systems with new and fundamentally different capabilities. It is also questionable whether the term “intelligent” can be justifiably used in the majority of AI research to date with its interpretation being rather nebulous and vague. We postulate a new generation of reasoning machines, which evolve and demonstrate autonomic characteristics. These machines are social (Duffy, 2000), autonomous, intentional and are equipped with rudimentary self-healing, self-protection, self-configuration and self-optimisation capabilities[2]. The scale of each of these features is primarily dependent on the complexity required and deployed, and draws on vast research to date which address these specific issues. The agent’s sophistication is dependent on the extent of these technologies employed. The key feature, as presented in this work, is to highlight the fundamental perspectives that future reasoning machines can adopt. While anthropomorphism in robotics raises issues about the taxonomic legitimacy of the classification human, and its sole association with ourselves, the question of whether machines will ever approach human capabilities persist. Technology is now providing robust solutions to the mechanistic problems that have constrained robot development thus far, thereby allowing robots to permeate all areas of society from work to leisure. The key is to take advantage of these reasoning machines and their capabilities rather than constrain them. We just keep in mind something like Asimov’s (1994) Laws of Robotics, and remember where the OFF button is. Notes 1. Consciousness is not discussed in this work although the ideas presented undoubtedly promote this discussion. 2. These capabilities are better served in software rather than hardware – the use of software migration strategies unloads the hardware complexities required to achieve this functionality and invariably the chances of it failing in the first place. References Alicke, M.D., Smith, R.H. and Klotz, M.L. (1986), “Judgments of physical attractiveness: the role of faces and bodies”, Personality and Social Psychology Bulletin, Vol. 12 No. 4, pp. 381-9. Asimov, I. (1994), I, Robot, Bantam Books, London.


Borkenau, P. (1993), “How accurate are judgments of intelligence by strangers?”, paper presented at: Annual Meeting of the American Psychological Association, Toronto, ON, August. Bourke, J. and Duffy, B.R. (2003), “Emotion machines: projective intelligence and emotion in robotics”, paper presented at IEEE Systems, Man & Cybernetics Workshop (UK&ROI Chapter), Reading, MA, September. Braitenburg, V. (1984), Vehicles – Experiments in Synthetic Psychology, MIT Press, Cambridge, MA. Brooks, R.A. (1986), “A robust layered control system for a mobile robot”, IEEE J. Rob. and Autom., Vol. 2 No. 1. Brooks, R.A. (1990), “Elephants don’t play chess”, Robotics and Autonomous Systems, Vol. 6, pp. 3-15. Brooks, R.A. (1991), “Intelligence without representation”, Artificial Intelligence Journal, Vol. 47, pp. 139-59. Delbruck, T., Eng, K., Baebler, A., Bernardet, U., Blanchard, M., Briska, A., Costa, M., Douglas, R., Hepp, K., Klein, D., Manzolli, J., Mintz, M., Roth, F., Rutishauser, U., Wassermann, K., Wittmann, A., Whatley, A.M., Wyss, R. and Verschure, P.F.M.J. (2003), “Ada: a playful interactive space, human-computer interaction”, in Rauterberg, M. et al. (Eds), INTERACT’03: IFIP TC13 International Conference on Human-Computer Interaction, Zu¨rich, 1-5 September 2003, IOS Press, Amsterdam, pp. 989-92. Dennett, D. (1987), The Intentional Stance, MIT Press, Cambridge, MA. Descartes, R. (1637), Discourse on Method and Meditations on First Philosophy (3rd ed., 1993, Switzerland, Reprint, Cambridge Hackett Publishing, Indianapolis, IN). Dick, P.K. (1968), Do Androids Dream of Electric Sheep?, 3rd ed., Del Rey (Reissue edition June 1996). Dreyfus, H. (1972), What Computers Can’t Do: The Limits of Artificial Intelligence, MIT Press, Cambridge, MA. Duffy, B.R. (2000), “The social robot”, PhD thesis, Department of Computer Science, University College Dublin, Dublin. Duffy, B.R. (2003), “Anthropomorphism and the social robot”, Robotics and Autonomous Systems, Vol. 42 Nos 3/4, pp. 170-90. Duffy, B.R. and Joue, G. (2001), “Embodied mobile robots”, paper presented at the 1st International Conference on Autonomous Minirobots for Research and Edutainment – AMiRE2001, Paderborn, 22-25 October. Duffy, B.R. and Joue, G. (2004), “I, robot being”, paper presented at Intelligent Autonomous Systems Conference (IAS8), Amsterdam, 10-13 March. Duffy, B.R., O’Hare, G.M.P., Martin, A.N., Bradley, J.F. and Scho¨n, B. (2003), “Agent Chameleons: agent minds and bodies”, paper presented at the 16th International Conference on Computer Animation and Social Agents (CASA 2003), Rutgers University, NJ, May, pp. 7-9. Francher, R.E. (1979), Pioneers of Psychology, W.H. Norton & Company, New York, NY. Fukuda, T. et al. (1989), “Structure decision for self organising robots based on cell structures”, IEEE: Rob. & Autom, Scottsdale, AZ. Horn, P. (2001), “Autonomic Computing: IBM’s perspective on the state of information technology”, IBM Corporation, p. 1, available at: www.research.ibm.com/autonomic/ manifesto/autonomic_computing.pdf (accessed 15 October). Kiesler, S. and Goetz, J. (2002), “Mental models and cooperation with robotic assistants”, Proceedings of CHI.

Kube, C.R. and Zhang, H. (1993), “Collective robotics: from social insects to robots”, Adaptive Behavior, Vol. 2 No. 2, pp. 189-219. Kummer, H., Daston, L., Gigerenzer, G. and Silk, J. (1997), “The social intelligence hypothesis”, in Weingart et al. (Eds), Human by Nature: Between Biology and Social Sciences, Lawrence Erlbaum Assoc, Hillsdale, NJ, pp. 157-79. Lucarini, G., Varoli, M., Cerutti, R. and Sandini, G. (1993), “Cellular robotics: simulation and HW implementation”, Proceedings of the 1993 IEEE International Conference on Robotics and Automation, Atlanta GA, May, pp. III-846-852. Nilsson, N.J. (1984), “Shakey the robot”, Technical Note 323, SRI A.I. Center, April. O’Hare, G.M.P., Duffy, B.R., Schoen, B., Martin, A.N. and Bradley, J.F. (2003), “Agent Chameleons: virtual agents real intelligence”, paper presented at 4th International Working Conference on Intelligent Virtual Agents (IVA), LNCS Springer Verlag, Kloster Irsee, 15-17 September. Searle, J. (1980), “Minds, brains, and programs”, The Behavioral and Brain Sciences, Vol. 3, pp. 417-57. Sharkey, N. and Zeimke, T. (2000), “Life, mind and robots: the ins and outs of embodied cognition”, in Wermter, S. and Sun, R. (Eds), Symbolic and Neural Net Hybrids, MIT Press, Cambridge, MA. Shneiderman, B. (1988), “A nonanthropomorphic style guide: overcoming the humpty-dumpty syndrome”, The Computing Teacher, 9-10 October. Simon, H. (1957), Administrative Behavior: A Study of Decision-making Processes in Administrative Organization, 2nd ed., Macmillan, New York, NY. Steels, L. (2000), “Engeln mit Internetfluegeln. German version of Digital Angels”, Die Gegenwart der Zukunft, Verslag Klaus Wagenbach, Berlin, pp. 90-8. Tennenhouse, D.L. (2000), “Proactive computing”, Communications of the ACM, Vol. 43 No. 5, pp. 43-50. Thagard, P. (1996), Mind, Introduction to Cognitive Science, MIT Press, Cambridge, MA. (Brian Duffy’s research aims to understand man-machine interaction from the perspective of socially capable robots situated in the Media Lab Europe’s office environment. This seeks to research the fine line between observed and designed function and form. It is an exploration of the illusion of life and intelligence in artificial entities. Brian’s interest in designing and building social humanoid robots has developed from previous research at the Department of Computer Science at University College Dublin where he completed a doctoral thesis. Prior to this, Brian spent two years in artificial intelligence research and building robot prototypes at GMD’s (now the Fraunhofer-Gesellschaft Institute) Autonomous Intelligent Systems Institute. Before moving to Germany in 1994, two years were spent at Institut National des Sciences Applique´es de Lyon in France working in the field of distributed artificial intelligence and multi-agent systems. Gregory O’Hare is Head of Department of Computer Science at University College Dublin (UCD). Prior to this he was a member of faculty at the University of Manchester Institute of Science and Technology (UMIST). He is director of the PRISM (Practice and Research in Intelligent Systems and Media) Laboratory within the Department of Computer Science. His research focuses upon multi-agent systems (MAS) and mobile and ubiquitous computing. He has published some 120 journal and conference papers in these areas together with two textbooks. Gregory has secured some 6 million euro research funding for his research. He has acted as consultant for many national and international companies and organisations. 
John Bradley received his Bachelor Degree in Computer Science in 2002 from the University College Dublin (UCD), Ireland. From there he joined the Agent Chameleons project (a joint


collaboration between the Computer Science Department UCD and the Anthropos Group in Media Lab Europe) as a PhD Student. John’s current research involves the self-adaptation of agents, as they migrate between heterogeneous platforms, driven by deliberative mechanisms. His other research interests include agent technologies, multi-agent systems, distributed artificial intelligence, agent migration, agent adaptation and robotics. Alan Martin completed a Bachelor of Science Degree in Computer Science at University College Dublin, Ireland, and is now a PhD candidate there, working in collaboration with Media Lab Europe. Alan’s research interests include collaborative virtual environments, immersion, animated synthetic characters, agents and robotics. Bianca Schoen completed a Bachelor of Science Degree at the University of Applied Sciences in Darmstadt, Germany. During her studies she gained experiences not only in economic enterprises like T-Systems debis Systemhaus GmbH in Darmstadt and the H.A.S.E GmbH in Hu¨nfelden, but also in scientific institutes like the Frauenhofer Institut fu¨r Grafische Datenverarbeitung and the T-Systems Nova Technology Center, both situated in Darmstadt, Germany. She is now a PhD candidate at the University College Dublin, Ireland, where she is working in a collaboration project with Media Lab Europe. Bianca’s research interests include animated synthetic characters, artificial intelligence, agent technology, evolutionary- and genetic programming and evaluation methodologies.)


Machine vision methods for autonomous micro-robotic systems
B.P. Amavasai, F. Caparrelli, A. Selvan, M. Boissenin, J.R. Travis and S. Meikle


Microsystems and Machine Vision Laboratory, Materials and Engineering Research Institute (MERI), School of Engineering, Sheffield Hallam University, Sheffield, UK

Abstract
Purpose – To develop customised machine vision methods for closed-loop micro-robotic control systems. The micro-robots have applications in areas that require micro-manipulation and micro-assembly in the micron and sub-micron range.
Design/methodology/approach – Several novel techniques have been developed to perform calibration, object recognition and object tracking in real-time under a customised high-magnification camera system. These new methods combine statistical, neural and morphological approaches.
Findings – An in-depth view of the machine vision sub-system that was designed for the European MiCRoN project (project no. IST-2001-33567) is provided. The issue of cooperation arises when several robots with a variety of on-board tools are placed in the working environment. By combining multiple vision methods, the information obtained can be used effectively to guide the robots in achieving the pre-planned tasks.
Research limitations/implications – Some of these techniques were developed for micro-vision but could be extended to macro-vision. The techniques developed here are robust to noise and occlusion so they can be applied to a variety of macro-vision areas suffering from similar limitations.
Practical implications – The work here will expand the use of micro-robots as tools to manipulate and assemble objects and devices in the micron range. It is foreseen that, as the requirement for micro-manufacturing increases, techniques like those developed in this paper will play an important role for industrial automation.
Originality/value – This paper extends the use of machine vision methods into the micron range.
Keywords Cybernetics, Robotics, Nanotechnology, Image sensors, Artificial intelligence
Paper type Research paper

The Microsystems and Machine Vision Laboratory (MMVL) is a division within the Materials and Engineering Research Institute at Sheffield Hallam University. The principal focus of the group is to investigate and develop vision-based techniques aimed at a variety of real-time applications which include microrobotic systems, biological applications, MEMS, nanotechnology, scanning electron microscopy (SEM) and scanning probe microscopy (SPM) applications.

1. Introduction
The IST-FET MiCRoN[1] project comprises a consortium of eight academic partners located in seven European countries. MiCRoN has its roots in a prior EU project, Miniman[2] (Bürkle et al., 2001), which was completed in January 2001. In Miniman, single dm³-sized micro-robots were developed to perform a generic set of

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1421-1439 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614740


Plate 1. The Miniman V micro-robot that forms the basis for miniature robotic research in the MiCRoN project

micromanipulation tasks under either an optical microscope or in a scanning electron microscope (SEM). However, Miniman lacked in the ability to perform co-operatively and was limited by its physical size. The results achieved in the Miniman project form the basis of the MiCRoN project, specifically in the development of technologies for the final Miniman V micro-robot, shown in Plate 1. The main goal of MiCRoN is to develop a micro-robotic cluster that is able to perform tasks autonomously. Each of the micro-robot units is equipped with on-board electronics and wireless communication capabilities. The micro-robots are pre-programmed to perform tasks associated with assembly and manipulation in the micrometre range, with the possibility to extend operations to the sub-micrometre range. One of the key motivations behind this project is given by the increasing need in industry for flexible and re-programmable tools that can work in specific micro-scale environments such as under an optical microscope. The use of micro-robots to accomplish these tasks offers a valid alternative to bulky and expensive lab equipment which is generally adapted for a single task. The application of machine vision methods as an active feedback element to the control system adds a further degree of flexibility to the system. Object models to be manipulated by the robots can be acquired off-line, stored and then re-used during the execution of each task. Algorithms can be customised and optimised so that real-time control performance can be achieved, although this is often dependent on the complexity of the task at hand. Within MiCRoN, we foresee that the requirement for future industrial applications will at least be as complex as the robotic platforms in the macro world. However, from a design standpoint there are several caveats. Amongst others, from a scaling point of view, small forces will be amplified and at micro scales many objects become “sticky”. For imaging, the signal-to-noise ratio is reduced as magnification increases. Furthermore, as we reach the limits of optical imaging the quality of images obtained at these scales degrades considerably. Our role within MiCRoN is to develop algorithms for the automatic recognition and tracking of objects under a purpose built image acquisition system. These objects can either be the parts that need to be assembled/handled by robot-mounted tools or the

tools themselves. Generally, the type of tool attached to the robots is specific to the task. The object information from the vision system is used by the control system to control the robots during the execution of pre-planned tasks. This paper is structured as follows. In Section 2 a detailed description of the MiCRoN hardware system is given. In Section 3, the vision subsystem is presented: four different algorithms developed for several task scenarios are described and discussed in detail in subsections 3.1-3.4. Section 4 presents the results currently obtained by the MiCRoN vision system. Finally, Section 5 draws some conclusions from the results described in the previous section.


2. System description
The micro-robots developed for MiCRoN are approximately 1 cm³ in size and consist of a piezo-electric module that produces high velocity (several mm/s) locomotion combined with a nanometric resolution. The robots navigate on a flat, horizontal surface by translation and rotation. The main task of the locomotion system is to bring the on-board tools to the work area or to transfer the micro-objects to the next station. The main common features of the MiCRoN micro-robots are:
. A wireless communication transmission system between the robot unit and the control unit through an ad hoc infrared link (Figure 1(a)).
. A set of markers used for the global positioning system, which is based on the application of projected Moiré fields (Figure 1(b)).
. On-board electronic circuitry for the activation of the piezo-electric devices. This will control the robot's motion, generate and amplify the driving signals for all actuators and tools and pre-process the signals from the on-board sensors (Figure 1(c)).

Figure 1. Micro-robot configuration.


. An interface for the hosting of a number of tools which are interchangeable (Figure 1(d)).
. A coil which provides power to the on-board electronics. The wireless powering mechanism is based on the principle of induction and makes use of a specially designed power floor on which all the active robots operate (Figure 1(e)).
. A piezo-electric system used for both locomotion and manipulation (Figure 1(f)).

Two early MiCRoN robot prototypes working co-operatively are shown in Plate 2. The micro-robots are equipped with a variety of on-board tools built within the project consortium. These include:
. a functionalised AFM (atomic force microscopy) tip that is able to qualitatively resolve objects down to atomic sizes;
. a syringe chip that is built specifically to inject liquid into cells;
. a gripper tool that is able to handle micron-sized objects; and
. a micro-needle that uses electrostatic forces to grasp micro-objects.
In order to assess the performance and evaluate the capabilities of the micro-robotic system, two different scenarios have been devised. The first demonstration involves the soldering of benchmark parts of size 50 × 50 × 10 µm³. These benchmark parts are to be manipulated using gripper tools mounted on two MiCRoN robots and the assembly is to be monitored by a micro-camera mounted on a larger mobile robot. The second demonstration is derived from the field of biological and biomedical nano-manipulation. This experiment is composed of three major steps. In the first step, one robot isolates and captures a single cell suspended in a solution using a glass micropipette. This robot transports the cell to a pre-defined position where it is released and fixed to the object slide by an electric field trap. At this position, the cell is held in

Plate 2. Two early MiCRoN robot prototypes manipulating a human hair

place for two procedures to be performed: the first one is an AFM measurement performed by a second robot, while the second one involves the injection of a liquid into the cell by a third robot using a customised syringe chip. This demonstration is performed under an inverted optical microscope. In the first demonstration, a Miniman-IV robot will be equipped with a customised miniature CCD camera which will be used to gather local positioning information on the location of tools as well as the parts to be manipulated. The required precision depends largely on the task at hand; however, an imaging resolution of 1 µm/pixel is needed to meet the requirements of the planned demonstration. The specification of the system and its field of application in the micro- and nano-range impose very limiting constraints on the selection of the optical system. To achieve a pixel resolution of 1 µm/pixel with currently available CCD or CMOS imaging chips, an optical system with a magnification of 6-10× is required. Although microscope objectives and high resolution CCD and CMOS imaging chips are largely available in the market, the existing limits in size (5-6 cm in length) and weight (less than 100 g) severely restrict the choice. A Panasonic GP-CX261 1/4 in. CCD camera head coupled with a custom designed lens system that provides a magnification of 5× was selected for the task. The camera head and the lens were assembled into a robot-compatible housing (Figure 2). A three-DOF translation stage with micron-level accuracy was also used to aid the testing of the system.
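The magnification figures quoted above follow directly from the geometry of the imaging chain: the resolution at the object plane is roughly the sensor pixel pitch divided by the optical magnification. The short calculation below illustrates this relationship; the pixel pitch values used are assumed, typical figures and are not taken from the GP-CX261 data sheet.

```python
# Back-of-the-envelope check of the optics requirement discussed above.
# The pixel pitches below are assumed, typical CCD/CMOS values (illustrative only).

def object_plane_resolution(pixel_pitch_um: float, magnification: float) -> float:
    """Size of one pixel projected onto the object plane, in micrometres."""
    return pixel_pitch_um / magnification

for pitch in (5.0, 7.5, 10.0):            # assumed sensor pixel pitches, in um
    for mag in (5.0, 6.0, 10.0):          # candidate optical magnifications
        res = object_plane_resolution(pitch, mag)
        print(f"pitch {pitch:4.1f} um, magnification {mag:4.1f}x -> {res:.2f} um/pixel")

# A pixel pitch of roughly 5-10 um therefore needs about 6-10x magnification
# to approach the target of 1 um/pixel at the object plane.
```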


3. The vision system
One of the problems faced with miniature lens systems is the amount of barrel distortion present in the image. This is typified by Figure 3(a). This can be corrected using a variety of remapping or warping techniques. Given that (x, y) are the coordinates of a pixel in the image, in Figure 3(b) the correction is performed by mapping (x, y) to (x′, y′) using the following equation:

Figure 2. A custom built camera system with an integrated microscope lens system


Figure 3. Correcting for the effects of barrelling.

$$
\begin{pmatrix} x' \\ y' \end{pmatrix}
=
\begin{pmatrix}
\dfrac{1 - cr}{1 - c} & 0 \\
0 & \dfrac{1 - cr}{1 - c}
\end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
\qquad (1)
$$

where $r = \sqrt{x^2 + y^2}$ is the radius from the centre pixel of the image and $c$ is a curvature parameter. The process of selecting $c$ can be semi-automated by using a grid image as shown in Figure 3. The value of $c$ is varied and at each stage the Hough transform is observed. Owing to the nature of the Hough transform, sharper peaks will be generated in the transform if the lines in the image are straighter. Hence, an optimal value of $c$ that corrects the distortion can be found by numerically selecting the $c$ value that maximises the straightness of lines on the grid image. In the instance of Figure 3, a value of $c = 0.18$ was obtained.

One of the main requirements for the vision algorithms that are being designed for MiCRoN is real-time performance. The vision system has to be synchronised with the control system at all times. The micro-robotic system is a hard real-time system, so maximum delays and response times for every stage of processing have to be defined. We estimate an initial processing rate of no more than 10 fps in order to achieve the requirements of the project deliverables. The control system has to react, respond and alter its parameters according to this constraint.

In this section, we shall introduce some newly designed vision algorithms and methods. The vision system itself is built over an existing vision toolkit, developed in-house, known as Mimas[3]. The new vision algorithms that will be discussed in the following subsections include:
. a robust tracking method with in-built error recovery,
. a neural network paradigm for colour segmentation,
. a fast shape-based scale-invariant recognition algorithm, and
. a customised method for locating living cells.

3.1 Robust object tracking
One of the planned tasks in MiCRoN entails guiding a glass micropipette, with a tip diameter of 30 µm, to transport cell samples and position them on a statically charged cell holder. The task is to be carried out semi-autonomously through closed-loop control. The main problems encountered are:

• the translucent glass tip image is affected by the background clutter; and
• the body of the pipette and the grid on the cell holders have very similar line features, making it difficult for recognition methods to differentiate between the two objects.

The problem was approached by making use of a statistical method based on conditional probability density estimation, known as the Condensation algorithm (Isard and Blake, 1998) or, more commonly, as particle filtering. The initial observation measure for the Condensation algorithm was obtained through normalised correlation between the model (pipette tip) image and the scene image. Condensation is a filtering algorithm that uses a probability density function (pdf) to model the probable locations of objects. One of its unique features is that no functional assumptions (e.g. Gaussianity or unimodality) are made about the underlying pdf. In a cluttered scene, the probability of locating an object gives rise to a multi-modal pdf, because similar objects in the background produce probability measures comparable to that at the actual location of the object itself. Since Condensation can handle multi-modal pdfs, it is well suited to cluttered scenes.

The Condensation filter belongs to the class of Bayesian filters. This implies that it is based on sampling the posterior distribution estimated in the previous frame[4] and propagating these samples, or particles, to form the prior distribution for the current frame[5]. The key idea of particle filtering is to approximate the probability distribution by particles, which are weighted samples of the current state. A particle p can be described as an element of P:

p_{l,w} \in P, \quad \text{where } (l, w) \in \mathbb{R}^{n} \times \mathbb{R}^{+}, \; n \in \mathbb{N}^{*} \qquad (2)

We introduce the following notation:
• p_l refers to the feature values of the particle p;
• p_l^i is the particle p's ith feature;
• p_w is the particle's weight; and
• p^(n) is the nth particle.

The particle features are correlated with those of the tracked object in the given scene image. The spatial location features of the particle give the predicted location of the object that is being tracked. Thus, the particles can be seen as hypothetical states of the object being tracked, and each hypothesis is quantified by the measure associated with it (edge and correlation measure). This measure is then integrated into the weight of the particle. The higher the weight, the more probable it is that the particle describes the state of the tracked object. To predict the features of the particles, earlier measures on the previous state and the system model are taken into account. In the tracking scenario, the two features of a particle are the x and y location of the tracked object. The weight of each particle can be taken from the correlation of the template image with the part of the image centred on the particle.


In the tracking of the pipette, pure correlation of the grey-scale template image produced very poor results. The results were improved by filtering the image with an edge filter and correlating it with an edge template. This procedure may be considered equivalent to shape matching. The evolution of the particle set is described by propagating each sample according to the kinematics of the object. The position L of the object is estimated at each time step by the weighted mean of the particles:

L = \frac{\displaystyle\sum_{n=1}^{N} p_l^{(n)} \, p_w^{(n)}}{\displaystyle\sum_{n=1}^{N} p_w^{(n)}} \qquad (3)
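A minimal Python sketch of one resample-propagate-weight cycle of such a tracker is given below. It is not the authors' Mimas implementation: the random-walk motion model, its spread and the form of the observation function are assumptions, while the last step is the weighted-mean estimate of equation (3).

import numpy as np

def track_step(particles, weights, measure, motion_std=2.0,
               rng=np.random.default_rng(0)):
    """One Condensation-style update for 2-D tracking (a sketch).

    particles : (N, 2) array of (x, y) location hypotheses
    weights   : (N,) array of their current weights
    measure   : callable measure(x, y), e.g. an edge-correlation score
    motion_std: assumed spread of the random-walk motion model
    """
    n = len(particles)
    # 1. Resample in proportion to the previous weights (the posterior).
    idx = rng.choice(n, size=n, p=weights / weights.sum())
    particles = particles[idx]
    # 2. Propagate: a simple random walk stands in for the object kinematics.
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # 3. Re-weight each hypothesis by the observation measure.
    weights = np.array([measure(x, y) for x, y in particles])
    weights = np.clip(weights, 1e-12, None)
    # 4. Estimate the position as the weighted mean of equation (3).
    estimate = (particles * weights[:, None]).sum(axis=0) / weights.sum()
    return particles, weights, estimate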

Through this, particle filtering provides a robust tracking framework, as it models uncertainty. It is able to consider multiple state hypotheses simultaneously. Since less likely object states have the opportunity to temporarily influence the tracking process, particle filters can deal with short-lived occlusions (Nummiaro et al., 2002). Figure 4 is a simplified flow chart that sums up the steps involved. Plate 3 and Figure 5 show the results of implementing the particle filter on an image sequence that contains a translucent micropipette.

Owing to the problems of ill-defined features and background clutter, the above approach to tracking using pure Condensation was not fully successful. Further issues that needed to be addressed were:
• features of the translucent pipette tip were not well defined and changed according to the incident light angle; and

Figure 4. Particle filtering stages


Plate 3. Top left corner is the template image. Small black dots are the particles.

Figure 5. The weights measure for this graph and subsequent graphs represent edge correlation measure.

• objects from the background, e.g. the cell holder, became visible through the translucent pipette tip.

To address these issues, a two-layered decision process was designed, with a feedback mechanism between the second layer and the first. The first layer decides upon the probable locations of the object based on the information from the previous frame. This is very similar to classic particle filtering, as described earlier. The second layer further filters those locations to arrive upon a more accurate list of possible locations of the object. This second filtering stage makes use of the information in the current frame.


For the next iteration, the probable locations from the second filtering stage are fed back into the first filter. To ensure robust tracking, the issues of feature corruption and background clutter were addressed as follows:
• feature corruption: a multi-variate feature measure was developed by fusing edge features with grey-scale value-based features; and
• background clutter: a distance penalisation measure was incorporated while evaluating the probable location of the objects.

Plate 4 and Figure 6 show the new results of implementing our modified version of the particle filter algorithm on an image sequence that contains a translucent micropipette crossing some non-translucent electrodes. As can be seen, in spite of background clutter, the pipette tip is localised correctly. To address the issue of real-time processing, it was assumed that there is only one instance of the object present in the scene. Distractions owing to background clutter were further reduced by limiting the probable object locations to the vicinity of the previously tracked location. To implement this, the particle locations (i.e. probable locations) are re-initialised to be in the vicinity of the tracked location of the object. Re-initialisation of the probable locations is performed only for those frames which give a very high probability value for the probable tracked location.

3.2 Colour segmentation
Although segmentation by colour is not a new field of investigation, the use of colour cues has traditionally not been robust. This is mainly because the three individual RGB components can vary almost independently depending on the brightness and colour of the light that falls onto the surface being imaged. In addition, many existing segmenters are based on the principle of look-up tables, which contributes to the inflexibility of the segmenter to either interpolate or extrapolate information.

Plate 4. In spite of indistinctive features of the pipette, its location is found.


Figure 6.

Furthermore, these methods produce a solution outside a statistical framework, making it difficult to fuse them into existing systems (Cíger and Plăcek, 2001). For solutions that fall within a statistical framework, methods based on unsupervised k-means clustering are preferable (Lucchese and Mitra, 1999). Many of these methods make use of particular colour spaces, the most common of which is the CIE L*a*b* colour space, designed to encompass all colours that the human eye can see. This colour space is represented by three variables, namely luminance and the colour values on the red-green and blue-yellow axes. When applied to real-world situations, these methods have several problems:
• clustering is an iterative process and so it is computationally expensive;
• unsupervised clustering is non-deterministic, so it may fail, and/or the time required to produce a result cannot be estimated;
• unsupervised clustering leads to a non-unique index for the object of interest, so further human interpretation is required to identify the index that represents the object of interest; and
• conversion between colour spaces adds to the computational cost.

A method for segmenting images using a committee of multilayer-perceptron (MLP) type neural networks has been developed. Each network takes the form of a 3-12-1 configuration, i.e. three units on the input layer, 12 units on the single hidden layer and one unit on the output layer. The number of inputs equals the number of components of the RGB triplet. The transfer functions for the hidden and output layers are hyperbolic tangents.


The network essentially takes the following form:

y = f\!\left( \sum_{j=1}^{n} w_j^{\mathrm{out}} \, f\!\left( \sum_{i=1}^{m} w_{ji}^{\mathrm{hid}} x_i + b^{\mathrm{hid}} \right) + b^{\mathrm{out}} \right) \qquad (4)

Figure 7. Construction of training set samples using k-means CIE L*a*b* colour space clustering

where y is the output of the network, x are the inputs to the network, f(·) is the tanh transfer function, w^out and w^hid are the weights of the output and hidden layers, b^hid = b^out = −1 are the biases, m = 3 is the number of input units and, finally, n = 12 is the number of hidden units.

The training dataset consists of colour examples from real images in the work environment. A real image is partitioned into colours that are to be recognised and those that are to be ignored. The training set is constructed with the aid of a k-means CIE L*a*b* algorithm (Woelker, 1996). A sample result following this clustering step is shown in Figure 7. Here, the third index has been manually identified as the one that represents the object of interest. The RGB components of the object of interest and the background are extracted for the training set, and a target training value is assigned. By observing the sensitivity of the trained network, other networks can be trained to bolster the overall sensitivity: the brightness and intensity of the training images are varied and the resulting images are used to train new networks. The optimisation algorithm used on equation (4) is QuickProp (Fahlman, 1988) with a training rate of η = 1 × 10⁻³ and d_max = 0.1.

The outputs from the multiple networks are combined with a winner-takes-all layer which decides on the overall output of the committee. In order to make use of this committee of networks, the networks are combined with a blob analysis paradigm, so that the object boundary can be determined. The outputs from the networks are essentially a probabilistic measure, so these values are first prior-debiased so that they can be thresholded at the midpoint, i.e. at the value 0.5. This means that we allocate equal prior probability to a pixel belonging or not belonging to the object of interest. The thresholded values are then fed into a blob finder algorithm, which is, in essence, a recursive implementation of connected component analysis. Subsequently, small blobs are removed, as these are assumed to be mainly due to either mis-classification or noise. The larger recognised blobs can be tracked using a variety of tracking methods. Three instances of the results obtained are shown in Plate 5.
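The following Python sketch shows the forward pass of equation (4) and the winner-takes-all combination for a single mask; the weight values, the input scaling and the simple rescaling used here in place of the prior-debiasing step are assumptions, since only the network topology, transfer functions and biases are specified above.

import numpy as np

def mlp_output(rgb, w_hid, w_out, b_hid=-1.0, b_out=-1.0):
    # Equation (4): tanh hidden layer (12 units) and tanh output unit,
    # with both biases fixed at -1 as stated in the text.
    hidden = np.tanh(w_hid @ rgb + b_hid)            # w_hid has shape (12, 3)
    return float(np.tanh(w_out @ hidden + b_out))    # w_out has shape (12,)

def committee_segment(image, networks, threshold=0.5):
    # Winner-takes-all over a committee of trained (w_hid, w_out) pairs; the
    # strongest response is rescaled from (-1, 1) to (0, 1) and thresholded.
    h, w, _ = image.shape
    mask = np.zeros((h, w), dtype=bool)
    for i in range(h):
        for j in range(w):
            rgb = image[i, j].astype(float) / 255.0  # assumed input scaling
            best = max(mlp_output(rgb, wh, wo) for wh, wo in networks)
            mask[i, j] = (best + 1.0) / 2.0 > threshold
    return mask

The resulting mask would then be passed to the blob finder described above so that small, noisy blobs can be discarded before tracking.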

The images used in Plate 5 are derived from a real image sequence which consists of a moving pair of micro-grippers. Using shape information only to locate the micro-grippers is difficult and time-consuming. Shapes are derived from features such as edges and corners. The highly textural gel-pack in the background contributes to a large number of unwanted features that will either skew the results of the recognition process or dramatically increase the search (recognition) time. The traditional way of designing a vision algorithm to recognise and track the micro-grippers is to use a distinctive set of markers that can be correlated against a reference set. However, this is not ideal if the objective is to design a generic solution. The micro-grippers were intentionally sputtered with a thin layer of gold in order to improve their visibility in conditions of low light. The gold layer also causes the colour of the micro-grippers to be distinctly different from the gel-pack and the micro-lenses in the scene. Hence, the use of colour as a cue for segmentation is ideal. On a 3.0 GHz Pentium 4 PC, it took approximately 9.76 s for the k-means clustering algorithm to converge and to produce clusters that have to be subsequently interpreted manually. On the other hand, our neural network segmenter is able to segment and recognise the micro-grippers in approximately 700 ms which is an order of magnitude faster. Owing to the fact that a trained neural network essentially consists of multipliers and adders, the algorithm also has the potential of being implemented directly onto hardware FPGAs.


3.3 Shape recognition
To build a robust vision system, some type of generic shape recognition method is required. Most methods for shape recognition are either too slow to operate in real time or do not fulfil the objectives for invariance. The pairs-of-lines (POL) method (Meikle et al., 2004) was developed specifically to address these concerns. The POL method of recognition is invariant to translation, rotation and scale. Scale invariance is especially important for a vision system used in a 3D environment. The recognition scheme is shown in Figure 8. First, edge strings are extracted from both a scene image and the model image using Canny's algorithm (Canny, 1986) with post-processing. The strings are then transformed into a series of straight lines using Ballard's (1981) straight-line recursive split algorithm. These straight lines approximate the original edge strings. Next, a search is made for pairs of lines in the scene which could potentially match pairs of lines in the model, allowing for occlusion, scale and orientation changes in the model. Matching pairs enable the possible position and orientation of the model in the scene to be computed.

Plate 5. Identification of object of interest using the neural network segmenter and blob analysis.


Figure 8. Using pairs of lines for fast real-time recognition

Results from the comparison of all possible line pairs are collated in a 2D histogram. Finally, results are extracted from the histogram and translated back to the input scene domain. This yields estimates of the position, scale and orientation of the model in the scene. Multiple instances of the same model in the scene can also be found. In order to obtain scale invariance, consider the image of Figure 9, which shows a comparison between two pairs of lines that share a similar intersection angle. For clarity, the pairs of lines are shown in a similar orientation; however, this is not generally the case during recognition.

Figure 9.

The task is to compute the range of possible centroid positions of the model in the scene, given that these two pairs of lines could match. There are errors inherent in the measurement of the position of the edges in the scene; let this error be labelled E. If it is considered that M_a1 is to map to S_a1, M_a2 to S_a2 and so on, the following expressions for the possible scale of the model in the scene can be produced:

\mathrm{min}_1 = \frac{S_{b1}}{M_{b1} + E} \qquad (5)

\mathrm{min}_2 = \frac{S_{b2}}{M_{b2} + E} \qquad (6)

\mathrm{max}_1 = \frac{S_{a1}}{M_{a1} - E} \qquad (7)

\mathrm{max}_2 = \frac{S_{a2}}{M_{a2} - E} \qquad (8)


Here, min1 and max1 are the minimum and maximum scale changes which are consistent with the observed lines if only M_a1 and M_b1 are considered as mapping to S_a1 and S_b1. Similarly, if we consider that M_a2 and M_b2 map to S_a2 and S_b2, then min2 and max2 are the bounding scale changes. As the lines from the scene and model have been considered individually in this calculation, the algorithm must compute the range of scales which is consistent with both lines. This is simply the set of values common to both ranges [min1, max1] and [min2, max2]: usually the larger of (min1, min2) is the lowest consistent scale and the smaller of (max1, max2) is the largest consistent scale. Referring back to Figure 9, if there are any model scales consistent with both lines, then the range of possible model centroid positions in the scene can be determined.

The results of this scale-invariant method are shown in Figure 10. Notice that the size of the cogwheels in the scene image on the right is varied. The white lines in the scene image show the boundary of the objects that have been recognised. In the case of the cogwheel on the lower left, the orientation has not been correctly detected, but even then the location of the centroid has been correctly identified.
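A small sketch of this consistency test, written directly from equations (5)-(8), is shown below; the argument names simply mirror the measured line quantities S and M and the error bound E, and a None return stands for a rejected pairing.

def consistent_scale_range(S_a1, S_b1, S_a2, S_b2, M_a1, M_b1, M_a2, M_b2, E):
    """Scale range consistent with matching two scene line pairs to two model
    line pairs (equations (5)-(8)); returns None when the ranges do not overlap."""
    min1 = S_b1 / (M_b1 + E)
    min2 = S_b2 / (M_b2 + E)
    max1 = S_a1 / (M_a1 - E)
    max2 = S_a2 / (M_a2 - E)
    low, high = max(min1, min2), min(max1, max2)
    return (low, high) if low <= high else None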

Figure 10. Model of cogwheel (left) and scene image with multiple cogwheels of varied size (right)


Figure 11. Finding live cell clusters using morphological operators

In order to achieve real-time performance, the search method can be optimised. This can be done either algorithmically, i.e. by implementing a more efficient and deterministic search routine, or through parallelisation.

3.4 An ad hoc method for cell identification
In instances where generic vision methods are not viable, customised methods have been designed to perform specific tasks. An example is the demonstration that requires the micro-robot to work on a cluster of live cells. The entire imaging process developed is shown in Figure 11. It is largely based on methods developed in the field of image morphology.

The boundaries of the live cells first have to be localised. This is done by first injecting diacetyl-fluorescence into the aqueous solution that contains the cells, which causes only live cells to be visible under fluorescent lighting. The image is then automatically thresholded using Otsu's optimal thresholder (Otsu, 1979). The method is based on the grey-level histogram of an image: the threshold is placed at the lowest point between two peaks, where each peak represents a separate class, and the criterion function involves maximising the ratio of the between-class variance to the total variance. The thresholded image produces clusters of cells that can be labelled. In order to do so, connected component analysis is performed. The following morphological equation summarises the process:

I_k = (I_{k-1} \oplus B) \cap C \qquad (9)

where k = 1, 2, 3, ... is the iteration index, I_k is the image at iteration k, B is a morphological structuring element, ⊕ denotes dilation, and C is the current cluster that the result of the dilation intersects with. The areas of the clusters detected above are then computed. An area filter is applied to remove clusters smaller than a pre-specified area, as these fall into the category of either false positives or general noise.

The boundary of the clusters is required for the approach of the micro-robot. In order to identify the boundaries in real time, a fast and well-known method for identifying boundaries using morphological operators is used. It is essentially a two-stage process, written as follows (in mathematical morphology notation):

E = I \ominus B, \qquad b(I) = I - E \qquad (10)

where E is the eroded image, I the original image and b(I) the extracted boundary itself. The erosion of an image I by the structuring element B can be written as the intersection of all translations of the image I by the vectors −b ∈ B, such that:

I \ominus B = \bigcap_{b \in B} I_{-b} \qquad (11)
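The sketch below strings the steps just described together in Python with NumPy and SciPy rather than the Mimas toolkit used in the project; the minimum cluster area, the 256-bin histogram and the 3 × 3 structuring element are assumptions made for illustration.

import numpy as np
from scipy import ndimage

def label_live_cells(fluorescence, min_area=50):
    """Sketch of the cell-identification pipeline: automatic thresholding,
    connected-component labelling, an area filter, and the erosion-based
    boundary of equations (10)-(11)."""
    # Automatic threshold: pick the level that maximises between-class variance.
    hist, edges = np.histogram(fluorescence, bins=256)
    probs = hist / hist.sum()
    levels = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = levels[0], -1.0
    for k in range(1, 256):
        w0, w1 = probs[:k].sum(), probs[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        m0 = (probs[:k] * levels[:k]).sum() / w0
        m1 = (probs[k:] * levels[k:]).sum() / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, levels[k]
    binary = fluorescence > best_t

    # Connected components and area filter.
    labels, n = ndimage.label(binary)
    sizes = ndimage.sum(binary, labels, index=range(1, n + 1))
    keep = np.isin(labels, [i + 1 for i, s in enumerate(sizes) if s >= min_area])

    # Boundary b(I) = I - (I eroded by a 3x3 structuring element), equation (10).
    eroded = ndimage.binary_erosion(keep, structure=np.ones((3, 3)))
    boundary = keep & ~eroded
    return keep, boundary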

In our case, the structuring element B is simply a 3 × 3 matrix of ones. Now that the locations of the live cells and their boundaries have been identified, the cells can be labelled, as shown in the final image of Figure 11.

4. Current results
The MiCRoN project is still under way and the specifications of the demonstrations illustrated in Section 2 may still be subject to minor hardware modifications. The algorithms presented in this paper were initially designed at an early stage of the project, when both demonstrations were still being defined in detail. For this reason, whilst the algorithms address the main issues that will be faced in the final set-ups, they may still be subject to alterations or fine-tuning.


The results obtained with the current algorithms are presented within each subsection. They have been tested using image sets which are as close as possible to the operating environments of the final set-ups. The robust object tracking of subsection 3.1 will be used in the biological demonstration for tracking the glass pipette in its approach to the cell. This will be combined with the ad hoc method of subsection 3.4 for the recognition of live cells. A third algorithm, for the recognition/tracking of the customised syringe chip, is still under development. For the assembly demonstration (soldering task), either the colour segmentation method of subsection 3.2 or the shape recognition algorithm of subsection 3.3 will be employed for the recognition/tracking of the robot-mounted micro-grippers. The choice will largely depend on the image quality of the micro-camera-based acquisition system and the amount of clutter present in the workspace where the soldering procedure is to take place. We are also investigating alternative methods that may be used with the micro-parts to be soldered, in case the ones presented in this paper do not meet all the requirements.

5. Conclusions
When several robots with different tools have to work in the same work environment, issues related to co-operation arise. In order to co-ordinate movement, vision systems play a very important role in providing feedback. This is even more important in the micro-world, where it is difficult to develop and mount sensors directly on the robots themselves, mainly due to physical limitations. As the need for micro-manufacturing increases, there will also be an increasing need for technologies that will enable the automation of operations. Methods commonly used in the macro-world may not be as robust when applied to the micro-world. Hence, new technologies and methods need to be researched and developed.

In this paper we have presented a number of hardware and software issues which arise when moving the application domain of robots from the macro- to the micro-world. Several new and original algorithms used for calibration, recognition and tracking have been developed and presented, with results accompanying each instance. We have adopted various techniques developed in the field of modern artificial intelligence, namely neural networks, particle filtering and morphology. More importantly, we anticipate that the methods we have developed will support the wider goal of making robots autonomous. This fits in very nicely with an early quote by John McCarthy in 1956 on the definition of artificial intelligence: "making a machine behave in ways that would be called intelligent if a human were so behaving".

Notes
1. IST Project No. IST-2001-33567, www.shu.ac.uk/mmvl/micron/
2. ESPRIT 4 Project No. 33915, www.shu.ac.uk/mmvl/miniman/
3. www.shu.ac.uk/mmvl/mimas/
4. p(X_{t-1} | Z_{t-1}) is the probability of the states up to time t - 1 given all the measurements to time t - 1, where x_t is the state of the modelled object, X_t = {x_1, ..., x_t} its history, and similarly for z_t, the set of image features.
5. p(x_t | Z_{t-1}) is the probability of the state at time t given all the measurements z_i to time t - 1.

References

Ballard, D.H. (1981), "Strip trees: a hierarchical representation for curves", Communications of the ACM, Vol. 24 No. 5, pp. 310-21.

Bürkle, A., Schmoeckel, F., Kiefer, M., Amavasai, B.P., Caparrelli, F., Selvan, A.N. and Travis, J.R. (2001), "Vision based closed-loop control of mobile microrobots for micro handling tasks", Proceedings of SPIE Vol. 4568: Microrobotics and Microassembly III, Boston, MA, October, pp. 187-98.

Canny, J. (1986), "A computational approach to edge detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8 No. 6, pp. 679-98.

Cíger, C. and Plăcek, J. (2001), "Non-traditional image segmentation and filtering", paper presented at the 17th Spring Conference on Computer Graphics, April, pp. 25-8.

Fahlman, S.E. (1988), "Faster-learning variations on back-propagation: an empirical study", in Touretzky, D., Hinton, G. and Sejnowski, T. (Eds), Proceedings of the 1988 Connectionist Models Summer School, Morgan Kaufmann, pp. 38-51.

Isard, M. and Blake, A. (1998), "Condensation – conditional density propagation for visual tracking", International Journal of Computer Vision, Vol. 29 No. 1, pp. 5-28.

Lucchese, L. and Mitra, S.K. (1999), "Unsupervised segmentation of color images based on k-means clustering in the chromaticity plane", Proceedings of the IEEE Workshop on Content-based Access of Image and Video Libraries (CBAIVL'99), Fort Collins, pp. 74-8.

Meikle, S., Amavasai, B.P. and Caparrelli, F. (2004), "Towards real-time object recognition using pairs of lines", in preparation.

Nummiaro, K., Koller-Meier, E. and Van Gool, L.J. (2002), "Object tracking with an adaptive color-based particle filter", paper presented at the DAGM Symposium, pp. 353-60.

Otsu, N. (1979), "A threshold selection method from gray-level histograms", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 9 No. 1, pp. 62-6.

Woelker, W. (1996), "Image segmentation based on an adaptive 3D analysis of the CIE-L*a*b* color space", Proceedings of SPIE '96 – Visual Communications and Image Processing '96, Vol. 2727, pp. 1197-203.

Further reading

Amavasai, B.P., Caparrelli, F., Selvan, A.N., Meikle, S. and Travis, J.R. (2003), "Control of a cluster of miniature robots using cybernetic vision", IEEE SMC Chapter Conference on Cybernetics Intelligence – Challenges and Advances, Reading, September, pp. 80-6.

Gonzalez, R.C. and Woods, R.E. (1993), Digital Image Processing, Addison-Wesley, Reading, MA.

Selvan, A.N., Boissenin, M., Amavasai, B.P., Caparrelli, F. and Travis, J.R. (2003), "Tracking translucent objects in cluttered scenes", paper presented at the IEEE SMC Chapter Conference on Cybernetics Intelligence – Challenges and Advances, Reading, pp. 110-8.


Optimisation enhancement using self-organising fuzzy control


Ann Tighe, Finlay S. Smith and Gerard Lyons
Department of Information Technology, National University of Ireland, Galway, Ireland

Abstract
Purpose – To show the successful use of self-organising fuzzy control in enhancing dynamic optimisation, a controller is used to direct the type of optimisation appropriate to each new dynamic problem. The system uses its experiences to determine which approach is most suitable under varying circumstances.
Design/methodology/approach – A knowledge extraction tool is used to gain basic information about the solution space with a simple computation. This information is compared with the fuzzy rules stored in the system. These rules hold a collection of facts on previous successes and failures, which were acquired through the performance monitor. Using this system the controller directs the algorithms, deciphering the most appropriate strategy for the current problem.
Research limitations/implications – This procedure is designed for large-scale dynamic optimisation problems, where a portion of the computational time is sacrificed to allow the controller to direct the best possible solution strategy. The results here are based on smaller-scale systems, which illustrate the benefits of the technique.
Findings – The results highlight two significant aspects. From the comparison of the three algorithms without the use of the controller, a pattern can be seen in how the algorithms perform on different types of problems. Results show an improvement in overall quality when the controller is employed.
Originality/value – This paper introduces a novel approach to the problem of dynamic optimisation. It combines the control ability of self-organising fuzzy logic with a range of optimisation techniques to obtain the best possible approach in any one situation.
Keywords Cybernetics, Optimization techniques, Fuzzy control
Paper type Research paper

Kybernetes, Vol. 34 No. 9/10, 2005, pp. 1440-1455. © Emerald Group Publishing Limited, 0368-492X. DOI 10.1108/03684920510614759

1. Introduction
One fact all global cities will have in common for many generations to come is rising traffic congestion. Great emphasis has been placed on traffic congestion and carbon dioxide emissions in many research areas, from supply chain management to fleet management. Furthermore, the investigation of real-time cooperation between delivery companies, to get the most efficient use of vehicle capacity and driver hours, has led to the need for a dynamic optimisation system that can perform satisfactorily on this changing problem type. The optimisation system discussed here forms part of the V-LAB (virtual logistics multi-agent broker) project, whose objective is to develop a system to allow the cooperation of haulage companies to facilitate extra jobs convenient to their schedules (Tighe and Lyons, 2003a, b). The motivation is twofold: first, to reduce the number of empty or half-full lorries on the roads and, secondly, to find the most economical price for the transportation of goods.

This work is funded through Enterprise Ireland's Advanced Technologies Research Programme.

V-LAB uses a broker to auction available jobs. The optimisation system calculates a bid for a particular job by obtaining a cost for the addition of that job to the schedule. Owing to the finite amount of computational time, and the need for an efficient use of this time, only the most cost-effective jobs will be selected for bid preparation. This paper includes work on resolving how this decision should be made. The V-LAB system is built on an agent-to-agent framework which will allow vehicles to respond to jobs while on the move, increasing efficiency but also leading to the need for a rapid real-time optimisation system.

Self-organising fuzzy control has become an efficient and successful alternative for complex, poorly defined systems. Here an adaptive controller is used to decide which optimisation algorithm would be most appropriate to run in different situations. Most real-world optimisation problems are continuously changing, for example through the addition of new stops to a schedule or of an extra constraint. Numerous strategies for optimisation have been explored, such as genetic algorithms (GAs) (Goldberg, 1989) and tabu search (TS) (Glover and Greenberg, 1989). Solomon's benchmark problems for vehicle routing show how different algorithms perform better on different versions of the problem and that no one algorithm has superiority (Zhu et al., 2000). Other research, known as the "no free lunch theorems", also shows that no single algorithm is better at all possible instances of an optimisation problem (Wolpert and Macready, 1997). This supports the idea that a tailor-made metaheuristic for each problem has the greatest likelihood of performing optimally. The controller prioritises the algorithms for the particular problem with respect to the information gained from the knowledge extraction tool. The fuzzy rules stored in the system hold a collection of facts on previous performances, which were acquired through the performance monitor.

The optimisation problem consists of adding an additional set of constraints to an otherwise optimised combinatorial optimisation problem. The optimised problem stays the same, but the complexity of the additional constraints varies dramatically between the 52 problems in the test set. The experimentation is carried out by comparing a simulation of solving the test set with and without the controller. All aspects of the two simulations are equivalent except for the percentage of the optimisation time used by the controller.

Section 2 gives some background to the idea of fuzzy logic and self-organising fuzzy control. Section 3 gives an introduction to search algorithms and details a number of them. The vehicle routing problem (VRP) is presented in Section 4. In Section 5 the design of the controller is discussed. Following that, Section 6 describes the controller simulations used to generate results, and these results are analysed in Section 7. Finally, conclusions are discussed in Section 8.

2. Fuzzy logic
Zadeh introduced the idea of fuzzy set theory in 1965, as a procedure for dealing with the ambiguity of many real-world problems (Zadeh, 1965). He extended the ideas of set theory to include structures for tackling these imprecisions, known as linguistic variables. Mamdani used these linguistic variables to create the first fuzzy logic controller (Mamdani and Assilian, 1975).
This format for fuzzy logic control remains one of the two major types of fuzzy control; the other was developed by Takagi and Sugeno (1985). The main difference between these implementations lies in the consequents of the rules: Mamdani used a fuzzy set, whereas in the Takagi-Sugeno approach the consequents are represented by linear functions.
The Mamdani controller is preferred for knowledge-processing expert systems, whereas the Takagi-Sugeno form is favoured for dynamic nonlinear systems.

The general fuzzy logic process can be divided into steps. Normalisation transforms the data into the scale necessary for it to be interpreted by the fuzzy rules. Fuzzification is used to determine the degree to which the inputs belong to each of the fuzzy sets. These degrees are then applied to the antecedents of the rules in the rule base, and the collaborated output is formed from the matching rules, weighted according to how well they match the inputs. Defuzzification converts the collective output fuzzy set to a number. Finally, denormalisation may be necessary to adjust the output to the suitable range.

2.1 Self-organising fuzzy control
Self-organising fuzzy control is an extension of standard fuzzy control that modifies the existing rules or, in some cases, develops a rule base from a position where there is no knowledge of the system beforehand. Procyk and Mamdani (1979) introduced the idea of utilising performance evaluation to generate and modify the rule base. This type of adaptive controller can be separated into two parts: the system monitor (which recognises when the controller needs to make changes to the system) and the adaptive techniques. GAs have been used for the design and optimisation of the rule base to maximise the accuracy of the system (Cordon and Herrera, 1997). In recent times, the use of neural networks to enhance fuzzy control has shown considerable success (Nunes et al., 2002).

When adapting the rules, a controller determines which rules are causing inaccurate control of the system. These rules are modified by changing the shape of the membership function and removing the part causing the imprecise output (Smith, 2001). Figure 1 shows the original fuzzy subset (a), which represents one of the arguments of the fuzzy rule that is causing an error. The true value of this argument when the error occurred is shown in (b). The modification procedure corrects the original fuzzy subset by replacing it with the minimum of the original fuzzy subset and the complement of the real value, as seen in (c). This changes the stored fuzzy subset to (e) from its original form in (a). The degree to which the rules are modified relates to the amount by which the solution differs from the decision produced by the rule base. This modification to the rules is in addition to the true version of the rule being added.

3. Search algorithms
A great variety of search algorithms are available for optimisation problems, each with advantages for particular applications. Metaheuristic algorithms such as guided local search or TS (Glover and Greenberg, 1989) use "rules of thumb" to seek out better solutions in neighbourhoods where good solutions have previously been found, as well as heuristics to avoid becoming stuck at near-optimal solutions. More robust metaheuristic algorithms, such as GAs and simulated annealing, use an additional random element to avoid getting caught in suboptimal neighbourhoods. Their sensitivity is not as great because they contain this extended random aspect, but their performance is enhanced by making it less difficult to find the global optimum. Optimal methods, such as branch and bound (BB), are uncompromising search tools that fully consider every possible sequence.


Figure 1. Rule modification procedure

Their results are mathematically exact, but they are computationally expensive. Research has shown that no one algorithm performs satisfactorily over all optimisation problems (Wolpert and Macready, 1997); how much an algorithm accomplishes is related to its suitability to the specific problem. With dynamic optimisation problems, the type of problem can change as it grows or shrinks, and the size and number of constraints will affect the success of each particular algorithm.

3.1 Tabu search (TS)
TS aims to help a local search strategy escape local optima (Glover and Greenberg, 1989). TS uses a history to selectively score earlier moves and their status. A special version of TS, named strict tabu, was developed for a particular combinatorial optimisation problem; it combines short-term recent memory with long-term frequency memory (Zhu et al., 2000). This shows how custom-made search algorithms can increase performance. Another improvement in the memory system of TS is a parallel algorithm implemented on a distributed-memory multiprocessor system to accelerate the TS process (Schulze and Fahle, 1999). The algorithm performs several search threads in parallel; each thread starts with a different initial solution and tries to improve on that solution. After applying these local optimisation steps, the worst solution is replaced by a new one.


3.2 Genetic algorithms
GAs use a biological metaphor to randomly search the solution space (Goldberg, 1989; Hart et al., 1999). This metaheuristic algorithm is powerful but crude, balancing efficiency with effectiveness. The survival-of-the-fittest technique is taken from evolution, where nature chooses the strongest to reproduce in the hope that each species will survive. This natural selection process combines the fittest individuals to create a new and superior population. As in biological evolution, these natural selection processes can be used to make an artificial system as effective as possible. The algorithm can be divided into three parts. Selection is the method of establishing which individuals are chosen for reproduction; a fit individual has a higher probability of reproduction than less fit ones. Mutation allows for the reintroduction of features that may have been lost through evolution; it does this by randomly changing the characteristics of the individuals involved. Finally, crossover is a process of exchanging genes between two individuals that have been chosen for reproduction.

3.3 Branch and bound (BB)
BB is not a metaheuristic but an exhaustive search, which reduces the search complexity using a divide-and-conquer technique. It partitions the solution space into sub-problems and then solves each sub-problem recursively, while keeping track of the best solution found so far. The main idea of this technique is to prune areas of the search space by calculating a bound on the quality of the solution. In scheduling problems, the fitness of the best solution is used as the bound, and when a partial solution is of poorer quality than the best so far, the remainder of that search space is deemed unnecessary to examine. Even though this BB technique does restrict the size of the search, it can still leave an exponential number of possible answers, resulting in an infeasible solution strategy for many optimisation problems.

4. The vehicle routing problem (VRP)
The VRP is an important management problem in the field of physical distribution and logistics; Dantzig and Ramser first formulated it in the 1950s (Dantzig and Ramser, 1959). The problem consists of a group of C_n = {c_1, ..., c_n} customers, each wanting products from a depot. The depot has a fleet of k trucks; each truck starts at the depot, delivers products to a subset of the customers and then returns to the depot. Each truck must do so while fulfilling all constraints given by the customer, the driver and the geographical area, and while optimising the total cost. With x_ij representing the cost of going from customer c_i to c_j, the minimisation takes the following form:

\mathrm{cost} = \min \sum_{\forall c_i, c_j \in C} x_{ij}

The optimisation of this equation is subject to all constraints in the system. The constraints given by customers are time windows in which deliveries must be made. Driver constraints would be the total driving time, driver rest periods or even language barriers if travelling to a foreign country. Geographical constraints on the other hand could be one-way streets, closed roads, shipping timetables or even road congestion. Each truck in the fleet will also have its own capacity which cannot be exceeded.
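As a concrete illustration, the objective above can be evaluated for a candidate schedule with a few lines of Python; the list-of-routes representation, the depot index and the omission of the constraint checks are assumptions of this sketch.

def fleet_cost(routes, x, depot=0):
    """Total travel cost of a schedule: routes is a list of customer-index
    sequences (one per truck), x[i][j] is the cost of travelling from i to j,
    and every route is assumed to start and end at the depot.  Time-window and
    capacity checks are deliberately left out of this sketch."""
    total = 0.0
    for route in routes:
        stops = [depot] + list(route) + [depot]
        total += sum(x[a][b] for a, b in zip(stops, stops[1:]))
    return total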

The most basic model is often referred to as the capacitated vehicle routing problem (CVRP) (Goldberg, 1989), which assumes that all the vehicles are homogeneous with the same capability, that they start at the same depot, and that the customers have no specific time window, which means no definite collection or delivery times. Several more complicated variants exist:
• Vehicle routing problem with time windows (VRPTW). In many delivery systems each customer may indicate, as well as the load that has to be delivered, a period of time, called a time window, in which this delivery must take place. Here the objective is to find a route for each vehicle, where it leaves the depot, completes a subset of the deliveries without violating the vehicle capacity and time windows, and then returns to the depot. This leaves a large number of constraints to satisfy, namely the earliest and latest times each parcel can be collected or delivered. Much research has advanced this area of the VRP by building on earlier algorithms such as the tabu method (Schulze and Fahle, 1999) and GAs (Zhu, 2000).
• Multi-depot vehicle routing problem (MDVRP). Here the system considers several depots in the computation. If the customers are clustered around the depots, the system can be treated as a set of independent VRPs, but if not it must be considered as an MDVRP. The MDVRP has to deal with the VRP decisions but also entails the assignment of customers to a certain depot (Wu et al., 2002). This is a more complicated system with a much larger dataset for optimisation, and considerably less work has been done on this problem.
• Split delivery vehicle routing problem. This problem allows customers to be serviced by more than one vehicle (Dror et al., 1994). This is very useful when the size of a customer's cargo is larger than the capacity of the vehicle, or if the journey is long. Both cases can be dealt with by separating the cargo into multiple cargos, or the journey into multiple journeys, and treating them as separate problems. However, for a considerably long journey, a number of constraints should be placed on the cargo changeover: for safety and security reasons a depot is required at the changeover, and the time windows should be sufficiently flexible.
• Dynamic vehicle routing problem (DVRP). A dynamic version of the VRP in which new parcels can be accepted into the system at any time and the system will process them (Zhu and Ong, 2000). These new jobs are incorporated into the optimisation on a constant basis. The DVRP reflects the real world more faithfully, though it is considerably more complicated; time becomes the crucial aspect in deciding what algorithm should be used to solve it.
• Pickup and delivery vehicle routing problem (PDVRP). This extends the VRP by pairing the customers into pickups and deliveries (Bent and Hentenryck, 2003), which adds to the complexity of the problem. As well as each delivery having to abide by the time and demand constraints, the pickup must be done before the delivery and both must be done by the same vehicle (Li and Lim, 2003).

5. The controller
The controller assists the dynamic optimisation of a pickup and delivery VRPTW, where an additional parcel must be added to a previously optimised fleet.


Figure 2. The structure of the self-organising fuzzy logic controller

The controller needs two pieces of information to begin: the constraint set of the new parcel, and the fleet schedule with its associated constraint set. This information is taken in by the knowledge extraction tool. Three inputs are created for the controller, each giving specific domain knowledge of the convenience and complexity of the particular problem. Once these are passed through the fuzzy logic controller, an output priority is given for each of the algorithms. This priority indicates how successful the algorithm may be on this latest problem. The performance monitor retains this information on success and failure and stores it as new rules in the system. A self-organising process can edit the rule base where the performance monitor shows a regular inconsistency for a particular rule. Algorithms with a higher priority will be given a larger amount of CPU time and those with a lower priority a proportionally lower percentage. This gives strategies that sometimes produce a good answer quickly a chance to run, and also helps gauge how successful each process is. Once the optimisation time has elapsed, the best solution is returned. All results are then taken by the performance monitor and evaluated, and this evaluation is used by the self-organising process to create new rules echoing the analysis. The following list describes the elements of the control system (Figure 2):
• Scheduling problem – the particular problem is first submitted to the system.
• Knowledge extraction tool – used to evaluate the problem type and create inputs for the controller (see Section 5.1).
• Normalisation – the inputs are normalised to correlate with the fuzzy sets used by the controller.
• Fuzzy logic controller – here a decision is created based on the information stored in the rule base.
• Fuzzification – the fuzzification process takes information from the input membership functions and applies it to the inputs.
• Rule base – the rule base consists of all the rules generated for the controller (see Section 5.3).
• Defuzzification – extracts the crisp values from the fuzzy set.
• Denormalisation – the output is then denormalised to return it to the necessary value range.
• Optimisation procedures – see Section 5.4.
• Performance monitor – monitors the successes of the controller (see Section 5.2).
• Self-organising – adapts the rule base with respect to previous performance (see Section 5.3).

5.1 Domain knowledge extraction
To decide which algorithm would be most suitable for a specific problem, we must take a closer look at the problem itself. The process of extracting domain knowledge is an inherently difficult procedure (Jelasity, 2000; Beck and Fox, 2000). The object here is to gain basic information about the solution space with a simple computation; any extensive analysis would counteract the time saved in picking the most suitable algorithm. A dynamic optimisation problem offers a greater opportunity to demonstrate that information about the search space can be a considerable asset. Simple analysis could help in deciding the best way to deal with small changes to the optimisation problem. When a new pickup and delivery is added to an otherwise optimised route, a quick analysis can be done to see whether a local search strategy would be better at creating a new route or whether a robust random approach would be more appropriate. For the controller presented here, this analysis can be as simple as checking the nearest detour from the original route. In the case of a large system with many different metaheuristics, domain knowledge could extend to the size of the solution space, the number and flexibility of constraints, the data type or the computational power available.

The knowledge extraction tool focuses on three aspects of the new problem. Confining attention to this information helps direct the most effective way of solving the problem. Once the information is retrieved it is passed to the controller, where a decision is made on the best approach to take.

Input 1: Detour. The first input involves the estimation of the detour necessary to add the new parcel to the schedule. This comprises finding the nearest stop on the schedule to the new pickup, calculating the associated detour, and performing the same computation for the delivery location. Adding these together gives the total detour involved in a simple insertion of the new parcel into one vehicle's route. This calculation is done for each vehicle in the fleet, and will assist the controller in choosing which vehicle is the most accessible to the new stops.

Input 2: Convenience. The second calculation refers to the convenience of the new parcel to the schedule. Input 1 does not take into account the direction of the route; it only deals with location. Input 2 finds the cost of the route segment between the two points on the route used to generate the detour. It then evaluates whether this insertion obeys the constraints: if it does, this cost is returned negative, and if not it is returned positive.

Input 3: Constraint flexibility. The third input to the system is the flexibility of the parcel. Here the constraints attached to the new pickup and delivery are investigated. A tight time constraint may cause problems for the GA, but will help the TS strategy of confining the search space.

These three values are normalised so that they conform with the fuzzification procedure and are then used as the inputs to the controller. The controller receives a set of inputs for each vehicle and associated algorithm. The controller treats each set of inputs as an individual query, and compares the results from all the queries to make the final decision. The priority of each algorithm, and the vehicle that the new parcel should be added to, are returned from the controller.
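A minimal sketch of Input 1 is given below; the Euclidean distance metric and the out-and-back detour to the single nearest stop are assumptions, since the text does not fix the exact distance measure.

import math

def detour_input(route, pickup, delivery):
    """Estimate the detour of a simple insertion of the new pickup and delivery
    into one vehicle's route (Input 1).  route is a list of (x, y) stops."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def nearest_detour(point):
        # Assumed detour measure: out from the closest existing stop to the new
        # point and back again.
        return 2.0 * min(dist(stop, point) for stop in route)

    return nearest_detour(pickup) + nearest_detour(delivery)

The controller would evaluate this once per vehicle and compare the values when deciding which route is the most accessible to the new stops.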


5.2 Performance monitor
The self-organising fuzzy logic controller adapts based on experience. The monitor compares the controller's decision against what the algorithms actually achieved in the computational time given. It uses this information to decide when a new rule should be added or when the rules should be modified. The selection of a suitable algorithm is an uncertain process: an algorithm may sometimes produce a good quality result quickly but fail completely on other occasions. For this reason, the performance on any one task is divided proportionally between all algorithms available to the controller. The performance is evaluated over the set of all n algorithms, where q_n, f_n and p_n represent the controller's priority given to each algorithm for optimisation, the resulting fitness value, and the subsequent performance generated for each algorithm by the performance monitor. Each new performance is calculated using the following equation:

p_j = \frac{\dfrac{1}{q_j \, f_j}}{\displaystyle\sum_{i=1}^{n} \dfrac{1}{q_i \, f_i}}, \qquad j = 1, \ldots, n
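A minimal sketch of this calculation is given below, assuming that the fitness f is the route cost, so that a lower value indicates a better schedule.

def performances(priorities, fitnesses):
    """Share of the credit assigned to each algorithm by the equation above:
    each share is proportional to 1 / (q_j * f_j), so a cheap (low-cost) result
    obtained under a low priority earns the largest performance value."""
    inv = [1.0 / (q * f) for q, f in zip(priorities, fitnesses)]
    total = sum(inv)
    return [v / total for v in inv]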

Therefore the performance each algorithm is given depends on the priority given by the controller and the success of the algorithm, and is proportional to that of the other algorithms running. This information can then be used to create new rules. Once the algorithms have completed their tasks, the performance monitor checks whether the outcome conforms with the control given. When the performance of a particular algorithm differs from the priority given by the controller, modifications are made to the rule base by removing the imprecision in the rules which is causing the difference between the decision and the result.

5.3 The rule base
The controller starts with no rules in the rule base and gradually learns how to control the process, adding rules as it gains knowledge. Each algorithm has a separate rule base for its performance; this gives more precise knowledge of how good or bad a result each algorithm achieves. As the controller gains experience, it adds to the rule base of each algorithm. These rules are created from the inputs to the controller and the subsequent performance of the algorithm. Modifications are made to the rule base when the controller's performance is not satisfactory. The main problem with this approach is that it may lead to an enormous increase in the number of rules. The adaptive controller therefore performs threshold checks before a new rule is added to the rule base: if the rule is not of a sufficient size to affect the control of the process, it will not be added. This involves checking the new rule to see if its maximum height is greater than 10 per cent of the maximum height of a rule.

5.4 The optimisation procedures
Three search strategies were chosen for the control process to decide on: BB, an exhaustive procedure; TS, a metaheuristic algorithm based on local search; and GA, a metaheuristic algorithm based on a random approach. The three algorithms are initialised from the original route, which is the best solution found for the standard PDVRP benchmarks, with the new pickup location inserted before the delivery location in the route of a particular vehicle.

The particular vehicle route that the locations are inserted into is directed by the controller. Once the controller decides, through the information given in the inputs, which vehicle shows the most promise of being able to fulfil the pickup and delivery of the parcel, it passes this knowledge to each algorithm. Each algorithm is developed to exploit certain heuristics made available to it by the problem definition.

GA strategy: the GA uses a partially mapped crossover (PMX) (Zhu, 2000) to avoid duplicated stops on the route, and a single-swap mutation. The size of the mapped section decreases with the generations. It focuses on the vehicle recommended by the controller to direct the search space. This is a robust strategy that should normally produce a reasonable answer.

TS strategy: the TS used here is designed as a metaheuristic that exploits the confined search space to increase its success. As well as taking direction from the controller about which vehicle to confine the search to, the TS uses the time constraints to confine the search space even further. This limits the search space by removing sections where the time constraints would return an invalid fitness. It uses a nearest-neighbour swap to move through the search space, and the tabu method to escape local optima.

BB strategy: the BB inserts the new stops at the beginning of the search space, so as to give the best possible advantage to the algorithm.

The reason for choosing these particular algorithms is to give the system a variety of solution methods. At first, each method is given the same amount of computational power. As the system learns what each algorithm accomplishes in different situations, it divides the computational power accordingly, so that the best possible use of the time available is given to the appropriate algorithms.

6. Controller simulations
Solomon's 56 problems are the most popular benchmarks for the VRP and are based on real-life freight systems (Solomon, 1987). The problems are divided into six groups, C1, C2, R1, R2, RC1 and RC2, each set containing eight to ten different problems. Sets C1 and C2 represent clustered geographical data, whereas the geographical data in sets R1 and R2 are randomly generated according to a uniform distribution; RC1 and RC2 are a combination of both. The C1, R1 and RC1 problems have vehicles with small capacities and short route times, and require between nine and nineteen vehicles. Problems C2, R2 and RC2 represent long-haul delivery with longer schedules, which require fewer vehicles. An extension of Solomon's benchmarks was created by Li and Lim (2003) to illustrate the pickup and delivery version of the problem. This data set pairs Solomon's data into pickups and deliveries at random. Consequently these problems contain the same contrasting problem specifications as the original benchmarks, but with an extended constraint structure, making them even more complex and challenging to solve.

The controller is tested on an optimised pickup and delivery fleet where a new job is added. This entails two new stops being incorporated into the schedule, with their associated time constraints. The new parcel added to the system may require a completely new route to be derived for one or more vehicles in the schedule, but in other cases it may be easily integrated into the existing route, causing little or no disturbance to the original schedule.


The PDVRP benchmarks are used for each part of the simulation. Five problems from the data set are used. Problem lr204 is used as the previously optimised fleet, and lr207 represents the set of new parcels to be added. The other three problems, lr205, lr208 and lr211, are used for training the controller. The only change made to the benchmarks was the removal of the ten-unit service time from the inserted parcels and the training data sets; the original schedule retains its service time. The motivation for removing this service time is to give the already tightly optimised schedule a higher probability of accommodating a new parcel. To date the best solution for the original problem lr204 is 849.05 units, and occasionally this cost does not change when a new job is added; in that situation the new stops conform completely with the original route and constraints. On other occasions the resulting cost of the schedule after integrating the new job can increase to 5,000 units. This dramatic increase in cost is related to the 1,000-unit penalty for breaking a single time or capacity constraint. The system involves adding each of the new parcels separately to the optimised route and recalculating the fitness of the route each time; the fitness represents the cost to the fleet. The controller was then trained on a different set of problems, a combination of three data sets, so as to give an overview of the types of problem. Finally the system was run again, this time with the controller making decisions on which algorithms should be used and which vehicles look the most promising to accept the new stops without breaking the constraints. Comparing the initial and final simulations shows the improvement in the results when using the controller.

7. Experimental results
The simulation was run over the 52 parcels of problem lr207; they are run in an order starting with the parcels with the most flexible constraints and ending with parcels with very tight time windows. The results are illustrated through four diagrams. The first, Figure 3, shows the priorities given by the controller for the set of parcels. The other three, Figures 4-6, show a comparison between the results achieved with and without the controller for each of the three algorithms used. In Figure 3, the list of new jobs is plotted in order of complexity along the vertical axis, starting with the easiest jobs. The horizontal axis shows the priority value given by the controller for each algorithm.

Figure 3. The priorities given for the 52 new jobs


Figure 4. GA prioritised jobs: comparison of non-controlled versus controlled

Figure 5. TS prioritised jobs: comparison of non-controlled versus controlled

Priorities above one show an increase, and those below one a decrease. For the first 20 parcels the GA shows a considerable edge over the other two algorithms, receiving on average a priority of between 1.2 and 1.4. Continuing through jobs 20-30, the increased difficulty of inserting a job with tighter constraints gains the TS at times a priority of over 1.4. The TS utilises the shorter search space to improve the fitness and to focus on the most fruitful area in the neighbourhood search. Over the last ten or so jobs the priorities change again, giving BB superiority, although for these jobs all the algorithms were unable to find solutions without breaking at least one constraint. Figures 4-6 show the contrast between results with and without the controller's assistance. Figure 4 shows the cost of adding the parcels for which the GA was given a higher priority.


Figure 6. BB prioritised jobs: comparison of non-controlled versus controlled

Looking at Figure 3 shows that most of the first 30 jobs receive a higher priority for the GA. Of the 33 jobs with an increase in that priority, 23 show a result that is as good as or better. The line across the graph divides those results that improved with the controller (under the line) from those that did not (above the line). Focusing on results of less than 1,000 units shows that the majority of the GA's successes involved jobs that fitted conveniently into the original schedule. Another important point to observe is that the GA can be very unpredictable; at times there is a 4,000-unit difference between the two results for the same problem. With TS (Figure 5), the controller increases the priority on 21 of the jobs, with two-thirds of them showing as good or better results with the controller. There is also less of a dramatic difference between the two sets of results, with most results staying relatively close to the dividing line. The spread of results is a lot wider than for the GA, showing that more inconvenient jobs can be solved successfully. Finally, for BB (Figure 6) the results are generally poor, clustering around the 3,000-unit mark for most of the 13 prioritised jobs. However, the stability of this search strategy is obvious from the graph, as the results stay very close to the line at all times. From this comparison, an improvement in the overall results can be seen. Even though approximately 10 percent of the optimisation time is spent on the controller's calculations, the controller successfully increases performance.

8. Conclusions
Presented here is a novel approach to the problem of dealing with dynamic optimisation. Our experiments show that certain search strategies have a predisposition for solving particular types of problem. The adaptive control system uses this fact to learn how different algorithms perform on different types of problem, and utilises this information to establish a better solution to each individual problem. Results show that this technique can utilise domain knowledge to successfully select the more appropriate algorithms, even after a limited number of adaptations.
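As a purely illustrative reading of how the controller's priorities could translate into the division of computation time between the three strategies, the following Python sketch normalises the priorities into time budgets. The function name, variable names and example figures are assumptions, not taken from the paper.

```python
def divide_computation_time(total_seconds, priorities):
    """Split a fixed time budget between search strategies in proportion
    to the priorities assigned by the controller (1.0 = neutral; values
    above one increase the share, values below one decrease it)."""
    total_priority = sum(priorities.values())
    return {name: total_seconds * p / total_priority
            for name, p in priorities.items()}

# Hypothetical example: the controller favours the GA for an easy job.
budget = divide_computation_time(60.0, {"GA": 1.3, "TS": 0.9, "BB": 0.8})
# budget -> {'GA': 26.0, 'TS': 18.0, 'BB': 16.0}
```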

This approach, based on a self-organising controller, could be extended by using a wider range of algorithms. Additional algorithms such as simulated annealing or ant colony optimisation, as well as solutions for the constraint satisfaction problem (Miguel and Shen, 1999), would give a wider base for making decisions. Giving the controller the authority to scale the operators of the metaheuristics could extend the idea of tailor-made algorithms; this type of operator control may also work well with hybrid search algorithms. The system at present carries considerable overheads, processing the same problem multiple times. Structures to prevent much of this overhead would involve setting thresholds so that algorithms given a very low priority do not waste processing time. In addition, domain knowledge could be used to decide when the problem has changed dramatically enough to invoke the controller, thereby only running multiple algorithms when large changes to the search strategy are necessary. The inputs used in the controller presented here are specific to particular dynamic optimisation problems. Effective inputs could be devised for optimisation problems that change dramatically through extensive additions to the problem; these inputs could include the size of the problem, the type of data, or the size and quantity of constraints. With any of these extensions the final goal is a range of compact algorithms, with control, which will adapt easily to different problems, hence enhancing the optimisation of any individual problem.

The V-LAB system is an appropriate testing ground for this adaptive controller. Here a wide range of optimisation problems will be generated, from small problems with minor constraints to large sophisticated problems with an abundance of constraints (Smith et al., 2002, 2003). The controller's purpose is route optimisation within the fleet agent for the preparation of bids, in order to attempt to maximise truck and/or fleet utilisation. Each fleet has its own controller, which builds a specific model for its particular constraint and optimisation needs; this gives every fleet the best competitive edge possible. The controller decides which techniques are likely to be more successful and, as the data and constraints change in the dynamic system, it re-evaluates its decision, while also taking note of its successes and failures, which may help in future decisions. Since V-LAB runs in real time, the controller is an efficient mechanism for deciding on the trade-off between computation time and quality of result. This is an excellent example of how the controller can be put to its ultimate use: building an expert decision-making system for solution strategies from information gained from an expanding search space and previous experience.

References
Beck, J. and Fox, M. (2000), "Dynamic problem structure analysis as a basis for constraint-directed scheduling heuristics", Artificial Intelligence, Vol. 117 No. 1, pp. 31-81.
Bent, R. and Hentenryck, P.V. (2003), "A two-stage hybrid algorithm for pickup and delivery vehicle routing problems with time windows", Proceedings of the International Conference on Constraint Programming (CP-2003), pp. 123-37.
Cordon, O. and Herrera, F. (1997), "Identification of linguistic fuzzy models by means of genetic algorithms", in Driankov, H.H.D. (Ed.), Fuzzy Model Identification: Selected Approaches, Springer, Paris, pp. 215-50.
Dantzig, G. and Ramser, J. (1959), "The truck dispatching problem", Management Science, Vol. 10, pp. 80-91.


Dror, M., Laporte, G. and Trudeau, P. (1994), "Vehicle routing with split deliveries", Discrete Applied Mathematics, Vol. 50 No. 3, pp. 239-54.
Glover, F. and Greenberg, H. (1989), "New approaches for heuristic search: a bilateral linkage with artificial intelligence", European Journal of Operational Research, Vol. 39, pp. 119-30.
Goldberg, D.E. (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley, Reading, MA.
Hart, E., Ross, P. and Nelson, J. (1999), "Scheduling chicken catching – an investigation into the success of a genetic algorithm on a real world scheduling problem", Baltzer Journals, pp. 363-80.
Jelasity, M. (2000), Towards Automatic Domain Knowledge Extraction for Evolutionary Heuristics, Lecture Notes in Computer Science, Vol. 1917, Springer, Paris, pp. 755-64.
Li, H. and Lim, A. (2003), "A metaheuristic for the pickup and delivery problem with time windows", International Journal on Artificial Intelligence Tools, Vol. 12 No. 2, pp. 173-86.
Mamdani, E.H. and Assilian, S. (1975), "An experiment in linguistic synthesis with a fuzzy logic controller", International Journal of Man-Machine Studies, pp. 1-13.
Miguel, I. and Shen, Q. (1999), "Extending FCSP to support dynamically changing problems", IEEE International Fuzzy Systems Conference Proceedings, pp. 22-5.
Nunes, C., Mahfouf, M. and Linkens, D.A. (2002), "Neuro-fuzzy modelling in anaesthesia", Proceedings of the Fifth Portuguese Conference on Automatic Control, pp. 501-6.
Procyk, T.J. and Mamdani, E.H. (1979), "A linguistic self-organizing process controller", Automatica, pp. 15-30.
Schulze, J. and Fahle, T. (1999), "A parallel algorithm for the vehicle routing problem with time window constraints", Special Volume of Annals of Operations Research, Vol. 86, pp. 585-607.
Smith, F.S. (2001), "Towards creating an adaptive intelligent card player", Proceedings of the 1st UK Workshop on Computational Intelligence, UKCI'01, pp. 144-9.
Smith, F.S., Tighe, A. and Lyons, G. (2002), "Towards a multi-algorithm vehicle routing problem solver", Proceedings of the 2nd UK Workshop on Computational Intelligence, UKCI'02, pp. 177-84.
Smith, F.S., Tighe, A. and Lyons, G. (2003), "Self-organising fuzzy controller for optimisation in a dynamic environment", Proceedings of the 3rd UK Workshop on Computational Intelligence, UKCI'03, pp. 272-80.
Solomon, M.M. (1987), "Algorithms for the vehicle routing and scheduling problems with time window constraints", Operations Research, No. 35, pp. 254-65.
Takagi, T. and Sugeno, M. (1985), "Fuzzy identification of systems and its applications to modelling and control", IEEE Transactions on Systems, Man, and Cybernetics, pp. 116-32.
Tighe, A., Smith, F.S. and Lyons, G. (2003a), "An adaptive controller for real-time resolution of the vehicle routing problem", in Principles and Practice of Constraint Programming – CP2003, p. 998.
Tighe, A., Smith, F.S. and Lyons, G. (2003b), "Optimisation enhancement using self-organising fuzzy control", paper presented at IEEE Systems Man and Cybernetics Conference on Cybernetic Intelligence, Challenges & Advances, pp. 27-33.
Wolpert, D.H. and Macready, W.G. (1997), "No free lunch theorems for optimization", IEEE Transactions on Evolutionary Computation, Vol. 1 No. 1, pp. 67-82.

Wu, T., Low, C. and Bai, J. (2002), "Heuristic solutions to multi-depot location-routing problems", Computers and Operations Research, Vol. 29 No. 10, pp. 1393-415.
Zadeh, L.A. (1965), "Fuzzy sets", Information and Control, Vol. 8, pp. 338-53.
Zhu, K. (2000), "A new genetic algorithm for vehicle routing problem with time windows", paper presented at International Conference on Artificial Intelligence.
Zhu, K. and Ong, K. (2000), "A reactive method for real time dynamic vehicle routing problem", paper presented at 12th International Conference on Tools with Artificial Intelligence.
Zhu, K., Tan, K. and Lee, L. (2000), "Heuristics for vehicle routing problem with time windows", paper presented at The 6th Int'l Sym. on AI and Math.


Intelligent agents and distributed models for cooperative task support


R. Patel, R.J. Mitchell and K. Warwick
University of Reading, Reading, UK

Abstract
Purpose – To describe some research done, as part of an EPSRC funded project, to assist engineers working together on collaborative tasks.
Design/methodology/approach – Distributed finite state modelling and agent techniques are used successfully in a new hybrid self-organising decision making system applied to collaborative work support. For the particular application, analysis of the tasks involved has been performed and these tasks are modelled. The system then employs a novel generic agent model, where task and domain knowledge are isolated from the support system, which provides relevant information to the engineers.
Findings – The method is applied in the despatch of transmission commands within the control room of The National Grid Company Plc (NGC) – tasks are completed significantly faster when the system is utilised.
Research limitations/implications – The paper describes a generic approach and it would be interesting to investigate how well it works in other applications.
Practical implications – Although only one application has been studied, the methodology could equally be applied to a general class of cooperative work environments.
Originality/value – One key part of the work is the novel generic agent model that enables the task and domain knowledge, which are application specific, to be isolated from the support system, and hence allows the method to be applied in other domains.
Keywords Cybernetics, Intelligent agents, Control systems, Operational research
Paper type Research paper

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1456-1468 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614768

1. Introduction
This paper describes the design and implementation of an agent-based network for the support of collaborative switching tasks. This work includes aspects from several research disciplines, including operational analysis, human-computer interaction, finite state modelling techniques, intelligent agents and computer supported co-operative work. The work has been applied successfully in the control room environment of the National Grid Company plc (NGC), but the approach is general, and could be utilised in other co-operative work scenarios.

The authors would like to thank the Engineering and Physical Sciences Research Council and The National Grid Company plc for their support of the research project described here.

This work can be considered a part of "operations research" as it involves the application of scientific approaches for resolving problems of design and decision, in particular the study of decision support systems. Within this field, the authors have particular interests in the use of artificial intelligence, most notably the application of expert systems. Such systems attempt to capture the knowledge of a human expert, a process known as knowledge engineering or knowledge elicitation, and store this knowledge in a computable form, together with inference and interface modules, to provide additional information to both expert and non-expert human users (Basden and Hibberd, 1996; West et al., 2001; Chan et al., 2002), amongst others.

Early work in operations research focused on the analysis of working practices and the concept of alternative practices aimed at maximising operating efficiency and minimising running costs. However, this approach often needs organisational and operating modifications, which increase the costs associated with change in an established environment. Later work, particularly in the field of decision support systems, addressed these problems, although an emphasis on the support of individual users or processes has limited the scope of target environments. In the case of computer supported co-operative work, the problems associated with multi-human machine interaction have been addressed but largely limited to communication issues (Patel, 2002).

The research presented here addresses the problems of supporting collaborative work by applying conventional operations research techniques from within the operating environment via the actions of an agent network. Existing task processes are embedded in this network, so alleviating the costs typically entailed by change and allowing the benefits to be realised immediately. This work has demonstrated that a key aspect of the solution of both individual and group support problems within a collaborative environment is the method of implementing process models and support actions. The approach described here employs distributed finite state automata for both process modelling and the implementation of support actions. The geographically distributed components of the automata supply individual user support, whilst the combined actions of the complete automata supply the support actions on a global level. The collaborative task support is achieved by modelling the tasks as distributed components within a communicating network of non-specialised agents. The agents can be non-specialised because domain knowledge has been separated from the actual support network. The principal advantages of this approach are the relative simplicity of the agent model, and the lower costs associated with adapting the respective support networks to new or evolving task environments.

Distributed models are used as components of a task process so that a local form of task support can be provided. The task models are supplemented with planned support actions, so that the actions of connected users can be both directed towards task completion and assisted by automating the control of information presented at key stages of the task process. The distribution of collaborative tasks within a communicating network allows the co-ordination of the involved users to be automated by blocking or enabling user actions during relevant phases of task completion. In addition to this direct task support, an additional communication medium attached to the active task models provides secondary support in the form of automated communication management. This approach also allows the support of multiple task processes by representing each task and action as a set of connected automata models. The use of automata also makes it possible for a network of relatively simple and non-specific agents to be used for the framework of the support system.
The resulting framework can be applied to a general class of co-operative work environments with only trivial changes required to accommodate existing user interfaces and associated interaction methods. This approach also enables existing work processes to be maintained, thus avoiding re-engineering costs typically incurred in operations research work.


This paper is structured with sections introducing the local task and agent models developed during this research; a summary of the operation of the resultant collaborative task support system within the transmission dispatch environment; and finally a discussion of the approach taken here and its use of cybernetic intelligence.

2. The local task model
Mathematical modelling techniques are used extensively in operations research, where a model is defined as a representation of an object or process which illustrates the relationships between actions and reactions in terms of cause and effect. One fundamental reason for the use of models here is the ability to apply quantitative techniques, such as statistical analysis or simulation, to investigate the relationships that exist between the variables within a task process. This is necessary for the identification of key aspects of the processes and for the design of solutions to problems within the work environment (Ackoff, 1979). In the work presented here, the process models developed have been used not only for the analysis of the application environment and the subsequent design of support actions, but also as the foundations of the support system.

Task models have also been used in the field of human-computer interaction. Early models were simplistic representations of user actions during the task process, developed to design the appearance of interfaces and the layout of buttons and menus (Johnson and Johnson, 1991). As computer interfaces became more complex, so the models used were extended to include notions such as user knowledge, beliefs and intentions, and have been accompanied by developments in the mathematical techniques used to represent tasks, relationships and knowledge structures. Task modelling in the design process is accepted as being useful. However, the complexity of modelling techniques and their relative effects on overall task operations is largely overlooked. The task models developed during this research rely on more qualitative descriptions of the task, requiring less exhaustive analysis of the processes, whilst ensuring adequate representation for task support. They are also embedded directly into the support network and not just used during the design phase. In fact, the task models are used as distributed components for task support, schedule management and communication management. They are termed local task models, encapsulating task knowledge, support strategy and communication control as a single entity.

In detail, for the National Grid application, "Task Analysis" was performed (Patel, 1997) and a workable description of the task process produced that can be used to co-ordinate the application of specific support actions at each user location. Subsequent analyses of the task were carried out to decompose the basic sub-tasks into the sufficiently detailed components required for meaningful task support to be offered to individual users. At each point where task support is to be delivered, a set of requirements is included to ensure the delivery of support is coordinated with the processing of the task. These requirements take the form of instances of the environment model embedded at strategic points of the task process. In order to fully populate the complete task model, the description of the process is complemented with models of user involvement and communications.
The model of user involvement is a description of the set of users involved during task completion and their activity levels throughout the task process. The communications model is a description of the required

communications between involved users during task execution and is used in conjunction with the user involvement model to provide both communication and synchronisation support. This was useful for the scenario described in Section 4, but National Grid also found the task analysis document helpful for describing operations in their Control Room and for training.

The co-operative task processes are represented as series of automata encapsulating task models and support actions, so that a network of connected controllers can interface external stimuli to the automata and deliver support actions to the task environment. This state-based approach to system modelling allows the actions of the user and other active objects in the actual environment to be used to advance the states of the representative models. Individual support actions can be provided by enhancing the models of the tasks with designed support actions; these actions can be effected upon the actual environments when key states of the task models are activated and de-activated.

A key aspect of the work is to allow task support in a co-operative environment where users are engaged in the execution of multiple tasks. This requires some form of control over the temporal requirements of the division and execution of tasks between affected users, and a token-based solution is employed to manage the scheduling of distributed task processes. By modelling complete tasks as distributed components at each user location, the proper coupling of concurrent sub-tasks can be ensured by adopting the Petri net method of token passing at transition states (Patel et al., 2001a). This has been successfully demonstrated by its application within the transmission dispatch environment (Patel et al., 2001b). This approach ensures the correct scheduling of tasks is achieved regardless of user-related delays, and multiple tasks have been scheduled without increasing the complexity of the distributed controllers.

Figure 1 shows the connections between concurrent task processes in a three-task environment. The actions of each user are dictated by the scheduling of process models at each user location. The magnified area of this figure shows the arrangement and token connection of task automata. Here the tokens passed between individual automata models manage the scheduling of individual task processes, and the connections to these task models synchronise automata processing and trigger the delivery of task support actions. Within the individual task model segments, the automata are shown as sequential collections of task states (S_an, S_bn and S_cn). Although a user may have multiple sub-task models to perform, the agent can only act on models where a token is present. Included within the sub-task models are commands for passing tokens to concurrent sub-tasks. When passed, these tokens activate the model or models representing the next phase of the task process. By enabling the connected users to select only activated models, the agents ensure the correct co-ordination of the distributed processes. Key to the local task models are the automata representing strategic points of the represented task processes, associated with which are models of the task environment used to determine when the actual task process has reached these strategic points.
Individual support actions that form the overall task support plan are embedded at these strategic points, such that the connected agents effect changes to the user environment so that the next strategic point in the task process can be attained. Support actions at points of interest and individual communication commands are embedded within the local task model.


Figure 1. Task model structure for a three task environment

So that the processing of these local task models is consistent with the co-ordination of task processes required for correct operation of the task environment, the initial conditions and final edge events are attached such that the connections between the individual models representing a single task reflect the ordering of the individual local task models within the complete task model. This use of ordinal scales results in the enforced ordering and synchronisation of individual user actions within a multi-user environment. Here the decision-making strategy is defined at the analysis and modelling phase of the support process and embedded within the distributed task models.

This local task automata approach means that the tasks are modelled in a straightforward manner, embedded support actions are delivered consistently with the actual task environment and, as all task specific information is included in the automata, non-specialised computing agents can be used to form the distributed support network.
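A minimal Python sketch of such a local task automaton is given below; the class, method names and the two-user example are illustrative assumptions rather than the implementation used in this work. Each automaton advances through its states in response to environment events, fires the support action embedded at the state it reaches, and can only be processed while it holds a token, which is passed on (Petri net style) when its segment of the task completes.

```python
class LocalTaskAutomaton:
    """One user's segment of a distributed task model."""

    def __init__(self, states, support_actions, successor=None):
        self.states = states                    # ordered list of {"transition": event}
        self.support_actions = support_actions  # state index -> callable
        self.successor = successor              # automaton for the next sub-task
        self.current = 0
        self.has_token = False                  # only token holders may advance

    def receive_token(self):
        self.has_token = True

    def on_event(self, event):
        """Advance when the environment event matches the transition condition
        of the current state, then deliver any embedded support action; pass
        the token on once the final state is reached."""
        if not self.has_token:
            return
        if event == self.states[self.current].get("transition"):
            self.current += 1
            action = self.support_actions.get(self.current)
            if action:
                action()
            if self.current == len(self.states) - 1 and self.successor:
                self.has_token = False
                self.successor.receive_token()


# Hypothetical two-user example: user B's segment only becomes active
# once user A's segment has completed and passed the token on.
b = LocalTaskAutomaton([{"transition": "confirmation"}, {}],
                       {1: lambda: print("B: issue switching instruction")})
a = LocalTaskAutomaton([{"transition": "circuit isolated"}, {}],
                       {1: lambda: print("A: record isolation")}, successor=b)
a.receive_token()
a.on_event("circuit isolated")   # A advances, acts, and passes the token to B
b.on_event("confirmation")       # B may now advance
```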


3. The agent model
An agent is an autonomous entity with reactive and proactive behaviour, often able to learn, co-operate and be mobile. In effect these properties were set by early artificial intelligence researchers attempting to mimic human abilities (Hewitt, 1997). Agents have existed for some time, but the study of agents is now one of the fastest growing areas of information technology research. The research described here concentrates on the use of agents in the field of human-computer interaction. Interface agents, or intelligent interface agents (Maes, 1994), have been applied to many interaction issues, all of which can be classified as either pedagogical or facilitating applications. A pedagogical agent provides knowledge in the task domain, either as direct task knowledge or as enabling knowledge in the form of task instructions or guidelines. Other forms of interface agent are concerned with providing a service to the human user, and exist on two levels. At the lowest level, task objectives are achieved by direct interaction between user and agent; at a higher level, the interaction is indirect, being achieved by manipulating the task environment. This technique of using interface agents to provide a complementary interaction technique has been termed indirect management (Kay, 1990).

The implications of the individual connections existing within the human-computer interaction environment developed during this research can be seen in Figure 2. In addition to the expected interactions between the user, interface and task, the inclusion of an agent in the user environment enables further interactions to take place as a result of user requests, changes to the task environment, or observed user actions. Here, direct interaction results from interfaces between the agent and user, and between the agent and task. Task interaction comes from the interfaces existing between the user and the task, and indirect management results from the interactions between the agent and the interface. Normally, agent-interface interactions focus on the control of adaptive interfaces and the generation of active help systems. This research extends the scope of indirect management by enabling individual agents to manipulate user perspectives to achieve task objectives.


Figure 2. Agent/user interface environment


Figure 3. Distributed support network structure for a three task environment

Interface agents have been applied to collaborative environments, utilising both agent and computer supported co-operative work techniques. As an example, in process control (Cockburn and Jennings, 1996), complex agents were used to represent the various users or departments involved in task completion, or specific aspects of the task. The research presented here, in contrast, uses a community of relatively simple generic agents capable of processing task models that reflect the complexity of the application environment. The resultant system remains flexible to changes in the task environment and, as the agent and task domains are separate, the system is generic and applicable to all collaborative work environments.

The system is a distributed support network with a community of co-operating intelligent agents which deliver support and schedule management to collaborative task environments by processing the local task automata models. One agent is located at each user location, and its actions are directed by embedded task automata representing user goals, processes and support strategies. The implementation of support actions by individual agents results in the management and support of connected user activities, by controlling the content and responsiveness of connected interfaces, and in the management of task processes, by influencing the actions of connected agents. Given this generic agent model, each task present in the user environment is represented by a task environment model at each user location and a series of local task automata distributed between the involved users. The automata initial conditions, transition conditions and embedded support actions, when acted on by the connected agents, supply the link between the models, the support network and the human users. The organisation of agents and embedded models in a three-user, three-task environment is shown in Figure 3. Each user works via a single interface controlled by an agent, and all agents interface with each other.

It is important when constructing this support network to employ fundamental agent design methods to realise the actual support entities. This creates a methodology that enables the inexpensive development of agent-based software, which should exhibit the three key properties of flexibility, scalability and maintainability.

A widely accepted agent design process, the Gaia methodology proposed by Wooldridge and Jennings, is indicative of the issues covered by conventional agent design methods; an updated review of this method can be found in Wooldridge et al. (2000). The Gaia design process comprises a two-step approach: finding the roles in the system, and modelling the interactions between these roles. In this case, a role is an entity displaying the four attributes of responsibilities, permissions, activities and protocols. Although this analysis approach was not applied to the actual agent community during the analysis and modelling phases of this research, the nature of the activities and protocols found has influenced the design of the agent model. This is because separating the analysis, knowledge capture and interaction modelling from the design of the agent removes the need to repeat the entire development process when tailoring the agent support network. The requirements analysis undertaken during the agent design process was based on the earlier analysis of collaborative environments and resulted in a set of generic agent methods for the manipulation of internal models of the tasks and domains, and of external representations to the user and other agents. The Gaia approach also proposes the use of the identified roles to determine the make-up of the agent community. This mapping of roles into agent types has not been used in the design process either, because the automata models include specifications of the role of the user, allowing a single agent type to be used to represent all possible roles existing in the task environment.

Another benefit of embedding domain knowledge in the task automata model is that it enables a natural approach to agent intelligence to be realised, based on the commonly used beliefs, desires and intentions (BDI) model (Bratman, 1987), so that the agent model described here is governed by models of the task environment, task processes and support plans. Here "Beliefs" represents the knowledge of the world held by the agent; this knowledge is stored as a collection of environment models, each linked to a specific task under support. These models have been implemented both as dynamic representations of task-particular equipment and as snapshots of this equipment for use as automata transition criteria. Here the exhaustiveness of the beliefs model set is limited to the physically realisable configurations of the equipment involved in the individual user processes. The "Desires" of an agent, often referred to as goals, represent some desired end state and in this case are described within the task automata as transition criteria. The "Intentions" of an agent refer to the committed plans or procedures for satisfying the desires of the agent, and in the agent model described here are represented by the planned support actions embedded within the local task automata model.

The process models and their associated support strategies are implemented using the collaborative interface agent model. This model undertakes the three activities required for successful application of the distributed support network: processing the local task automata, observing the task environment, and delivering support actions.
So that these activities are synchronised with events in the actual task environment, the agent model designed here consists of several control and decoding modules responsible for interfacing, controlling and processing the task models. The network of agent modules, the interconnections between these modules and the connections to the task models, environment models, actual task environment and the agent network are shown in Figure 4.
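The BDI reading of the agent model can be sketched as follows in Python. This is an illustrative, self-contained toy only: the class, method names and the switching-related example are assumptions, not code or identifiers from the project.

```python
class BDIInterfaceAgent:
    """Generic interface agent organised along BDI lines: beliefs are
    environment models, desires are the transition criteria held in the
    local task automata, and intentions are the embedded support actions."""

    def __init__(self):
        self.beliefs = {}      # task_id -> latest environment snapshot
        self.desires = {}      # task_id -> predicate over a snapshot
        self.intentions = {}   # task_id -> support action (callable)

    def add_task(self, task_id, transition_criterion, support_action):
        self.desires[task_id] = transition_criterion
        self.intentions[task_id] = support_action

    def observe(self, task_id, snapshot):
        """Update the belief (environment model) for one task."""
        self.beliefs[task_id] = snapshot

    def step(self):
        """When a belief satisfies a desire, commit to the corresponding
        intention by delivering its support action."""
        for task_id, criterion in self.desires.items():
            snapshot = self.beliefs.get(task_id)
            if snapshot is not None and criterion(snapshot):
                self.intentions[task_id]()


# Hypothetical usage: deliver a support action once a circuit is isolated.
agent = BDIInterfaceAgent()
agent.add_task("outage-7",
               lambda env: env.get("circuit_isolated", False),
               lambda: print("display earthing instructions"))
agent.observe("outage-7", {"circuit_isolated": True})
agent.step()   # -> prints the support message
```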



Figure 4. Generic interface agent model

Here the processing of events in the task environment, user actions or agent network events is shown as a series of actions, originating from the task environment, graphical user interface or agent network and undertaken by the agent modules. The delivery of support actions by the proposed agent model is also represented as connections originating at the interface and communication controllers and terminating at the graphical user interface and agent network, respectively.

4. Example operation of the support system
The proposed task support methodology was tested using a transmission despatch scenario representative of an above-average workload for control room engineers, which could occur when there was a rearrangement of the grid and associated live-line working. This scenario comprised ten circuit outage tasks being carried out by a team of three transmission dispatch engineers (TDEs), with a transmission management engineer (TME) working in a supervisory manner. Five circuit outages require the attention of a primary and a secondary switching engineer. The remaining five outage tasks are undertaken by single transmission despatch engineers under the supervision of the TME. The distribution of these ten switching tasks between the three despatch engineers can be seen in Figure 5: here the tasks requiring cooperation between multiple switching engineers are those covering multiple switching engineer zones.

The effectiveness of the local task support actions delivered by the support network was investigated by comparing the completion times of the example tasks for both conventional and supported experiment pairs. A group of test subjects performed each test, and the experiment pairs were completed twice by each group.

Figure 5. Division of tasks in example switching scenario

The first experiment pair followed a pre-planned task schedule, and a strict ordering of tasks was enforced for each involved user. During the second pair of tests, the task scheduling restrictions were lifted (apart from those needed for safety) to enable users to dictate the scheduling of individual task processes on an ad-hoc basis. These tests were carried out with ten user groups and the results averaged to measure the effects on individual and group user performance. Owing to practical difficulties in obtaining access to the required number of expert users, these tests were carried out with untrained users who underwent a familiarisation process for the transmission despatch scenario and interfaces. The ordering of supported and unsupported test runs was also varied to ensure any measured differences due to user learning were removed from the overall performance measures.

Taking the initial switching requests and completion acknowledgements for individual tasks, average task completion times have been measured. These are shown in Figure 6, where the "ad-hoc" results are shown in the top chart and the "pre-planned" results in the bottom chart. The graphs show the time taken for the completion of each of the ten tasks: for each task, the left of the two bars shows the time when the support network is used, and the right bar the time without. This is useful, but the performance of the entire user group throughout the scenarios must be examined in order to realise the full impact of the support actions. The average performances of the user group as a single entity during each of the four evaluation cycles are presented in the two graphs in Figure 7; the top graph shows the pre-planned tests and the bottom the ad-hoc tests. In each graph the dotted line depicts the time taken without the support network, and the solid line the time where the support network is used.


Figure 6. Individual task performance in ad-hoc and pre-planned scenarios


Figure 7. Group performance in ad-hoc and pre-planned scenarios

As can be clearly seen, in both cases the support network has reduced the time taken for the switching tasks to be achieved by a significant amount.

5. Discussion
The main objective of this research was to investigate local modelling and agent techniques to undertake the cybernetic tasks of control and communication within a co-operative work environment. To address this objective, a model encompassing task processes, user involvement knowledge, communications protocols and task support actions is proposed for local task automation. A co-operative community of interface agents is used, as a distributed support network, to process these automata and deliver the embedded support actions.

The local task automata have single entities modelling task and support processes, distributed at each user location. This is a straightforward logical method. The individual task automata support each user, and enable the co-operative task to be scheduled automatically. The embedding of task, user and support knowledge in the automata allows for a simple support network. The distributed support network approach ensures local task support actions are delivered by individual interface agents, with the networking of these interface agents being exploited for both global task support actions and communication support. The network comprises simple non-specialised computing agents, as all user- and task-specific knowledge is embedded in the local task automata. By dedicating the primary goal of the support agents to the processing of any active task automata, the actions of individual agents are tailored to reflect the requirements of the specific users and tasks represented by each automaton.

When applied to the transmission despatch environment, the distributed support network significantly reduced the completion times of both individual task processes and the entire task scenario, and the communication support it provides ensures the correct delivery and decoding of all task-related user communications and the automation of the standard user communications identified during the task analysis process. The overall effect of the communication support actions is to minimise the possibility of communication errors and to further reduce task completion times.

6. Conclusions
Various methods have been successfully applied to the collaborative task support problem. Of particular relevance to the cybernetic intelligence domain, this work has resulted in the development of an enhanced automata structure for use as distributed task models and a flexible interface agent community for processing these automata. The resultant support system has demonstrated the successful application of analysis, modelling and intelligent interfacing techniques to deliver a realisable task support solution for complex cooperative environments.

References
Ackoff, R.L. (1979), "The future of operational research is the past", Journal of the Operational Research Society, Vol. 30 No. 7, pp. 93-104.
Basden, A. and Hibberd, P.R. (1996), "User interface issues raised by knowledge refinement", International Journal of Human-Computer Studies, Vol. 45 No. 2, pp. 135-55.
Bratman, M.E. (1987), Intentions, Plans, and Practical Reason, Harvard University Press, Cambridge, MA.
Chan, C.W., Peng, Y. and Chen, L.L. (2002), "Knowledge acquisition and ontology modelling for construction of a control and monitoring expert system", International Journal of Systems Science, Vol. 33 No. 6, pp. 485-503.
Cockburn, D. and Jennings, N.R. (1996), "ARCHON: a distributed artificial intelligence system for industrial applications", Foundations of Distributed Artificial Intelligence, Wiley, New York, NY, pp. 319-44.
Hewitt, C. (1997), "Viewing control structures as patterns of passing messages", Artificial Intelligence, Vol. 8 No. 3, pp. 323-64.
Johnson, H. and Johnson, P. (1991), "Task knowledge structures: psychological basis and integration into system design", Acta Psychologica, Vol. 78 No. 1, pp. 3-26.
Kay, A. (1990), "User interface: a personal view", The Art of Computer Interface Design, Addison-Wesley, Reading, MA.
Maes, P. (1994), "Agents that reduce work and information overload", Communications of the ACM, Vol. 37 No. 7, pp. 31-40.
Patel, R. (1997), "Control and monitoring information display using intelligent systems: task analysis results", Technical Project Report, Department of Cybernetics, University of Reading.
Patel, R. (2002), "An investigation of distributed modelling and intelligent agent techniques for collaborative task support", PhD thesis, Department of Cybernetics, University of Reading.
Patel, R., Mitchell, R.J. and Warwick, K. (2001a), "Applying distributed finite state automata to an agent network for cooperative task support", Proceedings of Intelligent Agents, Web Technologies and Internet Commerce, Las Vegas, NV.



Patel, R., Mitchell, R.J. and Warwick, K. (2001b), "An agent based approach to distributed task modelling for cooperative task support", Proceedings of People in Control, Manchester, UK.
West, G.M., Strachan, S.M., Moyes, A., McDonald, J.R., Gwyn, B. and Farrell, J. (2001), "Knowledge management and decision support for electrical power utilities", Knowledge and Process Management, Vol. 8 No. 4, pp. 207-16.
Wooldridge, M.J., Jennings, N.R. and Kinny, D. (2000), "The Gaia methodology for agent-oriented analysis and design", Autonomous Agents and Multi-Agent Systems, Vol. 3 No. 3, pp. 285-312.


Landscape classification and problem specific reasoning for genetic algorithms

F. Mac Giolla Bhríde, T.M. McGinnity and L.J. McDaid


School of Computing and Intelligent Systems, University of Ulster, Magee Campus, Derry, UK

Abstract
Purpose – This paper addresses issues dealing with genetic algorithm (GA) convergence and the implications of the No Free Lunch Theorem, which states that no single algorithm outperforms all others for all possible problem landscapes. In view of this, the authors propose that it is necessary for a GA to have the ability to classify the problem landscape before effective parameter adaptation may occur.
Design/methodology/approach – A new hybrid intelligent system for landscape classification is proposed. This system facilitates intelligent operator selection and parameter tuning during run time in order to achieve maximum convergence. This work introduces two adaptive crossover techniques, the runtime adaptation of crossover probability and the participation level of multiple crossover operators, in order to refine the quality of the search and to regulate the trade-off between local and global search, respectively. In addition, a rule-based reasoning system (RS) is presented which can be utilised to analyse the problem landscape and provide a supervisory element to a GA. This RS is capable of instigating change by utilising the analysis in order to counteract premature convergence, for various classes of problems.
Findings – Results are presented which show that the application of this rule-based system and the adaptive crossover techniques proposed in this paper significantly improve performance for a suite of relatively complex test problems.
Originality/value – This work demonstrates the effectiveness of landscape classification and consequent rule-based reasoning for GAs, particularly for problems with a difficult path to the optimal. Moreover, both adaptive crossover techniques proposed offer improved performance over the traditional static parameter GA.
Keywords Cybernetics, Programming and algorithm theory, Artificial intelligence
Paper type Research paper

1. Introduction
Genetic algorithms (GA) are a major force in evolutionary computing. They are a search technique inspired by Darwin's theory of evolution (Darwin, 1859) and were formally introduced by John Holland (1975) in the 1970s. The classic GA, also known as the sequential genetic algorithm (SGA), is the simplest form of GA. Once initialised, it executes in a sequential manner, working with only one population set. In each generation the population evolves by probabilistically selecting individuals from the population (chromosome list) according to their fitness for reproduction (Goldberg, 1989). Using crossover and/or mutation, these structures are then mated and modified in order to produce a new population of (potentially) superior fitness. To ensure survival of the fittest, preference is given to the better individuals, usually determined by a selection function or critical evaluation function.
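The generational loop just described can be illustrated with a minimal Python sketch of a sequential GA (not code from the paper; the fitness, crossover and mutation functions are placeholders supplied by the caller, fitness values are assumed to be positive, and the population size is assumed to be even).

```python
import random

def sga(fitness, init_pop, crossover, mutate, pc=0.6, pm=0.001, generations=100):
    """Minimal sequential GA: fitness-proportionate selection, then
    crossover with probability pc and mutation with probability pm."""
    population = init_pop()
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        # Probabilistic (roulette-wheel) selection according to fitness.
        parents = random.choices(population, weights=scores, k=len(population))
        next_pop = []
        for a, b in zip(parents[0::2], parents[1::2]):
            if random.random() < pc:
                a, b = crossover(a, b)
            next_pop.extend([mutate(c, pm) for c in (a, b)])
        population = next_pop
    return max(population, key=fitness)
```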

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1469-1495 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614777


A problem with this form of GA is that its parameters and settings, such as crossover and mutation probability, are generally set before execution. In many circumstances "ideal settings vary depending on population size, network topology and the nature of the domain" (Stracuzzi and David, 1998). The successful convergence of a GA is very much dependent on the performance of the genetic operators and their respective parameters. In most cases, all parameters must be set before runtime; this includes, but is not limited to, the various genetic operators and their relative probabilities, the population size and the fitness scaling function. Because of the number of interdependent settings it is extremely difficult to find the optimal combination. For each individual problem it is necessary to find the correct combination of these settings if one hopes to find the optimal solution. This process is known as parameter tuning.

Through the years, extensive work has been carried out in an attempt to find optimal parameter settings. DeJong (1975) completed the first comprehensive work in this field by developing a suite of GA test functions and proposing suitable parameters which would work well for a variety of problems (population size (n): 50, crossover probability (pc): 0.6, mutation probability (pm): 0.001). This work was furthered by Grefenstette (1986), who used a "meta-GA" in order to learn suitable online parameter values (ps: 30, pc: 0.95, pm: 0.01) and offline parameter values (n: 80, pc: 0.45, pm: 0.01), and by Schaffer et al. (1989), who used exhaustive testing to find optimal parameters (n: 20-30, pc: 0.75-0.95, pm: 0.005-0.01). A GA's online performance is the average of all evaluations over the entire run, whereas the offline performance is the best value in the current population, averaged over the entire run of generations. A review of the optimal mutation rate parameters proposed by the above can be found in Smith and Fogarty (1996).

It has since been suggested by DeJong (1985) that there are in fact no optimal parameter settings which hold for the entire run of generations and that the GA should have the ability to fine-tune the parameter settings while the population evolves. Research by Eiben et al. (2000), Hinterding (1997), Smith and Fogarty (1996), Deb and Beyer (1999), Herrera and Lozano (1996) and Spears (1995, 1997) has confirmed this. Eiben et al. (2000) list the technical drawbacks of tuning parameters manually, based on experimentation. Consequently, current research is focused on developing GAs which are capable of fine-tuning their parameters as the population evolves. This form of GA is known as the adaptive genetic algorithm (AGA).

Dynamic (or adaptive) GAs are a growing area of research in which there are many and varied attempts at developing an algorithm which has the ability to vary, adapt and evolve its parameters and settings in order to accommodate the various requirements of the GA. These methods include, but are not limited to, self-adaptive GAs (Deb and Beyer, 1999; Schaffer and Morishima, 1987), adaptation using fuzzy logic controllers (Herrera and Lozano, 1996) and deterministic rules (Davis, 1989, 1991). Although these adaptive techniques generally result in a marked improvement over the static parameter GA, the fact remains that an AGA which is specifically tailored to a certain problem will more often than not outperform the best non-problem-specific AGA.
This theory was formalised by Wolpert and Macready (1995, 1997), who explored the relationship between effective search and optimisation techniques and the various problems to which they are applied, and consequently presented the No Free Lunch Theorem (NFLT). The NFLT states that there is in fact no one algorithm which

outperforms all others for all possible problem landscapes. The theorem suggests that any change to an algorithm which results in improved performance for one class of problem is "exactly paid for in performance over another class". In view of this argument, the authors propose that it is necessary for a GA to have the ability to classify the problem landscape before effective parameter adaptation may occur.

This paper addresses issues dealing with GA convergence and the implications of the NFLT on two levels: first, by proposing two adaptive crossover techniques which help improve convergence. The first technique (adaptive crossover probability – ACP) is invoked in order to refine the quality of the search as the GA approaches the optimal, and the other is designed to regulate the trade-off between local and global search using multiple crossover operators (adaptive participation level – APL). Second, a rule-based landscape classification and invocation mechanism is presented which dynamically adjusts parameters and settings in response to ongoing landscape classification. This adaptive control not only determines the application of ACP and APL but also adjusts GA settings and structures, including the replacement strategy and mutation probability. It will be shown that the hybrid AGA is capable of significant improvement over the SGA.

The paper begins in Section 2 by giving an overview of AGAs, focusing on the formal classification of the various methodologies and highlighting some of the research in these respective areas. Section 3 presents a hybrid rule-based system, which incorporates the adaptive techniques proposed in Section 4. These new adaptive techniques focus on the runtime adaptation of the crossover operator in order to assist GA convergence. Section 5 gives an overview of seven benchmark problems against which the hybrid AGA system is tested, followed by results and analysis in Section 6.

2. Adaptive genetic algorithms
There are several forms of adaptive genetic algorithm (AGA) which can be used to dynamically fine-tune algorithm parameters to the problem while it is being solved (Angeline, 1995; Tuson and Ross, 1998). This adaptation is essential in large complex problems where it would be impractical to restart the GA each time one wishes to make an alteration, and also for problems where the path to the optimal is obstructed or difficult. Hinterding et al. (1997) provide a classification of the various forms of adaptive evolutionary algorithms (EAs), i.e. dynamic deterministic, dynamic adaptive and dynamic self-adaptive.

2.1 Deterministic adaptation
Deterministic adaptation is the simplest form of dynamic adaptation, where the adaptation is governed by predefined deterministic rules such as time variation or lapsed generations (Davis, 1991). Fogarty (1989) deterministically adapts the mutation probability (pm) for an industrial optimisation problem by exponentially decreasing pm over the generations and over the bit representation of each integer in an individual. By doing so he found a significant improvement when the GA was positively seeded. However, the improvement was not as substantial when the population was randomly seeded.

2.2 Adaptive adaptation
Adaptive adaptation is arguably the most diverse of the aforementioned techniques. Widespread research has been carried out to develop new, robust methods and algorithms for controlling and updating parameters based on feedback from the GA.


and algorithms for controlling and updating parameters based on feedback from the GA. Methods of adaptive adaptation vary considerably. Eiben et al. (1998) use a parallel GA with competing subpopulations in order to detect the best crossover operators. Herrera and Lozano (1996) employ fuzzy logic controllers to adapt parameter settings and genetic operators. Most common are updates to pc and pm with reference to population diversity and/or the performance history of operators and individuals. Davis (1989) updates operator probabilities based upon the past performance of each operator: when an operator is used to produce offspring of improved fitness it is awarded credit equal to the improvement in fitness, and diminishing proportions of that credit are awarded to the offspring's parents or grandparents. Srinivas and Patnaik (1994) present an adaptive, generational GA for multimodal function optimisation. The crossover probability (pc) and the mutation probability (pm) are varied based on the fitness values of individual chromosomes with respect to the population, with the aims of maintaining population diversity and avoiding premature convergence. If the GA begins to converge on a local optimum, pc and pm are increased; alternatively, if the population is scattered around the search space, pc and pm are decreased. pc and pm are chromosome specific, with high pc and pm values for low fitness solutions and low pc and pm values for high fitness solutions. Results presented show that the AGA performs at least as well as, and in some cases significantly better than, the SGA on a wide range of multi-modal test problems. Wu et al. (1998) and Zhou et al. (2002) claim to have further developed Srinivas and Patnaik's work to produce better quality results for optimal reactive power dispatch and voltage control of power systems and for third-order induction motor parameter estimation. This was achieved by giving the algorithm more information on population diversity relative to specific chromosomes, in the form of the normalized fitness distance between the current solution and other solutions in the population. Martin et al. (2003) employ roulette wheel selection (RWS) as a method of operator selection where operator probabilities are varied based on population diversity. The method relies on a measure of operator redundancy, ri = 1 − (xi/yi), in order to calculate individual operator probabilities (i.e. segments of the roulette wheel), where yi is the total number of chromosomes to which operator i has been applied and xi is the number of first-instance new chromosomes. Three criteria are used to invoke adaptation: the slope of the linear regression of the maximal fitness, the correlation coefficient of the linear regression, and a linear diversity coefficient c = 1 − (fitavg/fitmin), where fitavg is the average population fitness and fitmin is the lowest fitness value in the population. These measures help define the state of the population, i.e. constant, unstable, and a measure of diversity. Julstrom's (1995) adaptive probabilities are computed based on the most recent contribution of each operator to the GA's progress. Probabilities are proportional to each operator's relative contributions and are updated after the generation of each new chromosome.

2.3 Self-adaptive adaptation
Some of the earliest work carried out on the adaptation of GA parameters falls into this category, where the GA parameters to be adapted are encoded into existing

chromosomes and allowed to evolve along with the population. Schaffer and Morishima (1987) presented one of the first self-adaptive crossover mechanisms which distributes an individual’s crossing points for recombination by appending a bit-string of crossover punctuations to existing chromosomes. This method was then furthered by Levenick (1995) who used additional bits in order to evolve crossover probability. Alternatively, Spears (1995a, b) proposed that a bit-tag be appended to each chromosome to indicate which crossover operator should be used, allowing the simultaneous application of two crossover operators to the same population. One of the most common applications of self-adaptation is that of encoding mutation probability and step-size into each chromosome (Ba¨ck, 1992a, b, 1996; Yao et al., 1999). Mutation step size applies specifically to evolution strategies (ES) (Rechenberg, 1973). 2.4 Current issues As illustrated above there have been many and varied attempts at finding a correlation between the various GA parameters and the state of the algorithm as it evolves. Although these adaptive techniques generally result in a marked improvement in performance over the SGA, the NFLT states that any algorithm, which tailors itself to a certain class of problem will more often than not outperform the best non-problem specific algorithm. According to the NFLT (Wolpert and Macready, 1995, 1997), there is in fact no one algorithm, which outperforms all others for all possible problem landscapes. It is proposed by the authors that a GA should be accompanied by a supervisory rule-based landscape classification and initialisation mechanism which has the ability to modify and adapt parameters according to a form of landscape classification and hence avoiding premature convergence thereby increasing the probability of finding the optimal. Also, two new adaptive crossover techniques have been developed in order to increase convergence velocity (adaptive crossover probability, Section 4.2.1) and regulate between local and global search by employing multiple crossover operators (active participation level, Section 4.2.2). 3. Implementation strategy Eiben et al. (2000) argue that the additional computational cost incurred due to online parameter tuning may not be justified by the improved performance (if any). In response to this argument, the architecture presented in this paper will consist of two autonomous threads running concurrently, the GA thread and the reasoning thread. The reasoning thread will be dedicated to making critical decisions and reasoning during execution, allowing the GA to evolve in normal runtime while the reasoning system (RS) analyses GA performance and updates the relative parameters accordingly. It is the reasoning module which coordinates and delegates all runtime adaptation. The reasoning module interacts with the GA by subscribing to various “Genetic Events” which are fired each time a significant event occurs, such as “offspring created” or “new generation”. These events are used to transmit data back to the reasoning module, which then uses this data to invoke the adaptive techniques discussed in Section 4.2. The two layers can be broken down as follows:


3.1 The GA thread It is within this thread that the GA will run. Its parameters will be continuously monitored and updated by the reasoning module (e.g. alterations to crossover probability) and will execute them within its own thread, leaving the reasoning thread free to monitor and deliberate over the next change. The GA presented in this paper employs RWS (Rogers and Pru¨gel-Bennett, 2000), where every member of the population is given a slot in an imaginary roulette wheel. The size of each slot is proportionate to that member’s fitness in relation to the population. Individuals are then selected by spinning the roulette wheel. Selection probability is proportional to the individual’s fitness although there is a danger of the highly-fit members dominating the selection and forcing the GA to converge prematurely. The population size is a very important factor in any GA as it affects both performance and efficiency. The GA implementation proposed here uses a relatively small population size relative to possible schemata. However, GAs with a small population size generally do not perform well and tend to converge prematurely on a local minimal. This is as a result of an insufficient sample size and the possibility of the population being dominated by one super-fit chromosome. As the population converges, the range of fitness values becomes smaller. As a result, convergence slows down and this can also result in premature convergence. On the other hand, the smaller the population size, the faster the evolution of each generation. Therefore, in order to maintain both speed of convergence and accuracy a GA with a small population will be employed and the rule-based RS will monitor the population and effect changes to negate the effects of premature convergence. 3.2 The reasoning thread This thread implements the system’s reasoning capabilities and the adaptive techniques presented in Section 4 and continually deliberates over input received, performs reasoning operations and coordinates the runtime adaptation of the GA. The reasoning module is based on a hybrid of empirical probability algorithms and rule-based reasoning. Figure 1 gives a breakdown of how these are incorporated at each stage of the reasoning process. The RS begins by conducting a basic survey of the fitness function in order to set initial parameters, e.g. mutation probability: pm ¼ 0:25=L where L is the chromosome length. Subsequent to initialization, each generation begins with landscape classification and a review of the current state of the population (convergence ratio – CVR and population diversity, Section 4.1). If the CVR is acceptable the RS will update the participation level of each crossover operator and invoke ACP if current classification of the fitness function indicates a unimodal problem. If the population breaches the CVR threshold (Figure 2) or remains constant for a specified number of generations, problem specific adaptive rules presented in Section 4.1.4 are invoked in order to counteract premature convergence and thereby increase the probability of finding the optimal. The reasoning thread is essentially the culmination of three newly proposed adaptive techniques whose function is to increase convergence velocity, regulating between local and global search and counteracting premature convergence. This will be demonstrated using a suite of six benchmark test functions described in Section 6.
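As a concrete illustration of this two-layer arrangement, the following is a minimal Python sketch of how the GA thread, the reasoning thread and the "Genetic Events" channel could be wired together. The class, field and event names are illustrative assumptions, not the implementation used by the authors.

```python
# Minimal sketch (assumed names) of the two concurrent layers described above:
# the GA evolves in its own thread and publishes "genetic events", while the
# reasoning thread consumes them and writes parameter updates back into a
# shared settings dictionary.
import queue
import threading

class GAThread(threading.Thread):
    def __init__(self, events: queue.Queue, settings: dict):
        super().__init__(daemon=True)
        self.events = events        # channel for genetic events
        self.settings = settings    # shared, adjusted by the reasoning thread
        self.generation = 0

    def run(self):
        while self.generation < self.settings["max_generations"]:
            best, diversity = self.evolve_one_generation()
            self.generation += 1
            self.events.put({"type": "new generation",
                             "generation": self.generation,
                             "best_fitness": best,
                             "diversity": diversity})

    def evolve_one_generation(self):
        # placeholder for selection, crossover (settings["pc"]) and
        # mutation (settings["pm"]); returns best fitness and diversity
        return 0.0, 0.0

class ReasoningThread(threading.Thread):
    def __init__(self, events: queue.Queue, settings: dict):
        super().__init__(daemon=True)
        self.events = events
        self.settings = settings

    def run(self):
        while True:
            event = self.events.get()          # blocks until the GA reports
            if event["type"] == "new generation":
                # landscape classification, CVR and diversity checks would go
                # here; the outcome is written back as new parameter values,
                # e.g. resetting pm to the 0.25/L baseline used in the text.
                self.settings["pm"] = 0.25 / self.settings["L"]

settings = {"pc": 1.0, "pm": 0.25 / 32, "L": 32, "max_generations": 500}
bus = queue.Queue()
GAThread(bus, settings).start()
ReasoningThread(bus, settings).start()
```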


Figure 1. Reasoning plan of execution

Figure 2. CVR threshold effect

4. Hybrid rule-based adaptive GA It is the reasoning module which attempts to maintain effective parameter settings throughout the run of the algorithm. This is achieved by monitoring operator performance, population diversity, the population’s CVR (Section 4.1.1) and performing ongoing landscape classification. Based on these responses a rule-based RS will invoke parameter change in order to avoid premature convergence and hence increasing the probability of finding the optimal. The RS uses these responses in order to classify


the fitness landscape and hence apply problem-specific parameter change. In addition, the RS will have at its disposal two new crossover parameter optimisation techniques (Section 4.2), ACP and APL, which can be used to speed up convergence for unimodal problems and to regulate between local and global search, respectively.

4.1 Deterministic rules
Premature convergence on a non-global optimum is a common problem facing GAs and can result in extended execution time or even failure to converge. Traditional methods of avoiding premature convergence include using large population sizes, which call for more evaluations each generation and, hence, extended runtime. Also, elevated mutation probabilities may be applied to maintain diversity, but this generally has a negative effect on solution quality. Therefore, a rule-based system is proposed which analyses the GA's fitness function and monitors the GA's CVR and diversity in order to counteract the effect of premature convergence using a set of pre-determined rules. If the GA's CVR (Section 4.1.1) shows that the GA has begun to diverge (Figure 2, Generation 40) or has failed to converge in x successive generations, the RS will suggest possible parameter options in order to provide a possible solution. The RS uses three measures in order to decide which rules to invoke (Sections 4.1.1-4.1.3). Pseudo-code for the rule-based adaptive GA is given in Section 4.1.4.

4.1.1 Convergence ratio (CVR). The GA CVR reflects the direction of the algorithm's population and is an extension of the algorithm proposed by Espinoza et al. (2001). Whether the population is converging, diverging or static will be reflected by the CVR, which is computed from the coefficient of variation of the population fitness (its standard deviation divided by its mean):

\mathrm{CVR} = \frac{\overline{CV}_{(i-10)}}{\overline{CV}_{(i-20)}} = \frac{\sum_{x=i-10}^{i} CV_x / 10}{\sum_{x=i-20}^{i} CV_x / 20}    (1)

where CV(i−10) is the mean of the coefficients of variation between generations i − 10 and i, and CV(i−20) is the mean of the coefficients of variation between generations i − 20 and i. CVR therefore represents the change in the coefficient of variation from i − 20 to i − 10. If CVR > 1 for minimization problems, or CVR < 1 for maximization problems when elitism is not employed, then the solution is diverging (as shown in Figure 2 at generation 40). This is known as the CVR threshold effect. The CVR threshold is allowed a ±0.02 buffer zone so that negligible divergence can be ignored. If the CVR is acceptable, the RS will continue to refine operator parameters based on the criteria given in Sections 4.2.1 and 4.2.2. However, if the CVR breaches the allowed threshold (generation 40 in Figure 2) or remains constant for an extended period of time, the RS will revise the state of the GA and invoke alternative parameter setting methods based on the library of rules in Section 4.1.4 (pseudo-code) in an attempt to re-instigate acceptable convergence.

4.1.2 Diversity. A measure of population diversity is given by favg − fmin, where favg is the population's average fitness and fmin is the population's minimum fitness, and

allows the RS to make informed decisions in the adaptation of various GA parameters. The population diversity can also be used to classify the fitness landscape; for example, given the fitness function F5 (Section 5), diversity will always be high as long as the mutation probability is non-zero.

4.1.3 Classification of fitness landscape. In order to implement productive adaptation using a rule-based system the RS must first identify certain fitness landscape characteristics: for example, the distinction between unimodal and multimodal problems, whether the problem converges to a plateau, or whether noise is present. Various rules, such as those presented in Figure 3, are employed to help estimate a problem classification.

4.1.4 Rules for adaptation. The following rules (Figure 4) were generated in order to accommodate the test problems F1-F6 specified in Section 5. They provide a mechanism for the integration of the various adaptive techniques with the GA in order to achieve "dynamic" adaptive capability, where operators and structures are dynamically updated during runtime according to the problem specification to achieve optimal performance. It should be noted that the adaptive technique proposed for large multimodal problems, e.g. F6 (Section 5), will cause a considerable increase in runtime. However, this repercussion can be greatly reduced by adapting the rules to initialise another parallel GA alongside the original. This can easily be achieved if a programming language which supports threads is used, although it will not be the focus of this paper. In such a case the RS would continually monitor and adapt both parallel GAs and employ migration to transfer good schemata from each thread to the other. The adaptive rules use three measures of environmental feedback (from both the GA and the fitness function) to determine which rules and techniques should be invoked.
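For concreteness, the sketch below shows how the feedback measures of Sections 4.1.1 and 4.1.2 and the CVR divergence test might be computed each generation; function names and the returned strings are illustrative assumptions, not the authors' code.

```python
# Sketch of the feedback measures described above: the coefficient of variation
# of the population fitness, the convergence ratio of equation (1), the
# diversity measure favg - fmin, and the divergence test with its 0.02 buffer.
from statistics import mean, pstdev

def coefficient_of_variation(fitnesses):
    return pstdev(fitnesses) / mean(fitnesses)

def convergence_ratio(cv_history):
    """cv_history[g] is the population CV at generation g; needs >= 20 entries."""
    cv_10 = mean(cv_history[-10:])    # mean CV over generations i-10 .. i
    cv_20 = mean(cv_history[-20:])    # mean CV over generations i-20 .. i
    return cv_10 / cv_20

def diversity(fitnesses):
    return mean(fitnesses) - min(fitnesses)    # favg - fmin

def rs_decision(cv_history, minimising=True, buffer=0.02):
    cvr = convergence_ratio(cv_history)
    diverging = (cvr > 1 + buffer) if minimising else (cvr < 1 - buffer)
    if diverging:
        return "invoke the problem-specific rules of Section 4.1.4"
    return "continue refining ACP/APL (Sections 4.2.1-4.2.2)"
```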


4.2 Adaptive crossover techniques
Yang (2002) identifies three forms of crossover adaptation:
. adapting the type of crossover being used;
. adapting the rate of crossover; and
. adapting the crossing point or swapping probability in each locus.
The adaptive crossover techniques proposed in this paper focus on the first two of these, which were first introduced in MacGiollaBhríde et al. (2003), where two adaptive strategies were applied to four inverted test problems from the DeJong (1975) test suite: first, subjecting various portions of the population to alternative crossover operators each generation and updating their respective participation levels; and second, constantly refining the crossover probability. There are many forms of crossover operator, each applying specifically to certain problem types,

Figure 3. High-level pseudo code for AGA fitness landscape classification


Figure 4. Pseudo code for AGA rules

and some of these are known to produce fitter results than others (Syswerda, 1989; Goldberg and Lingle, 1985). In many cases other parameter settings, such as the population size, influence the effectiveness of each crossover operator, which makes it very difficult to predict the effect of each operator on the population (DeJong and Spears, 1991). Therefore, in a system with varying parameters it is essential that the system has the ability to react to environmental changes. The selection function, representation scheme and fitness function can also have an impact on crossover and mutation utility.

The crossover-based adaptation presented in this paper is based on empirical probability, where the set of available crossover operators Ω is applied to the same population each generation. Given Ω, each individual operator will be used to produce new offspring according to the relationship:

O_{new} = n \, p_c \, \lambda_{\Omega x}    (2)

where Onew is the number of new offspring created using crossover in the next generation. Given n, the current population size, the crossover probability (pc) determines the portion of the individuals in the population which will be subjected to crossover. The participation level (λΩx) of each crossover operator Ωx ∈ Ω gives the ratio of chromosome distribution between the operators.

4.2.1 Crossover probability (pc). The adaptive strategy proposed in this paper uses regular feedback from the GA in the form of Genetic Events (Section 4) to update and adapt crossover probability. The theory works on the principle that crossover is initially applied at an elevated rate in order to exploit the positive traits present in the population; this is above all essential when dealing with large population sizes, as crossover dramatically outperforms mutation in these instances (Spears and Anand, 1991). However, as the population begins to converge on the optimal, the crossover operator gradually becomes less effective due to the lack of diversity; therefore a reduced crossover probability and a mutation probability of 0.25/L (Section 6.1 identifies 0.25/L as a superior mutation probability) may serve to refine the gradually improving fitness of each chromosome. Studies carried out by Tuson and Ross (1996) support this belief; they show that a low crossover probability gives high quality results while a higher probability results in a faster search. Therefore, given a fitness function with a relatively straight path to the optimal, it is believed that an effective GA should have an initial high speed of convergence followed by fine tuning of the solution in later generations. The adaptive strategy proposed here will update pc every 100 generations; various generational gaps were tested and it was found that 100 yielded the best results. The crossover probability (pc) for generations g to g + 100 is derived by recording the number of offspring produced in every generation from g − 100 to g which are superior to the elite (fittest individuals) of the previous generation, divided by the population size (n):

p_c = \frac{\text{frequency of superior offspring}}{\text{population size}}    (3)
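A minimal sketch of the pc update of equation (3) follows, assuming the superior offspring are counted over the 100-generation window and the result is clamped to the 0.2-1.0 bounds given in the next paragraph; the function name is illustrative.

```python
# Sketch of the crossover-probability update of equation (3); the counter is
# accumulated over the 100-generation window used in the text, and the result
# is clamped to the 0.2-1.0 range quoted in the surrounding text.
def update_crossover_probability(superior_offspring_count, population_size,
                                 pc_min=0.2, pc_max=1.0):
    """superior_offspring_count: offspring fitter than the previous elite,
    counted over the last 100 generations."""
    pc = superior_offspring_count / population_size
    return min(pc_max, max(pc_min, pc))

# e.g. 35 superior offspring over the window with a population of 50:
# update_crossover_probability(35, 50) -> 0.7
```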

Based on the operator's performance, its crossover probability will be increased or decreased every 100 generations. The crossover probability pc is never allowed to exceed a value of 1 or to drop below 0.2.

4.2.2 Participation level (λΩx). Spears (1995a, b) highlights the fact that the performance of various crossover operators can be affected by other parameters such as population size. Also, certain operators can be more effective than others at different stages of the GA. The structured GA proposed here allows the simultaneous application of several crossover operators to the same problem, working on the assumption that the most suitable will dominate the mating process. The adaptive technique proposed here calls for the simultaneous application of two crossover operators, single point crossover (SPC) and uniform crossover (UC), where the choice of operators is based on their nature of operation. SPC is essentially


an exploitative operator, which is used to take advantage of the positive traits within each individual, while UC is an explorative operator, which acts as a more global search operator. Therefore, utilising the distinct attributes of both operators should provide a more robust search through simultaneously exploring the search space on both a local and a global level. The technique used by Spears (1995a, b) is based on the same principle, although his method differs in that he appends a "bit-tag" to each chromosome to determine which operator is to be used for each individual recombination. Spears also differs in his choice of operators, opting for two point crossover (TPC) and UC. His decision was based on a study he conducted on the disruptive nature of various operators (Spears, 1995b), TPC being the least disruptive and UC being the most disruptive. The choice to use SPC as opposed to TPC for the adaptive mechanism proposed in this paper is due to the fact that preliminary trials using SPC and TPC individually with static parameters have indicated that SPC yields enhanced results.

The relative frequency with which a certain crossover operator has produced positive results in previous generations is taken as an indication of its likely performance in the future. As a result, the percentage of the population submitted to this type of crossover (λΩx) is increased or decreased accordingly. Each time offspring is created its fitness is evaluated and the crossover operator used to produce the offspring is duly rewarded by increasing its participation level (λΩx). Samples are taken over generational gaps of 100. If any one operator from Ω proves to perform significantly better than its counterparts it will tend to dominate the mating process. However, no operator's participation level is ever allowed to drop to zero, ensuring that it is continually fighting to regain dominance. Each operator's participation level λΩx is derived by calculating the ratio of superior offspring CΩx produced by a crossover operator Ωx ∈ Ω over 100 generations. The sum of all operators' participation levels is 1:

\lambda_{\Omega 1} + \lambda_{\Omega 2} + \cdots + \lambda_{\Omega x} = 1    (4)

Given two operators Ω1 and Ω2, the participation level for Ω1 is given by:

\lambda_{\Omega 1} = \frac{n R_1}{R_1 + R_2}    (5)

where R1:R2 is the ratio of the performance of Ω1 and Ω2 over 100 generations and n is the population size. If both crossover operators fail to produce offspring superior to the elite from generation g − 1, their individual participation levels will be determined using their overall performance to date. This ensures that the most powerful crossover operator can compete for the highest level of participation. This is illustrated in Figure 5, where the ratio Ω1:Ω2:Ω3 can be seen to change. Assuming that the mean fitness μC3 for C3 has grown with respect to μC1, then pΩ3 at time t will have increased while pΩ1 will have decreased, as illustrated in Figure 5(b).

5. Selected test functions for optimisation
All adaptive techniques proposed in Section 4 were tested against a suite of test problems F1-F6, which were chosen due to their varying complexity and characteristics. Each of the selected functions poses different challenges, for example,


Figure 5. Illustration of adaptive participation level

how the GA deals with plateaus, multimodal problems, noise, varying degrees of modulation of local optima, and low to high dimensionality. They have been chosen in order to demonstrate that the various adaptive techniques proposed in Section 4 can be utilised to overcome these obstacles and improve performance. F1-F6 are function optimisation problems (Table I) using a binary encoding scheme. The first five test functions were compiled by DeJong (1975); F6 was first introduced by Rastrigin (1974) and generalised by Mühlenbein et al. (1991). Figures 6-11 give a two-dimensional representation of functions F1-F6. Functions F1 (Sphere) and F2 (Rosenbrock's Saddle) are low-dimensional, unimodal, quadratic functions. F1 is smooth, strongly convex and symmetric and is essentially


Table I. Test functions F1-F6

Name | Function | Limits | Eq.
F1 | F1(X) = \sum_{i=1}^{3} X_i^2 | −5.11 ≤ X_i ≤ 5.11 | (6)
F2 | F2(X) = 100 (X_1^2 − X_2)^2 + (1 − X_1)^2 | −2.047 ≤ X_i ≤ 2.047 | (7)
F3 | F3(X) = \sum_{i=1}^{5} Integer(X_i) | −5.11 ≤ X_i ≤ 5.11 | (8)
F4 | F4(X) = \sum_{i=1}^{30} i X_i^4 + GAUSS(0, 1) | −1.27 ≤ X_i ≤ 1.27 | (9)
F5 | 1/F5(X) = 0.002 + \sum_{j=1}^{25} 1 / ( j + \sum_{i=1}^{2} (X_i − a_{ij})^6 ), with [a_{ij}] = [−32, −16, 0, 16, 32, −32, −16, ..., 0, 16, 32; −32, −32, −32, −32, −16, −16, ..., 32, 32, 32] | −65.535 ≤ X_i ≤ 65.535 | (10)
F6 | F6(X) = 20A + \sum_{i=1}^{20} (X_i^2 − 10 cos(2πX_i)), A = 10 | −5.11 ≤ X_i ≤ 5.11 | (11)
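Several of the Table I functions translate directly into code; the short sketch below (assumed names) implements F1, F3 and F6 from the formulas above.

```python
# A few of the Table I functions translated directly from the formulas above
# (F1, F3 and F6); in F4 the GAUSS(0, 1) term would be a draw from a standard
# normal distribution. Variable names are illustrative.
import math

def f1(x):            # Sphere: 3 variables, -5.11 <= x_i <= 5.11
    return sum(xi ** 2 for xi in x)

def f3(x):            # Step: 5 variables, integer part of each x_i
    return sum(int(xi) for xi in x)

def f6(x, A=10):      # Rastrigin: 20 variables, cosine-modulated sphere
    return 20 * A + sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

# The GA minimises these; e.g. f6([0.0] * 20) evaluates to the global optimum 0.
```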

Figure 6. F1 (Sphere)

quite a simple problem. F2 is a more difficult function to optimise as the minimum is situated in a long, narrow, parabolic-shaped valley with a flat bottom. F3 (Step) is a discontinuous, unimodal step function of moderate dimension which is piecewise constant. F4 (Quartic) is a continuous, unimodal, high-dimensional quadratic function padded with Gaussian noise. The Gaussian noise ensures that each individual’s fitness value changes from generation to generation (i.e. each evaluation). F5 (Shekel’s Foxholes) is a multimodal, continuous, non-convex, non-quadratic, two-dimensional function with 25 local minima in the form of spikes.


Figure 7. F2 (Rosenbrock’s Saddle)

Figure 8. F3 (Step)

F6 (Rastrigin) is a multimodal, 20-dimensional problem with a large search space. F6 is an extension of F1 with the addition of cosine modulation causing a large number of local optima. 6. Results This section begins by conducting an independent review and analysis of the static parameters mentioned in Section 1 in order to identify a valid benchmark against which the adaptive techniques presented in Section 4 can be evaluated. Results are then


Figure 9. F4 (Quartic)

Figure 10. F5 (Shekel’s Foxholes)

presented for each of the adaptive techniques proposed in Section 4, highlighting any marked improvement over results obtained using the best set of static GA parameters. All results were obtained from the selection of six test functions presented in Section 5 and give the average number of generations taken to converge on the optimal (i.e. the global minimum) over 100 runs. Performance is measured by observing the percentage of optimal found during the 100 runs (optimal found). If the optimal is found for each run of the GA, the number of generations taken is used (average no. generations) and if the optimal is never found, the average fitness over 100 runs is used (average fitness).
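A small sketch of how these reporting measures might be aggregated over the 100 runs is given below; the helper name and dictionary keys are illustrative assumptions.

```python
# Sketch of how the reporting measures above could be aggregated over 100 runs.
def summarise_runs(runs):
    """runs: list of dicts {"found_optimal": bool, "generations": int, "best_fitness": float}."""
    found = [r for r in runs if r["found_optimal"]]
    summary = {"optimal_found_percent": 100.0 * len(found) / len(runs)}
    if len(found) == len(runs):
        # optimal found on every run: report the average generations taken
        summary["average_generations"] = sum(r["generations"] for r in found) / len(found)
    else:
        # otherwise report the average best fitness over all runs
        summary["average_fitness"] = sum(r["best_fitness"] for r in runs) / len(runs)
    return summary
```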


Figure 11. F6 (Rastrigin)

6.1 Static parameters
First, the parameters proposed by DeJong (population size: 50, crossover probability: 0.6, mutation probability: 0.001), Grefenstette online (ps: 30, pc: 0.95, pm: 0.01) and Grefenstette offline (ps: 80, pc: 0.45, pm: 0.01) were implemented and evaluated against the six test functions. The results in Tables II-IV give the average number of generations taken to converge to the optimal using these parameter sets, respectively. Those highlighted show the best performance for each individual problem.

Table II. Results obtained using DeJong's recommended parameters

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 | 184 | 100 | 500 | 0.001
F2 | 0.99999 | 481 | 9 | 1,000 | 0.001
F3 | 0.99964 | 747 | 98 | 1,000 | 0.001
F4 | 0.99080 | 3,160 | 0 | 5,000 | 0.001
F5 | 0.99842 | 1,619 | 30 | 3,000 | 0.001
F6 | 0.99457 | 4,107 | 0 | 5,000 | 0.001

Table III. Results obtained using Grefenstette's recommended online parameters

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 | 128 | 100 | 500 | 0.01
F2 | 0.99999 | 512 | 28 | 1,000 | 0.01
F3 | 1 | 280 | 100 | 1,000 | 0.01
F4 | 0.98952 | 2,716 | 0 | 5,000 | 0.01
F5 | 0.99967 | 721 | 39 | 3,000 | 0.01
F6 | 0.98744 | 4,569 | 0 | 5,000 | 0.01


It is clear from these results that Grefenstette's recommended parameters perform significantly better for low/moderate chromosome sizes while DeJong's are better suited to large chromosome sizes. One possible explanation for such a variation in performance lies in the choice of mutation probability. For functions F1-F3 and F5, Grefenstette's mutation rate of 0.01 provides adequate diversity combined with a certain degree of fine tuning or hill-climbing, while DeJong's low mutation probability of 0.001 supplies too little diversity for these shorter chromosomes. However, for F4 and F6, which have larger chromosome sizes, a per-bit rate of 0.01 proves too disruptive and DeJong's lower setting fares better. Muhlenbein (1992) recommends a mutation probability of 1/L, where L is the chromosome length; others have also produced positive results using this equation (Bäck, 1996). Updating the above parameter sets with this new mutation probability evens out the results considerably, giving relatively fit results for all test functions, as seen in Tables V-VII. Results from Tables II-IV are given in parentheses for comparison. Highlighted rows indicate those which show an improvement using a mutation probability of 1/L.
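To make the per-bit rates concrete, the sketch below applies independent bit-flip mutation at a configurable pm; with pm = 1/L roughly one gene per individual is flipped on average, while pm = 0.25/L, examined next, flips roughly one gene in every four individuals. The helper name is an assumption.

```python
# Sketch of per-bit mutation at the rates discussed above.
import random

def mutate(chromosome, pm):
    """chromosome: list of 0/1 genes; each bit flips independently with probability pm."""
    return [1 - g if random.random() < pm else g for g in chromosome]

L = 40
child = mutate([0] * L, pm=0.25 / L)    # on average 0.25 bits flipped per call
```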

Table IV. Results obtained using Grefenstette's recommended offline parameters

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 | 122 | 100 | 500 | 0.01
F2 | 0.99999 | 495 | 19 | 1,000 | 0.01
F3 | 1 | 181 | 100 | 1,000 | 0.01
F4 | 0.98957 | 2,881 | 0 | 5,000 | 0.01
F5 | 0.99978 | 1,687 | 33 | 3,000 | 0.01
F6 | 0.98833 | 4,259 | 0 | 5,000 | 0.01

Table V. Results obtained using DeJong's recommended parameters with pm = 1/L

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 162 (184) | 100 (100) | 500 | 0.033
F2 | 0.99999 (0.99999) | 527 (481) | 14 (9) | 1,000 | 0.04
F3 | 0.99981 (0.99964) | 252 (747) | 98 (98) | 1,000 | 0.02
F4 | 0.99026 (0.99080) | 3,056 (3,160) | 0 (0) | 5,000 | 0.004
F5 | 0.99982 (0.99842) | 1,594 (1,619) | 41 (30) | 3,000 | 0.03
F6 | 0.99342 (0.99157) | 4,054 (4,107) | 0 (0) | 5,000 | 0.005

Table VI. Results obtained using Grefenstette's recommended online parameters with pm = 1/L

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 0.99999 (1) | 167 (128) | 99 (100) | 500 | 0.033
F2 | 0.99999 (0.99999) | 576 (512) | 11 (28) | 1,000 | 0.04
F3 | 0.99873 (1) | 260 (260) | 93 (100) | 1,000 | 0.02
F4 | 0.99031 (0.98952) | 2,881 (2,716) | 0 (0) | 5,000 | 0.004
F5 | 0.99969 (0.99967) | 970 (721) | 40 (39) | 3,000 | 0.03
F6 | 0.99435 (0.98744) | 4,220 (4,569) | 0 (0) | 5,000 | 0.005

As expected, the improvement with a modified mutation probability of 1/L is seen for the low/moderate chromosome size problems when using DeJong's parameters and for the large chromosome problems when using Grefenstette's parameters. Further analysis of the results presented in Tables II-IV (in parentheses), in comparison with the corresponding results in Tables V-VII, suggests that a mutation probability of 0.25/L may in fact generate superior results. Results presented in Tables VIII-X support this theory, showing increased performance when applied to almost all test functions. Although 0.25/L is not the ideal mutation probability for each problem, the increased performance for the majority of test problems identifies 0.25/L as a superior, non-problem specific mutation probability.

Table VII. Results obtained using Grefenstette's recommended offline parameters with pm = 1/L

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 136 (122) | 100 (100) | 500 | 0.033
F2 | 0.99999 (0.99999) | 544 (459) | 16 (19) | 1,000 | 0.04
F3 | 0.99963 (1) | 250 (181) | 98 (100) | 1,000 | 0.02
F4 | 0.99004 (0.98957) | 3,039 (2,881) | 0 (0) | 5,000 | 0.004
F5 | 0.99991 (0.99978) | 1,764 (1,687) | 44 (33) | 3,000 | 0.03
F6 | 0.99397 (0.98833) | 3,918 (4,259) | 0 (0) | 5,000 | 0.005

Table VIII. Results obtained using DeJong's recommended parameters with pm = 0.25/L

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 125 (162) | 100 (100) | 500 | 0.008
F2 | 0.99999 (0.99999) | 483 (481) | 16 (14) | 1,000 | 0.01
F3 | 1 (0.99981) | 201 (251) | 100 (98) | 1,000 | 0.005
F4 | 0.99080 (0.99080) | 3,160 (3,160) | 0 (0) | 5,000 | 0.001
F5 | 0.99965 (0.99982) | 1,240 (1,594) | 13 (41) | 3,000 | 0.007
F6 | 0.99456 (0.99456) | 4,021 (4,107) | 0 (0) | 5,000 | 0.001

Table IX. Results obtained using Grefenstette's recommended online parameters with pm = 0.25/L

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 115 (128) | 100 (100) | 500 | 0.008
F2 | 0.99999 (0.99999) | 512 (512) | 28 (28) | 1,000 | 0.01
F3 | 0.99945 (1) | 287 (280) | 97 (100) | 1,000 | 0.005
F4 | 0.99056 (0.99031) | 2,973 (2,716) | 0 (0) | 5,000 | 0.001
F5 | 0.99964 (0.99969) | 676 (970) | 25 (40) | 3,000 | 0.007
F6 | 0.99161 (0.99435) | 800 (4,220) | 0 (0) | 5,000 | 0.001


The increased performance observed using a mutation rate of 0.25/L can be explained by the risk, at higher mutation rates, of losing good allele values that crossover has introduced into various genes within a chromosome. The highlighted rows in Tables VIII-X show the improvement using 0.25/L. A mutation probability of 1/L implies that an average of one gene in every individual will be mutated. In a situation where crossover is producing offspring of increased fitness, this could result in the loss of good alleles. A mutation probability of 0.25/L therefore preserves high-quality allele values introduced during crossover while still maintaining adequate population diversity. The results obtained using the best initial parameters for each individual problem (Table XI) with a modified mutation probability of 0.25/L shall be used as the test benchmark, except in the case of F5, where Grefenstette's recommended offline parameters with a modified mutation probability of 1/L shall be used.

6.2 Crossover probability (pc)
Since adaptation of the crossover probability (ACP) is employed to speed up convergence for problems with a relatively clear path to the optimal, only F1 and F3 were tested for crossover probability. Table XII gives the results obtained (for F1 and F3) using the adaptive technique proposed in Section 4.2.1 for comparison with the best static parameters identified in Tables IX and X, respectively (given in parentheses). Note: initial parameters for the adaptive algorithm are ps: 50, pc: 1.0, pm: 0.25/L. Table XII shows that adaptive crossover probability works up to 10 percent better for the selected problems. This can be explained by the fact that the continuous feedback from the GA to the RS each generation helps determine how effective crossover is in producing superior offspring, and consequently the RS is capable of estimating a more productive probability value.

Table X. Results obtained using Grefenstette's recommended offline parameters with pm = 0.25/L

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 121 (122) | 100 (100) | 500 | 0.008
F2 | 0.99999 (0.99999) | 495 (495) | 19 (19) | 1,000 | 0.01
F3 | 1 (1) | 176 (181) | 100 (100) | 1,000 | 0.005
F4 | 0.99080 (0.99004) | 3,028 (3,039) | 0 (0) | 5,000 | 0.001
F5 | 0.99973 (0.99991) | 1,620 (1,764) | 22 (44) | 3,000 | 0.007
F6 | 0.99559 (0.99397) | 3,940 (3,918) | 0 (0) | 5,000 | 0.001

Table XI. Best result for each problem

Function | Average fitness | Average no. generations | Optimal found (%) | Parameters | pm
F1 | 1 | 115 | 100 | Gref Online | 0.25/L (0.008)
F2 | 0.99999 | 512 | 28 | Gref Offline | 0.25/L (0.01)
F3 | 1 | 176 | 100 | Gref Offline | 0.25/L (0.005)
F4 | 0.99080 | 3,028 | 0 | Gref Offline | 0.25/L (0.001)
F5 | 0.99991 | 1,764 | 44 | Gref Offline | 1/L (0.03)
F6 | 0.99559 | 3,940 | 0 | Gref Offline | 0.25/L (0.001)

6.3 Participation level (λΩx)
Unlike adaptive crossover probability, adaptive participation level (APL) can be tested on all binary encoded problems. Although adaptation does not begin until after 100 generations, the inclusion of two crossover operators, single point and uniform, does prove beneficial, as shown in Table XIII. This improved performance can be attributed to the nature of the operators being used (Section 4.2.2) and the fact that the route through the search space is not limited to entirely local or entirely global search, as it is with the classic form of GA employing only one crossover operator. The previous best results using static parameters are given in parentheses.
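For illustration, the sketch below updates the two operators' participation levels along the lines of equations (4) and (5), using the normalised ratio form (the population-size factor n in equation (5) scales the level up to a chromosome count and is omitted here). The small floor and the equal-split fallback are assumptions; the text only requires that no level ever drops to zero and that a barren window falls back to overall performance to date.

```python
# Sketch of the participation-level update of Section 4.2.2; R values count
# superior offspring credited to each operator over the last 100 generations.
def participation_levels(r_spc, r_uc, floor=0.05):
    """Return (lambda_spc, lambda_uc), which sum to 1."""
    total = r_spc + r_uc
    if total == 0:
        return 0.5, 0.5                  # simplified fallback for a barren window
    lam_spc = max(floor, r_spc / total)
    lam_uc = max(floor, r_uc / total)
    s = lam_spc + lam_uc                 # renormalise so the levels sum to 1
    return lam_spc / s, lam_uc / s

# e.g. single-point crossover credited with 30 superior offspring, uniform with 10:
# participation_levels(30, 10) -> (0.75, 0.25)
```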


6.4 Simultaneous adaptive techniques
The simultaneous application of all adaptive techniques yields some interesting results. An analysis of the results obtained using all three forms of adaptation, in comparison with the best static parameters, can be seen in Table XIV. It shows that the application of the adaptive rule-based RS yields a considerable increase in performance over the best static parameters for complex problems. Although the improvement for

Table XII. Adaptive crossover probability for F1 and F3 vs optimal static parameters

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 101 (115) | 100 (100) | 500 | 0.008
F3 | 1 (1) | 159 (176) | 100 (100) | 1,000 | 0.005

Table XIII. Optimal static parameters vs adaptive participation level

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 113 (115) | 100 (100) | 500 | 0.008
F2 | 0.99999 (0.99999) | 495 (512) | 30 (28) | 1,000 | 0.01
F3 | 1 (1) | 144 (176) | 100 (100) | 1,000 | 0.005
F4 | 0.99060 (0.99056) | 2,916 (2,973) | 0 (0) | 5,000 | 0.001
F5 | 0.99985 (0.99991) | 1,911 (1,687) | 60 (44) | 3,000 | 0.007
F6 | 0.99565 (0.99559) | 4,165 (3,918) | 0 (0) | 5,000 | 0.001

Table XIV. Optimal static parameters vs simultaneous adaptive techniques

Function | Average fitness | Average no. generations | Optimal found (%) | Maximum generations | pm
F1 | 1 (1) | 86 (115) | 100 (100) | N/A | 0.008
F2 | 1 (0.99999) | 2,483 (512) | 100 (28) | N/A | 0.01
F3 | 1 (1) | 160 (176) | 100 (100) | N/A | 0.005
F4 | 0.99955 (0.99056) | 7,420 (2,916) | 0 (0) | N/A | 0.001
F5 | 1 (0.99991) | 2,315 (1,764) | 100 (44) | N/A | 0.007
F6 | 1 (0.99559) | 2,487 (3,940) | 100 (0) | N/A | 0.001


F1 and F3 are not considered to be substantial by the authors, this can be attributed to the fact that they are simple problems with a relatively smooth path to the optimal. In contrast, the improvement observed for functions F2, F3 and F6 is quite significant, with the percentage of times the optimal was found having increased from 28, 44 and 0 percent, respectively, using static parameters to 100 percent in each instance using the AGA. Typical learning profiles for F2 and F5 can be seen in Figures 12 and 13. These learning profiles can be used to further illustrate the invocation of the adaptive rules presented in Section 4. The top graph in each learning profile gives information on the state of the population at each generation in the form of the fitness value of the strongest individual and a measure of population diversity as specified in Section 4.1.2. The bottom graph represents the GA’s CVR (Section 4.1.1) and is used to determine the direction of search at any specific generation and indeed whether the population is converging or not. It should be noted, that although the number of generations taken to converge on the optimal has dramatically increased for all problems, the additional runtime would be of no benefit to a static parameter GA. This is due to the fact that, once a static parameter GA converges on a non-global optimal the population generally does not contain the schemata or the domain knowledge to re-instigate convergence. The results presented in Table XIV demonstrate the importance of domain knowledge and landscape classification when applying genetic algorithms to individual problems where the path to the optimal is perhaps obstructed. F4 is the only function, which was not optimised using the rule-based adaptation; the authors suggest that this is due to the addition of Gaussian noise to the function. Future research will investigate further noise reduction. 7. Conclusion In addressing the issues identified in Section 2.4 (i.e. the implications of the NFLT as regards portability and premature convergence) this paper identifies an optimal static mutation probability of 0.25/L for the GA structure detailed in Section 3.1, where L is the chromosome length. Also, two new adaptive crossover techniques are presented, “Adaptive Crossover Probability” which is used to increase convergence velocity for unimodal problems and “Adaptive Participation Level” which is used to regulate between local and global search. Both adaptive techniques and the updated mutation probability present improved performance over the traditional static parameter GA. However, the main body of work produced as a result of this research is the creation of a new rule-based architecture specification for implementing an Adaptive Genetic Algorithm (GA) system where each individual GA process (when extended to include parallel GAs) is merged with an abstract reasoning module capable of making critical adaptation decisions at runtime. It is this reasoning module which makes all decisions based on the adaptive crossover techniques highlighted in Section 4.2 and the instantiation criteria described in Section 4.1. This paper demonstrates the effectiveness of landscape classification and consequent rule-based reasoning for GAs, particularly for problems with a difficult path to the optimal; it is for these problems that the improvement is most evident showing an increase from 0 to 100 percent over static parameter GAs for one of the more difficult test cases F6. 
It should be noted that the additional development time required to implement such a system is not justified by the results obtained for


Figure 12. Typical learning profile for F2


Figure 13. Typical learning profile for F5

low-dimensional problems if it is known that the problem is unimodal with a straight path to the optimal. It is evident from the results presented in Table XIV that landscape classification and the subsequent problem specific adaptation of GA parameters and structures is extremely beneficial. Although genetic algorithms have proven to be exceptionally robust in the past, the implications of the NFLT in relation to transferability of GAs from one class of problem to another has been a difficulty faced by GA developers since their inception by Holland in the 1970s. Further development of the landscape classification and rule-based invocation module in conjunction with new and existing adaptive techniques (such as ACP and APL) appears extremely promising. Given an accurate and robust mechanism for identifying characteristics of various evaluation functions it should be possible to incorporate rules to guide the optimisation in an increasingly varied domain of application. References Angeline, P.J. (1995), “Two self-adaptive crossover operations for genetic programming”, Advances in Genetic Programming II, MIT Press. Ba¨ck, T. (1992a), “The interaction of mutation rate, selection, and self-adaptation within a genetic algorithm”, Parallel Problem Solving from Nature 2, pp. 87-96, PPSN-II. Ba¨ck, T. (1992b), “Self-adaptation in genetic algorithms”, Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life, MIT Press, Cambridge, MA, pp. 263-71. Ba¨ck, T. (1996), Evolutionary Algorithms in Theory and Practice, Oxford University Press, Oxford. Darwin, C. (1859), The Origin of Species, Oxford University Press, Oxford (reissue edition 1998). Davis, L. (1989), “Adapting operator probabilities in genetic algorithms”, Proceedings of the Third International Conference on Genetic Algorithms (ICGA III), pp. 61-9. Davis, L. (1991), Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, NY. Deb, K. and Beyer, H.G. (1999), “Self-adaptation in real-parameter genetic algorithms with simulated binary crossover”, Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann, Orlando, Florida, FL, Vol. 1, pp. 172-9. DeJong, K. (1975), “An analysis of the behavior of a class of genetic adaptive systems”, PhD dissertation, San Mateo, CA. DeJong, K.A. (1985), “Genetic algorithms: a 10 year perspective”, Proceedings of the First International Conference on Genetic Algorithms and Their Applications, pp. 169-77. DeJong, K.A. and Spears, W.M. (1991), “An analysis of the interacting roles of population size and crossover in genetic algorithms”, Parallel Problem Solving from Nature – Proceedings of 1st Workshop, {PPSN} 1, 496, Springer, Berlin, pp. 38-47. Eiben, A.E., Hinterding, R. and Michalewicz, Z. (2000), “Parameter control in evolutionary algorithms”, IEEE Transactions on Evolutionary Computation, Vol. 3 No. 2, pp. 124-41. Eiben, A.E., Sprinkhuizen-Kuyper, I.G. and Thijssen, B.A. (1998), “Competing crossovers in an adaptive GA framework”, IEEE World Congress on Computational Intelligence, The 1998 IEEE International Conference on Evolutionary Computation Proceedings, pp. 787-92. Espinoza, F.P., Minsker, B.S. and Goldberg, D.E. (2001), “A self adaptive hybrid genetic algorithm”, Proceedings of the Genetic Evolutionary Computation Conference (GECCO 2001), Morgan Kaufmann, San Francisco, CA.



Fogarty, T.C. (1989), “Varying the probability of mutation in the genetic algorithm”, Proceedings of the Third International Conference on Genetic Algorithms, pp. 104-9. Goldberg, D. (1989), “Genetic algorithms in search”, Optimization, and Machine Learning, Addison-Wesley, Reading, MA. Goldberg, D. and Lingle, R. (1985), “Alleles, loci, and the traveling salesman problem”, Proceedings of the 1st International Conference on Genetic Algorithms and their Applications, pp. 154-9. Grefenstette, J.J. (1986), “Optimization of control parameters for genetic algorithms”, IEEE Transactions on Systems, Man, and Cybernetics, SMC, Vol. 16 No. 1, pp. 122-8. Herrera, F. and Lozano, M. (1996), “Adaptive genetic algorithms based on fuzzy techniques”, Proceedings of IPMU’96, Granada, pp. 775-80. Hinterding, R. (1997), “Self-adaptation using Multi-chromosomes”, Proceedings of the 4th IEEE International Conference on Evolutionary Computation, IEEE Press, pp. 87-91. Hinterding, R., Michalewicz, Z. and Eiben, A.E. (1997), “Adaptation in evolutionary computation: a survey”, Proceedings of the IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence. Holland, J.H. (1975), Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press, Cambridge, MA. Julstrom, B.A. (1995), “What have you done for me lately? Adapting operator probabilities in a steady-state genetic algorithm”, Proceedings of the Sixth International Conference on Genetic Algorithms, pp. 81-7. Levenick, J.R. (1995), “Metabits: generic endogenous crossover control”, Proceedings of the Sixth International Conference on Genetic Algorithms, Morgan Kaufmann, San Francisco, CA, pp. 88-95. MacGiollaBhride, F., McGinnity, T.M. and McDaid, L.J. (2003), “Adaptive reasoning for genetic algorithms: a new strategy”, paper presented at: 2nd IEEE Systems, Man & Cybernetics UK & RI Conference on Cybernetic Intelligence, Challenges and Advances, pp. 45-50. Martin, O., Gras, R., Hernandez, D. and Appel, R.D. (2003), “Optimizing genetic algorithms using self-adaptation and explored space modelization”, paper presented at The Fifth International Workshop on Frontiers in Evolutionary Algorithms, pp. 291-4. Muhlenbein, H. (1992), “How genetic algorithms really work: I. Mutation and hillclimbing”, Parallel Problem Solving from Nature 2, pp. 15-29. Mu¨hlenbein, J., Schomisch, U. and Born, J. (1991), “The parallel genetic algorithm as function optimizer”, Parallel Computing, Vol. 17, pp. 619-32. Rastrigin, L.A. (1974), Systems of Extremal Control, Nauka, Moscow. Rechenberg, I. (1973), “Evolutionsstrategie: optimierung technischer systeme nach prinzipien der biologischen evolution”, Frommann-Holzboog Verlag, Stuttgart. Rogers, A. and Pru¨gel-Bennett, A. (2000), “The dynamics of a genetic algorithm on a model hard optimization problem”, Complex Systems, Vol. 11 No. 6, pp. 437-64. Schaffer, J.D. and Morishima, A. (1987), “An adaptive crossover distribution mechanism for genetic algorithms”, Genetic Algorithms and their Applications: Proceedings of the Second International Conference on Genetic Algorithms, pp. 36-40. Schaffer, J.D., Caruana, R., Eshelman, L.J. and Das, R. (1989), “A study of control parameters affecting the online performance of genetic algorithms for function optimization”, Proceedings of the 3rd International Conference on Genetic Algorithms, pp. 51-60.

Smith, J. and Fogarty, T.C. (1996), “Self adaptation of mutation rates in a steady state genetic algorithm”, paper presented at International Conference on Evolutionary Computation, pp. 318-23. Spears, W.M. (1995a), “Adapting crossover in evolutionary algorithms”, Proceedings of the Fourth Annual Conference on Evolutionary Programming, MIT Press, Cambridge, MA, pp. 367-84. Spears, W.M. (1995b), “Adapting crossover in a genetic algorithm”, Proceedings of the Fourth Annual Conference on Evolutionary Programming, MIT Press, Cambridge, MA, pp. 367-84. Spears, W.M. (1997), “Recombination parameters”, The Handbook of Evolutionary Computation. Spears, W.M. and Anand, V. (1991), “A study of crossover operators in genetic programming”, Proceedings of the Sixth International Symposium on Methodologies for Intelligent Systems, pp. 409-18. Srinivas, M. and Patnaik, L.M. (1994), “Adaptive probabilities of crossover and mutation in genetic algorithms”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 24 No. 4, pp. 656-67. Stracuzzi and David, J. (1998), “Some methods for the parallelization of genetic algorithms”, available at: www.cs.umass.edu/ , stracudj/genetic/dga.html Syswerda, G. (1989), “Uniform crossover in genetic algorithms”, Proceedings of the Third International Conference on Genetic Algorithms, Morgan-Kaufmann, San Mateo, CA, pp. 2-9. Tuson, A. and Ross, P. (1996), “Cost based operator rate adaptation: an investigation”, Parallel Problem Solving from Nature – PPSN IV, Springer, Berlin, pp. 461-9. Tuson, A. and Ross, P. (1998), “Adapting operator settings in genetic algorithms”, Evolutionary Computation, Vol. 6 No. 2, pp. 161-84. Wolpert, D.H. and Macready, W.G. (1995), “No free lunch theorems for search”, Tech. Rep, Santa Fe, NM. Wolpert, D.H. and Macready, W.G. (1997), “No free lunch theorems for optimization”, IEEE Transactions on Evolutionary Computation, Vol. 1 No. 1, pp. 67-82. Wu, Q.H., Cao, Y.J. and Wen, J.Y. (1998), “Optimal reactive power dispatch using an adaptive genetic algorithm”, International Journal of Electrical Power & Energy Systems, Vol. 20 No. 8, pp. 563-9. Yang, S. (2002), “Adaptive non-uniform crossover based on statistics for genetic algorithms”, Proceedings of the Genetic and Evolutionary Computation Conf., GECCO 2002, Morgan Kaufmann Publishers, San Francisco, CA, pp. 650-7. Yao, X., Liu, Y. and Liu, G. (1999), “Evolutionary programming made faster”, IEEE Transactions on Evolutionary Computation, pp. 82-102. Zhou, X., Cheng, H. and Ju, P. (2002), “The third-order induction motor parameter estimation using an adaptive genetic algorithm”, Proceedings of the 4th World Congress on Intelligent Control and Automation, Vol. 2, pp. 1480-4.




Business cybernetics: a provocative suggestion


Vojko Potocan, Matjaz Mulej and Stefan Kajzer
Faculty of Economics and Business, University of Maribor, Razlagova, Maribor, Slovenia

Abstract Purpose – There is a field needing both cybernetics and systems theory: business as one way to viability – “business cybernetics” might have to emerge. The purpose of this paper is to address this issue. Design/methodology/approach – A first draft of business cybernetics (BC) notion is presented. Discusses the definition of business systems (BSs) and their need for requisite holism, our understanding of cybernetics, our understanding of the (general) systems theory and systems thinking, differences between some versions of systems theories and cybernetics, and add our draft cybernetics of BSs, finishing with BC as a case of interdependence between business practice, systems theories and cybernetics and resulting conclusions. Findings – It was not found, although quite some literature was studied and quite some practical experience in business, both as employees and as consulting instructors was collected. It is clear that cybernetics and (general) systems theory were created at about the same time by two different groups of scientists. They both dealt with complex rather than complicated entities/features/processes and they both tried to stress relations between parts of reality, which used to be considered separately and one-sidedly rather than (requisitely) holistically. Research limitations/implications – Later on, their “war against a too narrow specialisation” did not end in their general victory, but rather in application of their fruitful findings inside many specialised disciplines of science and practice. This is good, but not good enough, uncovered topics remain. Business is one of them. Originality/value – Links both cybernetics and systems to an emerging “business cybernetics” in an innovative approach. Keywords Cybernetics, Systems theory Paper type Research paper

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1496-1516 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614786

1. The problem and theses The new encyclopaedia (Franc¸ois, 2004) includes considerable information, but it does not mention business cybernetics (BC). Hence, we would like to contribute from our education, research, and experience (in business, organisation, systems and innovation theories) the following provocation concerning BC. Our axioms result from relevant literature and read as follows. . Systems thinking received support from cybernetics and systems theories after millennia of informal existence and help to successful humans and organisations to be requisitely holistic and therefore, successful. It was (and is) supposed to replace the modern exaggerated narrow specialisation and introduce a combination of specialisation and interdisciplinary cooperation. This combination may and should lead to a requisite holism and hence to success and survival. This contribution is based on the research program “From the institutional to the real transition into the innovative enterprise”, which enjoys the support from the Ministry of Education, Science and Sport, Republic of Slovenia, in 2004-2007.

. There are some basic principles and methods of cybernetics and of systems theory. They may apply to anything, may support any research and help in any practical experience. They are general and provide meta-models, which need to be complemented with less general insights in order to work well in the practice of specialists and teams.
. In addition, there are some principles and methods of a number of more narrow specialisations in systems theory, such as hard and soft, traditional and modern, descriptive and methodological, etc.
. There are also some principles and methods of a number of more narrow specialisations in cybernetics, including medical, engineering, economics, etc.
. There are applications of principles and methods of cybernetics to single traditional disciplines of science and practice.
. They all meet both the practical needs and the law of requisite holism, as well as the general definition of a system in the general systems theory, but they do not meet the Bertalanffian concept of systems thinking as a totally holistic one (Bertalanffy, 1968/1979).

Their list does not seem to contain a holistic treatment of cybernetics in business (François, 2004; Trappl, 2002, 2004; Vallée, 2003). We found no work on BC. On this basis, our theses read as follows:
(1) Application of cybernetics and systems thinking to individuals and organisations as business systems (BSs) can and must be conceived more holistically, including the introduction of BC (see Sections 2, 3 and 7).
(2) A systemic and cybernetic consideration of BSs, including the results of such consideration, can and must be improved, if the interdependence of systems theory and cybernetics is to be given more attention. This may include cybernetics and systems theory in general and/or in their specialised disciplines, including BC (see Sections 4-6).
(3) A requisite holism of the cybernetic and systemic consideration of the business reality can and must result from findings from, and consideration of, the interdependence between business practice, systems thinking, and cybernetics (see Sections 5 and 6).
Business is what we investigate, with systems thinking as the approach; thus, we try to make our investigation requisitely holistic. Moreover, we use cybernetics to influence the business processes. Business practice, on the other hand, provides advice on how broadly the requisite holism can be defined in any given real situation/case, and on how complex the measures of cybernetics need to be for the requisite variety to be met as well.

2. Business systems and their need for requisite holism
The modern business conditions (e.g. harsh competition, permanent innovation and turbulence, globalisation, immense differences in economic development and culture) require humans to understand and define the individuals, enterprises, and all other organisations as BSs. This is one of the many possible viewpoints, the one we have chosen in this contribution. BS means that we concentrate on the segment of organisational and related attributes which has to do with business, i.e. human relations with


other people and organisations to attain their own business goals, such as profit, diminishing cost, satisfying needs with the available or produced sources, management, organisation of work processes of all kinds, etc. We are trying to consider them on a requisitely holistic basis. Hence, we wish to take into account the realistic holism (located somewhere between total one-sidedness, requisite holism and total holism), entanglement (located somewhere between total simplicity, requisite entanglement and total entanglement), and dynamics (located somewhere between total static, requisite dynamics and total dynamics) (Kajzer and Mulej, 1996; Mulej and Kajzer, 1998; Mulej, 2000; Potocan, 2003; Rebernik and Mulej, 2000). We chose requisite holism in order to help BSs, as users of BC, to prevent their threatening failure, which may result from their one-sided specialisation and their lack of consideration of the law of requisite holism, requisite variety and other laws that contain the attribute "requisite" (Bausch, 2003; Christakis and Bausch, 2003; Mulej, 2000; Mulej et al., 2004; Potocan and Mulej, 2003; Potocan et al., 2003). They can be well linked by the law of requisite holism.
The BSs seem to be simple and un-contradictory only if one considers them as isolated and static features. This would be a poorly realistic, over-simplified approach, which would normally lead to one-sided, very partial, and limited, rather than requisitely holistic, insights, decisions and actions. Thus, a (dialectical) systems approach is necessary for results to be usable, useful and beneficial. A dialectical system supports the requisite holism; it consists of the crucial viewpoints and their crucial relations (Mulej et al., 2000; and earlier, since 1974). Consequently, all crucial specialists are on the team, but no single one is the only one to matter. The BSs' relations concerning their survival in competition dictate which attributes enter the dialectical system/network of the selected crucial viewpoints.
It also requires us, as both researchers and BSs' managers, to take an untraditional methodological approach and methodology in order to research and manage appropriately the ways toward requisite holism, when dealing with entanglement (both complex and complicated) and dynamics. We might call this content, the approach to BSs and the methodology of working on it, BC. This means that we see BC as a specific, requisitely holistic, version of cybernetics, applying the general notions of cybernetics and the specific ones of organisational cybernetics to entities which we have, briefly, defined as BSs above (details will follow later on). How can one do it? Interaction of systems theory and cybernetics may present a possible solution (Wiener, 1948, 1956; Ashby, 1956; Kajzer and Kavkler, 1987; Potocan et al., 2003). How do we understand them?

3. Our understanding of cybernetics
The term cybernetics comes from the Greek word kybernetes (meaning steersman, governor or rudder) (Ayto, 1993; Funk, 1992; Gove, 2002; Plato, 1955, 1971). In the late 1700s, James Watt's steam engine had a governor, a simple feedback mechanism that is actually a cornerstone of cybernetics theory (Funk, 1992). In 1868, James Clerk Maxwell published an article on governors (Ayto, 1993). In the 1940s, the study of regulatory processes became a continuing research effort, and two key articles were published in 1943 (McCulloch and Pitts, 1943; Rosenblueth et al., 1943).
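The governor illustrates the negative feedback principle on which cybernetics builds: a deviation from a target state is measured and fed back as a correction that opposes it. A minimal sketch of such a loop follows; the target speed, gain and load figures are illustrative assumptions made for the example, not values taken from the sources cited above.

```python
# Minimal illustration of a negative feedback loop in the spirit of Watt's governor:
# the controller measures the deviation from a target speed and feeds a correction
# back into the process, so disturbances are progressively damped.

def simulate_governor(target=100.0, gain=0.4, load=12.0, steps=20):
    """Proportional (negative feedback) control of an engine speed; all numbers are illustrative."""
    speed = 60.0          # initial speed, arbitrary units
    history = []
    for _ in range(steps):
        error = target - speed                    # deviation observed by the governor
        correction = gain * error                 # negative feedback: correction opposes the error
        speed = speed + correction - load * 0.1   # process reacts to the correction and to a constant load
        history.append(round(speed, 2))
    return history

if __name__ == "__main__":
    print(simulate_governor())  # the speed settles near (slightly below) the target
```

With these assumed numbers the speed converges towards the target and stays there despite the constant load, which is the stabilizing effect the text attributes to negative feedback; a positive feedback loop would instead reinforce the deviation.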

Cybernetics as a discipline was firmly established by Norbert Wiener and others (on the development of cybernetics see: Ashby, 1956; Beer, 1975; Clemson, 1984; Checkland, 1981; Delgado and Banathy, 1993; Foerster, 1974; Trappl, 1983; Wiener, 1948; Wiener and Masani, 1976; Wood, 2000; Zadeh, 1965; Zadeh and Kacprzyk, 1992; etc.). Because the field is still young, there are many different definitions of, and approaches to, cybernetics (François, 1999, 2004; Vallée, 2003). Different approaches are based on different starting points for their definition (e.g. ways of consideration, viewpoint selection, levels of insight, areas to be investigated, etc.). The basic characteristics of the past development of cybernetics can be found in the literature (François, 1999, 2004; Vallée, 2003). The early contributions of cybernetics were mainly technological, and gave rise to feedback control devices, communication technology, automation of production processes and computers. Interest then moved to numerous sciences involving humans, applying cybernetics to processes of cognition (practical pursuits such as psychiatry, development of information, management, government), and to efforts to understand complex forms of social organisation, including communication and computer networks. The full potential of cybernetics has not yet been realized in these applications. Finally, cybernetics is making inroads into philosophy. This started by providing a non-metaphysical teleology and continues by challenging epistemology and ethics with new ideas about limiting processes of the mind, responsibility and aesthetics.
Before any science of impacting, which is now called cybernetics, there was of course a practice of impacting. For millennia, people have been managing their lives without theory. This fact would lead us to speak about cybernetics of the zero order (Mulej, 1974; Mulej et al., 2000). Cybernetics as a science was originally created by Wiener et al. They were trying to create a synergy of knowledge of biology and technology to produce novelties found beneficial/useful and used by their users/consumers (François, 1999; Umpleby, 1990, 2001; Vallée, 2003; Wiener, 1948, 1956). They were using the word system to denote that they were dealing with a complex and complicated feature as a whole rather than one-sidedly.
In the 1970s, based on the contribution by H. von Foerster, the distinction between first- and second-order cybernetics was introduced (Foerster, 1974, 1981, 1987). The first concentrated on the observed object, the second on the observer and on observing as a mutual impact between the observer and the object of observation. This was a revolution in science, because the supposition of a total objectivity of science was given up and replaced with a more realistic one that includes the influential role of the observer. The quite logical next step was an attempt to impact the impacted person; it made room for Umpleby's cybernetics of conceptual systems – there is a mutual impact between ideas and society, and cognition is at the same time a biological and a social feature (Umpleby, 1990, 2001; Umpleby and Dent, 1999). Cybernetics has to do with understanding biology and mastering machines on its basis, including automata, as well as with human thinking, including specifics of individuals rather than observers in general. Another line of development of cybernetics reaches back to 1951 and the work of Vallée (2001, 2003).
Its modern point is aimed at an explicit synthesis of observation, decision-making and impacting, stressing the phase/activity of decision-making. This is the basis for the notion of third-order cybernetics and of epistemo-praxiology.


Cybernetics, hence, introduced complexity into the science/practice of making an influence, and it attracted attention to relations, impacts and information, which opened human insight into the previously overlooked attributes of reality. It helps humans control their own conditions of life a lot more efficiently and, hopefully, more holistically, too. On such a basis, one may conclude that cybernetics is a science and practice of influencing/controlling/managing features, events and processes that:
(1) are complex or very complex, i.e. have multiple relations, internally and externally, and specific attributes resulting from these relations;
(2) are open, i.e. have relations, especially interdependencies, with their environment/s, including the ones between different viewpoints;
(3) are dynamic, i.e. able to change, including the observers, decision-makers and impacting actors, as well as the observation process;
(4) take inputs as well as produce outputs = impacts by information rather than by material/energy flows only;
(5) support these flows by feedback loops, e.g. stabilize and simplify them by negative ones, and reinforce them by positive ones; and
(6) are mentally, explicitly or implicitly, modelled from the selected (set or system, or dialectical system, of) viewpoint/s.
Cybernetics cannot be reduced to feedback loops or modelling alone; it takes all six attributes mentioned above as one synergetic whole, a dialectical system. The point of cybernetics is to help optimize the human impact on human life and its circumstances, conditions and preconditions. Thus, cybernetics is one of many specialised disciplines that need to be requisitely holistic, and it can hardly be so if left alone rather than acting in interdisciplinary cooperation with other specialised disciplines. Hence, cyberneticians need systems thinking in both theory and application. In addition, there are a number of cybernetics applying the same basic ideas to different traditional fields of science and practice. Among them, we see BC.

4. Our understanding of the (general) systems theory and systems thinking
At about the same time as the initial authors of cybernetics, L. v. Bertalanffy (LvB) was working with another group on a new worldview – the general systems teaching/theory (and a related methodology supportive of making it happen) (Bertalanffy, 1968/1979; Bertalanffy and Rappoport, 1956-1972; Davidson, 1983). One of his crucial sentences says that humankind has a poor chance to survive if we do not think and behave as citizens of the world rather than of single countries, and if we do not consider the entire biosphere as one whole. For this reason it is necessary to supplement and/or fight the modern exaggerated (!) specialisation, through interdisciplinarity and isomorphism; there seems to be no other way toward survival (Bertalanffy, 1968/1979; Davidson, 1983; Ecimovic et al., 2002).
We can conclude from LvB's and other writings, as well as from real-life experience, that a system is a mental picture of the existing object/topic under consideration. This mental picture is made by the author/observer/controller/manager of this object from his/her selected viewpoint/s in order to let the attributes of the object that he/she finds the most important be clearly visible.

Thus, the system is not equal to the object it represents, and the model represents the system (because the system is the limit of one's mental capacity and interest compared to reality with all (!) of its attributes). Due to this mental capacity, limited for natural reasons, we humans try to control/manage/create the world although we have a very limited insight into its reality; to overcome this natural lack of capacity, we need interdisciplinary creative cooperation of specialists who are mutually different and, therefore, complementary. They attain the best results if they cooperate in the style of a dialectical system (Table I).
However, limitations of (systems) thinking inside single disciplines outplayed the original Bertalanffy (1968/1979, p. VII) conception of fighting the exaggerated specialisation. In principle, one tends to consider a whole/entity, but one's understanding of the whole may be quite different, depending on one's selected viewpoint. Several authors created their versions of systems theories, grouped in traditional and modern ones, in hard- and soft-systemic ones, etc. (Eriksson, 2003; Jackson, 1991; Mueller-Merbach, 1992; Warfield, 2003). All these concepts obviously make sense in the real world, even if they deviate from Bertalanffy, as quoted above. No author mentions or even defines holism, although it is holism which has been the reason for systems science to be established and accepted worldwide. This fact may say that systems science is in a crisis. Here, the selected viewpoint is obviously crucial again. It can cause observation, thinking, decision-making and action to be:
. one-sided, limited to a single viewpoint; or
. totally holistic, covering all attributes from all viewpoints and all their relations and synergies; or
. something in between these two extremes, e.g. requisitely holistic, covering all essential viewpoints and all their essential relations and synergies.
Thus we repeat and expand: if the human mental picture of the object under consideration includes a net/web/system (defined on the mathematical basis, i.e. as an ordered set) of all crucial, interdependent viewpoints/systems (defined in terms of contents perceived from the selected viewpoint/s and, therefore, covering selected parts of the existing attributes), we call it a dialectical system in Mulej's dialectical systems theory (Elohim, 2001; Ecimovic et al., 2002; Mulej, 1974). This distinction matters: the term system has up to 15 definitions of its contents in some vocabularies (Mulej et al., 2000). This makes the term unclear to the reader and requires the author to use a more concrete description of the contents. The brief findings on systems thinking and theories can be applied to BC in a number of ways. From the viewpoint of our research, one faces the question of the relation between different systems theories and the desired cybernetic consideration of BSs.

5. Differences between some versions of systems theories and cybernetics
The (general) systems theory came into being because its authors intended to enable humans to give up their single-sidedness; toward this end, one should use formally equal notions (isomorphisms) from the natural sciences in all sciences (Delgado and Banathy, 1993). This concept has only partially survived; it was not realistic, and it no longer serves as a bridge between all professions as it was supposed to. This bridge is actually best covered by such general sciences as mathematics and philosophy.


Table I. Reality (objects) in the three basic levels of simplification of its picture by humans

Level of holism: Object/reality
Level of simplification: none
Viewpoints in consideration: all
Components considered: all
Links/relations/interdependencies considered: all

Level of holism: Dialectical system (as network/s of mental picture/s)
Level of simplification: small
Viewpoints in consideration: all essential
Components considered: viewpoints of consideration, taken as essential
Links/relations/interdependencies considered: relations between viewpoints, taken as essential

Level of holism: System (as mental picture of the object from a selected viewpoint)
Level of simplification: big
Viewpoints in consideration: one single
Components considered: components of the object from the selected viewpoint
Links/relations/interdependencies considered: relations between components of the object from the selected viewpoint

Level of holism: Model (presentation of the system)
Level of simplification: biggest
Viewpoints in consideration: one single, simplified by modelling
Components considered: the system simplified for communication
Links/relations/interdependencies considered: the system simplified for communication

They are mutually complementary, because mathematics deals with quantitative properties and philosophy with qualitative ones (i.e. those regarding contents), and they both do so on the level of general importance, which means properties common to all features. In terms of the well-known dialectical delimitation of the properties of features into their individual, special (i.e. group-specific) and general parts, we could make the following brief comments about some systems theories (Figure 1). We may consider systems theories and cybernetics in terms of Figure 1. All of them are theories, which means generalizations, but they are aimed at describing reality or at helping people master reality, on different levels of abstraction. Let us exemplify!


Figure 1. The three interdependent subsystems of attributes of anything in existence

5.1 Individual parts of attributes
5.1.1 Many authors are so close to single traditional disciplines that they might belong to the individual part in Figure 1. For example:
. The living systems theory (Miller, 1978) helps us describe living beings of any kind, but only them.
. The critical systems thinking (Jackson, 1987) deals explicitly with topics of management.
. Close to this subject (with its topics of organisation) lies also the viable systems theory (Beer, 1975; Espejo and Harnden, 1989), as well as the one which might perhaps be called hierarchical systems theory (Schiemenz, 1972).
. The autopoiesis (theory of self-developing systems) by Maturana deals with biological features able to evolve on their own (it is applied in psychology) (Maturana, 1980).
5.1.2 Similarly, a further set of systems theories could be located here, but they would belong to hard systems, like those mentioned above do. For example:
. The fuzzy systems theory (Zadeh, 1965; Zadeh and Kacprzyk, 1992) deals mostly with engineering problems, which can hardly be defined realistically in a deterministic way, but it can also help explain democracy and successful cooperation, among other applications.
. Engineers use mathematical systems theory to describe those parts and viewpoints of reality in which they are specializing. They call them linear systems, continuous systems, discontinuous systems and signals, and adaptive systems, with tools such as the Fourier transformation, the Laplace transformation and the Z-transformation (Mulej et al., 2000).
5.1.3 The group-specific parts of attributes.
. Checkland (1981), when introducing his soft systems methodology (SSM), made a clear distinction between the hard systems (useful in engineers' design and construction of artefacts supposed to be very reliable) and the soft systems (useful in social sciences that deal with humans and the fact that they are no


artefacts) (Checkland, 1981). SSM provides for the two big groups of systems theories to be delimited, which would fit in the special part in Figure 1. SSM does provide impact over the human personality, but is a tool to be used by humans. A number of other authors could also be located here (Franc¸ois, 2004).

5.1.4 The general parts of attributes.
. General systems theory was established (Bertalanffy, 1968/1979) as an attempt to discover common attributes of every entity, be it a human being, a piece of land, or the universe. The point (Bertalanffy, 1968/1979) was to attack the exaggerated specialisation, which causes humans to ruin their own Earth and themselves. LvB helps humans think, decide and act less one-sidedly and with fewer oversights. Oversights endanger survival, as the two World Wars and the world-wide economic crisis between them, in 1914-1945, had shown before the general systems theory was created.
. General cybernetics was established at about the same time as GST. The purpose was to discover common attributes of influencing entities of any type (Umpleby, 1990, 2001; Umpleby and Dent, 1999).
. The dialectical systems theory (DST) (Mulej, 1974) deals explicitly with the human personality's properties (as the subjective starting points of an intervention), along Bertalanffian lines of thinking. In doing so, DST helps humans both impact and enjoy support in finding commonalities and complements between mutually different professions. DST intends to build a bridge among them, which might bring us quite close to holism in the consideration of any topic; hence, DST might belong to the general part, like the general systems theory and general cybernetics.
One can consider BSs on all three levels of abstraction in Figure 1, but once the general principles of cybernetics, general systems theory and the dialectical systems theory have been established, BC might be located on the level of individual attributes (Figure 2). Thus, it would help humans consider the very specific attributes of BSs, which make them different from other systems/viewpoints with which one considers reality. Of course, inside the arena of BSs, one could again differentiate between the general attributes of the BSs and their group-specific attributes (such as the ones of humans, enterprises, government organisations, medical institutions, or schools). This experience demonstrates that the dialectics expressed in Figure 1 is recursive.
We tried to research the business reality requisitely holistically, based on systems thinking and systems theories. However, we are still not finding a satisfying answer to the central question of modern business: how to assure appropriate influence on business processes in terms of content and methods? If we wish to understand the role and importance of cybernetics in BSs, we must first define the relations between cybernetics, systems theory and BSs, and the possibilities of different cybernetic investigations of BSs.

Figure 2. Application of Figure 1 to BSs

6. Cybernetics and/of business systems
One can, in terms of Figure 1, talk about several kinds of cybernetics. First, we can distinguish between early (old or traditional, first-order) cybernetics and modern cybernetics (second- and third-order cybernetics and cybernetics of conceptual systems) (see Umpleby and Vallée in Mulej et al., 2000). This classification includes dealing with BSs (Potocan, 2003; Potocan and Mulej, 2003):
. Early (old) cybernetics used to employ mathematical models with feedbacks and regulation loops, including in the consideration of an enterprise as a BS. The main contributions include the development of (fully) automatic production processes. On these terms, the ideal comprehension of an enterprise (not as a BS) is an automated system with no humans and no problems caused by them and by management of them. That is why the core of management is found in the technology and in control systems able to do the job with a minimal human role. The rigidity of models and the limitation of dealing with complexity to feedback loops were eventually found to be serious deficiencies.
. The modern systems approach and cybernetics (dealing with the viewpoint of impact over entities of any kind) perceives all the complexities of an organisation as a cybernetic system and its close interdependence with its environment, be it a BS, a natural system, a technological system, a human/social system, or a dialectical system including all of them for requisite holism.
Four sets of ideas are specifically stressed when one tries to understand the role of organisations or humans as BSs and their characteristics (Potocan, 2003; Potocan and Mulej, 2004):
(1) A BS is a communication network in which its components, individuals and groups mutually exchange information (which stresses the importance of informatics and of building information systems as partial systems rather than systems or subsystems).
(2) A BS is a system/network of activities in which the sources (matter, energy and information) are transformed into outcomes (which is the topic of modern operations research, decision support systems, and expert systems), when talking about organisations as BSs and about humans in the role of BSs (trying to get employed, retire, contract out one's capabilities).
(3) A BS is a societal system having certain societal tasks and responsibilities. As such, both a human and an organisation as a BS is a society's subsystem (rather than a system of its own only), characterized by a network of roles and interactions, which are to be performed skilfully. Special attention is to be paid to ecological problems.
(4) A human being, if considered and/or behaving as a BS, demonstrates more or less the same attributes as an organisation as a BS, although in a different way.
Based on the summarized ideas about the cybernetic and systemic consideration, we can say that the BSs share the following attributes of objects they describe as systems (Figure 3).


Figure 3. A selective multidimensional typology of objects described by systems (as mental pictures) from a number of viewpoints, typical of BSs

Early (old) cybernetics would see a process or a structure as a system/network of black boxes with sets of feedback loops, hence as a basis for algorithms to be defined and for making room for automation of the entire business process, not only of machines or machine networks. Modern cybernetics may see complexities, sources, outcomes, networks and synergistically emergent properties hardly permitting any algorithms, except at a framework level. We can also see links to both early (old) and modern cybernetics.
(1) Exposing, rather than only seeing or (even worse) leaving out, the complexities of reality helps one neither oversimplify nor over-complexify one's image of reality, and thus one's basis of comprehension, decision and action. It is a clear experience that oversimplification of the image causes, by oversight, additional complication and complexity of the consequences of the action. So does over-complexification, but by too much insight, preventing data from becoming information.
(2) The business process of a BS can be dealt with through exposure of some parts in which algorithms can be clear enough for mathematics to be employed, even for automation to serve humans by supporting routine and creative work. Sometimes these algorithms remain on a framework level; sometimes an integration of hard and soft systems methods can enter the combination (Kajzer and Potocan, 1997; Potocan, 2003; Potocan et al., 2003). There are also chances for incorporation of expert systems supportive of managerial decision making.
The aim of modern cybernetics is to treat the majority of phenomena (i.e. natural, technical and social) and their systems. However, how (holistically) does cybernetics endeavour to study them?
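As a purely illustrative complement to point (2) above, the sketch below shows how a clearly algorithmizable part of a business process might be captured in an expert-system-style rule that supports a routine managerial decision and escalates the rest to human judgement. The thresholds, parameter names and the rule itself are assumptions made for this example, not rules taken from the paper.

```python
# Illustrative expert-system-style rule for a routine reorder decision:
# the clearly algorithmizable part of the business process is automated,
# borderline cases are escalated to the human decision maker.

def reorder_decision(stock, daily_demand, lead_time_days, safety_factor=1.5):
    """Return ('reorder', qty), ('hold', 0) or ('escalate', reason); all numbers are illustrative."""
    if daily_demand <= 0:
        return ("escalate", "no reliable demand estimate, needs managerial judgement")
    reorder_point = daily_demand * lead_time_days * safety_factor
    if stock < reorder_point:
        qty = int(reorder_point * 2 - stock)   # simple order-up-to rule
        return ("reorder", qty)
    return ("hold", 0)

print(reorder_decision(stock=40, daily_demand=10, lead_time_days=5))   # ('reorder', 110)
print(reorder_decision(stock=200, daily_demand=10, lead_time_days=5))  # ('hold', 0)
```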

The main goal of early cybernetics was to establish implicitly (subconsciously) and take into account specific origins of the characteristics of individual groups of phenomena (and the systems established to represent them). Its treatment was concentrated on the selection and use of an individual approach that enabled, to the greatest possible extent, a precise analysis of the selected phenomenon (or individual types of phenomena) from the chosen (dialectical) system of viewpoints, or at least from an individual viewpoint. Thus, three relatively separated treatments of natural, technical and social systems are typical of early cybernetics. For example, the researchers of social phenomena stemmed from the treatment of their social characteristics (i.e. the social viewpoint of these phenomena). Such an approach turned out to be relatively limited and frequently useless due to its one-sidedness. On the other hand, modern cybernetics strives to explore the phenomena (regardless of their characteristics and origin) relatively/requisitely holistically, based on (and considering) the treatment of the majority of their significant viewpoints. The viewpoints are only exceptionally linked. The majority of the known solutions for the selection and definition of the viewpoints of cybernetic treatment can be divided into two fundamental groups. (1) Partial/one-sided definitions of an individual viewpoint within the cybernetic treatment of the whole entity the part of which it comprises (the definition of an individual viewpoint as the sub-viewpoint and/or partial viewpoint of the holistic cybernetic treatment; subsystems or partial systems, either mutually divided or interlinked, yet partial findings). (2) Holistic (at the level of requisite holism) definition, which is a transition from an individual viewpoint of treatment to the dialectical system of viewpoints within the requisitely holistic cybernetic treatment (e.g. by DST). The dialectical system namely presents the selection of the network of viewpoints the authors consider essential and inter-dependent (Table I). In general, we can establish that each phenomenon should be examined cybernetically through the network of all the important viewpoints. Certainly, within the treatment, we should consider the specific starting points and characteristics of particular groups of phenomena (and the systems introduced to represent them). Therefore, different viewpoints, networks of viewpoints or dialectical systems/systems of treatment are highlighted for different phenomena. The differences in the selection of viewpoints and their interdependence depend on the subjective selections of the authors of definitions (this is also true for the engineering/scientific laboratory experiments). The subject of our examination, here, is a BS, which represents characteristics of societal systems, illustrates social processes and events from the viewpoint of (systemic) business viewpoints. See the important characteristics of three basic viewpoints of cybernetic treatment of phenomena selected on such basis in Table II (for details of each viewpoint of cybernetic treatment see Umpleby (1990) and Umpleby and Dent (1999)). . An engineering observation tries to study natural events from the viewpoint of their usability for tools. . A biological observation tries to study natural events from the viewpoint of their given attributes with no attempt to influence them.


Table II. Three types of cybernetics

Engineering cybernetics
. Epistemological viewpoint: description of reality (knowledge is a usable reflection of reality).
. Key difference: reality and scientific theories (application according to the selected viewpoint/s).
. Puzzle to be solved: constructed theories, which explain the events under consideration and lead to artefacts.
. Topic to be explained: how does the world work and how can artefacts hence be made.
. Key supposition: natural processes can be explained by scientific theories and be used to make artefacts.
. Crucial consequence: scientific knowledge can be used to adapt natural processes to human benefits.

Biological cybernetics
. Epistemological viewpoint: reality description as constructivism (a picture according to the selected viewpoint/s).
. Key difference: biology based (how does the brain work).
. Puzzle to be solved: inclusion of the observer into the scientific field.
. Topic to be explained: how does a person construct his/her reality as his/her picture of reality.
. Key supposition: notions of knowledge may have their roots in neurophysiology.
. Crucial consequence: if people accept constructivism, they become more tolerant to differences in views and findings.

Social cybernetics
. Epistemological viewpoint: pragmatic (knowledge is constructed to be used for human purposes).
. Key difference: biology of learning (observer as society's participant).
. Puzzle to be solved: explanation of the relation between sciences and humanities.
. Topic to be explained: how people create, maintain, and change social systems using language and thinking.
. Key supposition: notions enjoy acceptance, if serving the aim of the observer as society's participant.
. Crucial consequence: if we change conceptual systems/subjective starting points, we can change society.

. A social observation tries to study social events from the viewpoint of the given attributes or from the viewpoint of influencing them.


We will present our view of dealing with the said dilemma on the case of cybernetics in business.

7. Business cybernetics as a case of interdependence between business practice, systems theories and cybernetics
The dialectical classification into the interdependent general, special, and individual parts (subsystems) (Figure 1) means that any version of systems theory or cybernetics is useful and makes sense more, or perhaps less, than another one; the point lies only in the difference of the fields of applicability and usefulness (Figure 4). Individuals, organisations, and countries can also be seen as BSs. Doing business can namely be seen as a way toward viability in complex conditions (Figure 2) (Beer, 1959, 1972, 1975). Business is not the only way toward viability in complex conditions, but it differs, e.g. from medical care, healthy life style, innovations to be applied in production and elsewhere, leisure, sport, culture, etc. By BC we do not mean the market and other outer relations only, but also the internal ones, such as organising, management, work processes, and structures. According to our suggestion, the difference of BC from the viable systems model in general therefore lies in the level of specific and individual details and in the depth of consideration of BSs. These details might be equally crucial as the general attributes. BC, hence, may be considered as a next step in the research and application of the general cybernetics and of the viable system model in the specific, rather narrow but important, area of business.
Every topic under consideration can be seen on different levels of holism along the following lines (Table I). This takes into account the human capacity not to see objective reality, but only its selected part/s of attributes, and to do so on three basic levels of the unavoidable simplification. Thus, the attained level of holism differs a lot. Years ago there was quite some discussion about holism (Ashby, 1956; Bertalanffy, 1968/1979; Agrell and Vallée, 1985), but less so lately (Trappl, 2002, 2004). Concepts are rather different; ours is in Table I. Our warning resulting from Table I: very rarely can the requisite holism be attained without dialectical systems, requiring interdisciplinary creative co-operation.
Example. In the case of a school of business, the object is the business life, the dialectical system is made of all courses offered, the system is the viewpoint of the business life presented in a single course, and it is modelled with the learning material, such as books and lectures. It is important to notice that all communication takes


Figure 4. Application of Figure 1 to cybernetics


place with models, be it by a picture, book, mathematical formula, daily language phrase, body language, or movie, between humans and other living beings, as well as with non-living matter. Still, everybody tries to understand and master the object, the reality, which is much more entangled than the models of it. However, the human capacity to decide is bigger than the capacity to understand reality holistically. We all reduce, unavoidably, and the point is to what degree and with what level of comprehension we reduce (as specialists, including the ones of systems theory and cybernetics). This applies to BC as well.
Conclusion. One may perhaps come closest to a real usefulness if all important versions of systems theories and cybernetics are networked, in terms of the principle represented by interdependence or, in the words of the authors of the "Critical Systems Thinking", by the system of systems methodologies. It pays tribute to interdependence by the notions of theoretical and methodological complementarity, as well as by its applied methodology, called total systems intervention (Jackson, 1987, 1991). This applies to BC as well.
By limiting the topic and adapting the principles of cybernetics, the term BC can be applied to the cybernetic treatment of the operation (and behaviour) of people and organisations as BSs. BC represents a special form of cybernetics, which can be (and should be) defined holistically on the basis of the identification of its purpose, contents, methodology, circumstances of use, needs and possibilities, as well as of its users. The latter can utilize it as the cybernetics of the zero-, first-, second- or third-order or as the cybernetics of conceptual systems, dependent on their selected system of viewpoints, preferably the dialectical system and requisite holism. BC is designated for the identification, definition and analysis of BSs and for the influence on them. However, the question arises as to what business/a BS is and how to define it.
Business was first used in popular English literature sometime before 1010 (Ayto, 1993; Funk, 1992; Gove, 2002), at least. Up to now, the term has been used in various contexts and for different purposes and has denoted different meanings. The following generalizations can be made. By extension, the word business became (as recently as the eighteenth century) synonymous with "an individual commercial enterprise" and has taken on the meaning of "the nexus of commercial activities" or "the representatives of commercial activity". Specifically, business can refer, collectively, to individual economic entities. In some legal jurisdictions, such entities are subject to law, for managers to conduct operations correctly on behalf of owners and entrepreneurs. A manufacturing business is commonly referred to as industry. Today, business represents one of the most widespread (and frequently used) terms of the modern world. Yet, the meaning of business is not necessarily the same. In modern economic literature, a number of different definitions of the term business may be found (see: www.pangaro.com/published/cyber-mcmillan.html; http://pespmc1.vub.ac.be). For example, Webster's Dictionary (Gove, 2002) gives 17 different definitions for business. At least nine of them are related to economic treatment and/or the definition of economic viewpoints of cybernetics, including the following.
(1) That which busies one, or that which engages the time, attention, or labour of any one, as his principal concern or interest, whether for a longer or shorter time; constant employment; regular occupation; as, the business of life; business before pleasure.

(2) Any particular occupation or employment engaged in for livelihood or gain, as agriculture, trade, art, or a profession.
(3) Financial dealings; buying and selling; traffic in general; mercantile transactions.
(4) That which one has to do or should do; special service, duty, or mission.
For the purpose of determining BC, the definitions of business may be classified into two basic groups: the first taking business as an activity (acting and behaving) and the second considering business as an interest. Therefore, BC may be understood and requisitely holistically defined on the basis of an adequate (synergetic) understanding and use of both content definitions mentioned above. A more detailed definition of the term BC depends on the selection and use of the methodology for its treatment (the approaches taken to the treatment, methods, methodologies). Why? Business is an elaborate (complex and complicated), dynamic and comprehensive phenomenon, which can, in our opinion, be adequately conceived and defined only in a requisitely holistic systemic treatment. It makes sense to analyze it within this framework as a network of all selected significant viewpoints, levels and areas of activity. In the case of BC, we attempt to treat the activity and behaviour of a specific (profit-oriented) group of organisations (and/or people) from a network of all the selected viewpoints (organisational, management, economic, business) holistically, which enables the requisite holism of the treatment (considering its purpose and goals) of activity and behaviour. Based on the above-mentioned (and presented) starting points, BC can best be defined in broadest terms as follows.
BC (in our definition) (Kajzer and Kavkler, 1987; Potocan, 2003) specializes in organisations and individuals as so-called BSs, emphasizing the so-called business viewpoints rather than the natural and/or technical/technological viewpoints of consideration of the features, events and processes of which real life is comprised. Calling humans and organisations BSs rather than "features, events, and processes considered from the viewpoint/s of business sciences/practice" may mean that the requisite holism of consideration and action is consciously or subconsciously limited to the selected viewpoint/s and is, therefore, (rather) one-sided, although the expression "system" suggests holism. In this case, like in all other cases when one uses the word system, one should describe quite explicitly what viewpoint/s and content/s one has in mind, in order to avoid mutual misunderstanding and the resulting lack of capacity and possibility to cooperate creatively on an interdisciplinary basis.
When we speak about the role and importance of cybernetics (and/or BC) in business (Ashby, 1956; Beer, 1975; Clemson, 1984; Trappl, 1983), we must take into account that the idea of BC is close to uniting the cybernetics of the second- and third-order and the cybernetics of conceptual systems into a dialectical system, in order to provide for the requisite holism of management in the BSs. Thus, this concept adds a new (meaning of a) relation of the cybernetics of conceptual systems and of the cybernetics of observation, decision-making and impacting as phases of the same process.
In order to implement the mentioned cognitions about BSs, we also need an appropriate methodological approach for understanding the BC aspect of considering


humans and their tools producing benefits for customers and themselves by three interdependent processes (basic, management, information). However, if we try to create a new requisitely holistic solution, we also need a new (requisitely holistic) approach in order to apply the exposed relation between BC and systems theory, the concept of interdependence. Of course, different relations cause interdependencies of different types and vice versa. Ludwig von Bertalanffy's concept of interdependence (Bertalanffy, 1968/1979) expresses the finding/reality that all parts of the universe, in one way or another, directly or less directly, influence each other. This is because they depend on each other due to their mutual differences; they are mutually complementary. Later, this finding was expressed well and documented in the Gaia hypothesis, when it comes to nature in general. Today, it is expressed well in the literature on systems, chaos and complexity theories (Bertalanffy and Rappoport, 1970; Checkland, 1981; Flood and Jackson, 1991; Flood, 1999; Mulej et al., 2000; Wood, 2000). Interdependence may be seen not only as a relation of elements inside a whole, but also as a relation between different viewpoints (which LvB does not discuss in Bertalanffy (1968/1979)). Interdependence of specialists is expressed and used by interdisciplinary cooperation. Thus, the relation between business, BC, cybernetics and systems theory consideration can be defined as one of the several parts of a whole/entity in a systemic consideration of the object at stake, such as a BS. These parts exist and participate in different relations (internal, external), which make them create and realize a number of synergies. Hence, a relation itself may be seen as a source of synergy/ies. These synergies may be used by cybernetics/cyberneticians aiming at a requisitely holistic application of the second- and/or third-order cybernetics and the cybernetics of conceptual systems. In the relation between BC and the viable system model, it can be clear by now that BC specialises in a way of attaining viability which is typical of humans and organisations as BSs.

8. Conclusions
The business conditions (of globalization, no-mercy competition, permanent innovation, etc.) require humans, enterprises and other organisations, which we understand and define as BSs, to take a less narrowly specialised methodological approach and new methods, in order to manage their holism, entanglement and dynamics requisitely holistically rather than one-sidedly. In our contribution, we discuss relations between two critically important viewpoints of research of BSs – systems theory and cybernetics. Both of them have dealt with complex rather than complicated entities/features/processes, and they have both tried to stress relations between parts of reality which, in other approaches, have been considered in separation and hence one-sidedly rather than (requisitely) holistically. Different authors define systems theory differently, because they use different approaches based on their selected viewpoints. So do authors on cybernetics. If one takes the traditional definition that a system is an entity made of a set of interactive components, and if one takes the first-order cybernetics, interdependence between them is visible, but limited to the relations inside the entity/system under consideration/impact/viewpoints.

If one adds, in order to be less abstract and closer to reality, the consideration of the influential role of the selected viewpoint/s and of the humans defining them, one comes closer to the array of the different, less traditional systems theories, as well as to the less traditional concepts of cybernetics. In this case, interdependence is visible again, but it is extended to the relations between the object under consideration and the humans dealing with it. Our way of understanding where BC belongs is shown in Figure 5.
Cognitions that a more holistic application of cybernetics and systems theory is possible might enable more of the necessary requisite holism in the consideration of BSs. This need requires consideration of the interdependence and synergetic working of the business reality (i.e. the working and behaviour of BSs), the systems thinking (i.e. the methodological approach enabling the requisite holism of understanding of the business practice), and cybernetics (i.e. the methodology of impacting the business reality). This makes the general room for BC. BC can, on this basis, provide more depth and specification to the application of the viable systems model to BSs. At the same time, there is the need for a requisitely holistic consideration of a BS as a unity/system made of the network/system of general, group-specific, and individual attributes. On these terms, one can formulate BC as a specification of the general and the group-specific attributes of cybernetics and management cybernetics, including the viable systems model, to business processes/BSs. In terms of contents, BC can be defined as a specialised cybernetics on the basis of its specific area and of its specific methods of dealing with this area. This means application of the general cybernetics to BSs and adding to it the BSs-related specifics, in order to deal requisitely holistically with the business issues of organisations/humans from the selected dialectical systems of selected viewpoints. In terms of methodology, BC applies cybernetics to BSs, based on selected (dialectical systems of) viewpoints, purposes, goals, methods, methodologies, circumstances of use, and characteristics of its users. In terms of the level of closeness to reality, users can understand and apply cybernetics to their work as cybernetics of the first-, second-, and third-order, or as cybernetics of conceptual systems, or as the dialectical system of them all.


Figure 5. The area of BC among other sciences


When we define the role and the significance of BC, we face new dilemmas, such as: how to create and implement the general methodology for requisitely holistic cognition, implementation, and management of the working of typical BSs? What are the required and adequate conditions for the division of the entire cybernetics into general and specialised ones, such as BC? What is the relationship (in terms of content and methods) between cybernetics as a whole, the general one and the specialised ones? What are the criteria for the definition of individual specialised cybernetics? What is the relationship between different narrowly specialised areas of cybernetics within the social, technical or natural sciences? Etc. These are issues for further research.

References
Agrell, P. and Vallée, R. (1985), "Different concepts of system analysis", Kybernetes, Vol. 14 No. 2, pp. 81-5.
Ashby, W. (1956), An Introduction to Cybernetics, Chapman and Hall, London.
Ayto, J. (Ed.) (1993), Dictionary of Word Origins, Arcade Publishing, New York, NY.
Bausch, K. (2003), The Emerging Consensus in Social Systems Theory, Kluwer, New York, NY.
Beer, S. (1959), Cybernetics and Management, English University Press, London.
Beer, S. (1972), Brain of the Firm, Allen Lane, London.
Beer, S. (1975), Platform for Change, Wiley, London.
Bertalanffy, L. (1979), General System Theory: Foundations, Development, Applications, Braziller, New York, NY, revised ed. (originally published 1968).
Bertalanffy, L. and Rappoport, A. (1956-1972), General Systems, Yearbooks of the SGSR.
Checkland, P. (1981), Systems Thinking, Systems Practice, Wiley, Chichester.
Christakis, A. and Bausch, K. (2003), "Agoras of the global village", in Christakis, A. and Bausch, K. (Eds), Proceedings of the 47th ISSS Conference, ISSS, Iraklion, pp. 1-24.
Clemson, B. (1984), Cybernetics: A New Management Tool, Abacus Press, Tunbridge Wells.
Davidson, M. (1983), The Life and Thought of Ludwig von Bertalanffy, Tarcher, Los Angeles, CA.
Delgado, R. and Banathy, B. (Eds) (1993), International Systems Science Handbook, Systemic Publications, Madrid.
Ecimovic, T., Mulej, M. and Mayur, R. (2002), Systems Thinking and Climate Change System, SEM, Korte.
Elohim, J. (2001), Unity through Diversity, Technical University of Vienna, Vienna.
Eriksson, M. (2003), "Identification of normative sources for systems thinking", Systems Research and Behavioral Science, Vol. 20 No. 6, pp. 475-88.
Espejo, R. and Harnden, R. (Eds) (1989), The Viable System Model: Interpretations and Applications of Stafford Beer's VSM, Wiley, Chichester.
Flood, R. and Jackson, M. (Eds) (1991), Critical Systems Thinking, Wiley, Chichester.
Flood, R. (1999), Rethinking the Fifth Discipline, Routledge, London.
François, C. (1999), "Systemics and cybernetics in a historical perspective", Systems Research and Behavioral Science, Vol. 16 No. 5, pp. 203-19.
François, C. (Ed.) (2004), International Encyclopedia of Systems and Cybernetics, K.G. Saur, München.
Foerster, H. (1974), Cybernetics of Cybernetics, Urbana, IL.
Foerster, H. (1981), Observing Systems, Intersystems Publications, Seaside, CA.

Foerster, H. (1987), "Cybernetics", in Encyclopedia of Artificial Intelligence, Wiley, New York, NY.
Funk, W. (Ed.) (1992), Word Origins: An Exploration and History of Words and Language, Random House, New York, NY.
Gove, P. (Ed.) (2002), Webster's Third New International Dictionary, Merriam, New York, NY.
Jackson, M. (1987), Systems Thinking: Creative Holism for Managers, Wiley, Chichester.
Jackson, M. (1991), Systems Methodology for the Management Sciences, Plenum Press, New York, NY.
Kajzer, S. and Kavkler, I. (1987), Business Cybernetics, VEKS, Maribor (in Slovene).
Kajzer, S. and Mulej, M. (1996), "A basic tool of cybernetics of the socioeconomic systems", Cybernetics and Systems, Vol. 27 No. 1, pp. 555-63.
Kajzer, S. and Potocan, V. (1997), "Synergy and integration processes in business", Management, Vol. 2 No. 2, pp. 1-12.
McCulloch, W. and Pitts, W. (1943), "A logical calculus of the ideas immanent in nervous activity", Bulletin of Mathematical Biophysics, Vol. 5, pp. 115-37.
Maturana, H. (1980), Autopoiesis and Cognition: The Realization of the Living, Reidel, Boston.
Miller, J. (1978), Living Systems, McGraw-Hill, New York, NY.
Mueller-Merbach, H. (1992), "Vier Arten von Systemansätzen, dargestellt in Lehrgesprächen", ZfB, Vol. 62, pp. 853-76.
Mulej, M. (1974), Dialectical Systems Theory, FEB, Maribor (in Slovene).
Mulej, M. and Kajzer, S. (1998), "Ethics of interdependence and the law of requisite holism", in Mulej, M. (Ed.), Proceedings of STIQE, ISRM, FEB Maribor, Maribor, pp. 129-40.
Mulej, M. (2000), Basics of Systems Thinking, Faculty of Economics and Business, Maribor.
Mulej, M., Kajzer, S., Mlakar, P., Mulej, N., Potocan, V., Rebernik, M. and Ursic, D. (2000), The Dialectical and Other Soft Systems Theories, FEB, Maribor (in Slovene).
Mulej, M., Potocan, V., Zenko, Z., Kajzer, S., Ursic, D., Knez-Riedl, J., et al. (2004), "How to restore Bertalanffian systems thinking", Kybernetes, Vol. 33 No. 1, pp. 48-61.
Plato (1955), The Republic, Penguin Books, New York, NY.
Plato (1971), Gorgias, Penguin Books, New York, NY.
Potocan, V. (2003), Business Organisation, Doba, Maribor (in Slovene).
Potocan, V. and Mulej, M. (2003), "On requisitely holistic understanding of sustainable development", Systemic Practice and Action Research, Vol. 16 No. 6, pp. 421-36.
Potocan, V., Mulej, M. and Kajzer, S. (2003), "Duality, cybernetics and system(s) theory from the aspect of business cybernetics?", in Christakis, A. and Bausch, K. (Eds), Proceedings of the 47th ISSS Conference, ISSS, Iraklion, pp. 69-81.
Potocan, V. and Mulej, M. (2004), "The requisite holism of information in a virtual business organisation's management", The Journal of American Academy of Business, Vol. 5 Nos 1/2, pp. 411-7.
Rebernik, M. and Mulej, M. (2000), "Requisite holism, isolating mechanisms and entrepreneurship", Kybernetes, Vol. 29 Nos 9/10, pp. 1126-40.
Rosenblueth, A., Wiener, N. and Bigelow, J. (1943), "Behavior, purpose and teleology", Philosophy of Science, Vol. 10, pp. 18-24.
Schiemenz, B. (1972), Regelungstheorie und Entscheidungsprozesse. Ein Beitrag zur Betriebskybernetik, Gabler, Wiesbaden.
Trappl, R. (Ed.) (1983), Cybernetics: Theory and Applications, Hemisphere, Washington, DC.


Trappl, R. (Ed.) (2002), "Cybernetics and systems 2002", Proceedings of the Sixteenth European Meeting on Cybernetics and Systems Research, Austrian Society for Cybernetic Studies, University of Vienna, Vienna, 2-5 April.
Trappl, R. (Ed.) (2004), "Cybernetics and systems 2004", Proceedings of the Seventeenth European Meeting on Cybernetics and Systems Research, Austrian Society for Cybernetic Studies, University of Vienna, Vienna, 13-16 April.
Umpleby, S. (1990), "The science of cybernetics and the cybernetics of science", Cybernetics and Systems, Vol. 21 No. 1, pp. 109-21.
Umpleby, S. (2001), "What comes after second-order cybernetics?", Cybernetics and Human Knowing, Vol. 8 No. 3, pp. 87-9.
Umpleby, S. and Dent, E. (1999), "The origins and purposes of several traditions in systems theory and cybernetics", Cybernetics and Systems, Vol. 30 No. 2, pp. 79-103.
Vallée, R. (2001), "Time and dynamical systems", in Bubnicki, Z. (Ed.), Proceedings of the 14th International Conference on Systems Science, Wroclaw University of Technology, Wroclaw, pp. 112-26.
Vallée, R. (2003), "History of cybernetics", in EOLSS Encyclopedia of Life Support Systems, EOLSS, available at: www.eolss.net
Warfield, J. (2003), "A proposal for systems science", Systems Research and Behavioral Science, Vol. 20 No. 6, pp. 507-20.
Wiener, N. (1948), Cybernetics: or Control and Communication in the Animal and the Machine, Wiley, New York, NY.
Wiener, N. (1956), The Human Use of Human Beings: Cybernetics and Society, Doubleday Anchor, New York, NY.
Wiener, N. and Masani, P. (1976), Collected Works – Vol. 1, MIT Press, Cambridge, MA (works of Norbert Wiener collected by Pesi Masani).
Wood, R. (2000), Managing Complexity: How Businesses Can Adapt and Prosper in the Connected Economy, The Economist Books, London.
Zadeh, L. (1965), "Fuzzy sets", Information and Control, Vol. 8, pp. 338-58.
Zadeh, L. and Kacprzyk, J. (Eds) (1992), Fuzzy Logic for the Management of Uncertainty, Wiley, New York, NY.

The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


Dynamic portfolio management under competing representations


Ralf Östermark

Åbo Akademi University, Department of Business Administration, Henriksgatan, Finland

Revised January 2005

Abstract
Purpose – To solve the multi-period portfolio management problem under transactions costs.
Design/methodology/approach – We apply a recently designed super genetic hybrid algorithm (SuperGHA) – an integrated optimisation system for simultaneous parametric search and non-linear optimisation – to a recursive portfolio management decision support system (SHAREX). The parametric search machine is implemented as a genetic superstructure, producing tentative parameter vectors that control the ultimate optimisation process.
Findings – SHAREX seems to outperform the buy and hold-strategy on the Finnish stock market. The potential of a technical portfolio system is best exploitable under favorable market conditions.
Originality/value – A number of robust engines for matrix algebra, mathematical programming and numerical calculus have been integrated with SuperGHA. The engines expand its scope as a general-purpose algorithm for mathematical programming.
Keywords Cybernetics, Portfolio investment, Optimization techniques, Programming and algorithm theory
Paper type Research paper

1. Introduction
Dynamic portfolio management is a key topic in modern portfolio theory, involving several fields of science, such as computer science, finance, mathematical programming and statistics. The standard Markowitz (1952) mean-variance approach states the portfolio selection problem as a quadratic programming problem where either the total variance is minimized at a given level of return or portfolio return is maximized subject to a given level of portfolio risk (variance). The Markowitz formulation can be used to trace the efficient frontier in risk-return space. The formulation is intimately connected to the assumption of multivariate normality. Furthermore, portfolio selection is treated as a time-invariant problem. The famous efficient market hypothesis (Alexander, 1964; Fama and Blume, 1966) precludes excess profits obtained by technical analysis of financial asset pricing. Empirical evidence from the period 1960-1980 – especially on the US market – supported this hypothesis, yet later studies have reached contradictory conclusions (e.g. Neeley and Weller, 1999; Marney et al., 2001). Alternatives to the Markowitz mean-variance approach have focused on, e.g. possibilities to incorporate other risk measures than variance into the formulation (Feiring et al., 1994), gaining computational savings through linear representations of risk (for example, the mean absolute deviation (MAD) formulation discussed in Konno and Yamazaki (1991) and Simaan (1997)), recognizing asymmetric returns (Konno et al., 1993), transactions costs (Adcock and Meade, 1994; Mulvey and Vladimirou, 1992; Yoshimoto, 1996), etc. A good overview of some representative approaches is given in Chang et al. (2000).

Kybernetes, Vol. 34 No. 9/10, 2005, pp. 1517-1550, © Emerald Group Publishing Limited, 0368-492X, DOI 10.1108/03684920510614795



In this study we apply the SuperGHA algorithm presented and extended in Östermark (1999a-c, 2000a, b) to a dynamic portfolio management problem. In the optimisation stage we invoke the FSQP algorithm (Lawrence et al., 1997) from within the accelerator-facility of the genetic hybrid algorithm (GHA) (Östermark, 2002b). The basic recursive portfolio management problem was formulated in Östermark (1991) and extended in Östermark and Aaltonen (1992). The parallel framework of the GHA (Östermark, 1999a) was generalized in Östermark (2002) to superstructures performing explicit search for the optimal parametric settings for the mathematical programming problem. The family of parameter vectors evolves through ordinary genetic operators aimed at producing the best possible parameterisation for the underlying optimisation problem. In comparison to traditional genetic algorithms, the integrated superstructure involves a twofold ordering of the population. The first sorting key is provided by the objective function of the optimisation problem. The second key is given by the total mesh time absorbed by the parametric setting. In consequence, SuperGHA is geared at solving an optimisation problem, using the best feasible parameterisation in terms of both optimality and time absorbance. The algorithm combines features from classical non-linear optimisation methodology and evolutionary computation utilizing a powerful accelerator technique (Östermark and Saarinen, 1996; Glover, 1977, 1997). The constrained problem can be cast into multiple representations, supporting the integration of different mathematical programming environments. Extensive Monte Carlo simulations demonstrate the efficiency of SuperGHA (Östermark, 2002).

SuperGHA may be invoked as a co-operative co-evolutionary algorithm for decomposable problems, where clusters of parallel processors solve different subtasks (Potter and De Jong, 1995). Collaboration can be constructed as follows: the current best individual is transmitted from each population to the root. The globally best individual is distributed to the mesh for further processing. A superstructure utilizing artificial intelligence is critical for solving difficult real-world optimisation problems (Östermark et al., 2000; Östermark, 1999a-c, 2000a; Pettersson, 1994; Westerlund et al., 1994, 1998). GHA has been integrated with a number of powerful support libraries in different tests, including the linear algebra package (LAPACK), the mixed integer linear programming (MILP) solver LP_SOLVE, the interactive mathematical and statistical library (IMSL, 1987), the feasible sequential quadratic programming (FSQP) algorithm (Lawrence et al., 1997) and the optimisation package of Quandt (cf. Goldfeld and Quandt, 1972). The libraries are attached to GHA depending on the needs of the particular project. The computational support engines of SuperGHA are presented in Figure 1. SuperGHA can be connected to a multitude of computational systems in the fields of artificial intelligence, mathematical programming and statistics.

Dynamic penalty functions represent a particularly promising approach to constrained optimisation. Following Joines and Houck (1994) we assume dynamic penalties in the evaluation function of the optimisation problem (Östermark, 1999a). The basic set of functional forms for the penalty functions is presented in Figure 2. As shown in Östermark (1999a), different representations of the same problem are allowed simultaneously.
This is an important feature especially in parallel computers, where particular machines can be dedicated to distinct representations and corresponding solution procedures. This system development is in line with the obvious need for further research efforts in internal GA-representations suggested by, e.g. Holland (1975) and Manderick et al. (1991).


Figure 1. The computational support engines of GHA

The algorithm entertains a super layer that exploits a parametric space, searching for the best parametric setting for GHA in terms of adequacy and time absorbance (Östermark, 1999a, 2000a, b). In Östermark (2002), the parallel GHA-framework is generalized to a two-dimensional cluster structure for super layer computation. The array of processors is divided into clusters, each having a managing processor (cluster head). One of the clusters is designated as the root cluster. The head of this cluster (the root processor) controls the genetic manipulation of the parameter vectors and the distribution of the parameters over the mesh.



Figure 2. Functional forms for sequential SuperGHA penalties
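Since Figure 2 itself cannot be reproduced here, the following minimal Python sketch illustrates one common form of a Joines and Houck (1994) style dynamic penalty, in which constraint violations are weighted by a term that grows with the generation count. The function names, the quadratic violation exponent and the (C·t)^alpha growth schedule are illustrative assumptions, not the exact functional forms of Figure 2.

```python
# Hedged sketch of a Joines-Houck style dynamic penalty evaluation.
# All names and the (C * t)**alpha schedule are illustrative assumptions.
def constraint_violations(x, inequality_constraints, equality_constraints):
    """Sum of squared violations: g(x) <= 0 and h(x) = 0 are the feasible forms."""
    viol = sum(max(0.0, g(x)) ** 2 for g in inequality_constraints)
    viol += sum(abs(h(x)) ** 2 for h in equality_constraints)
    return viol

def penalised_objective(x, t, objective, ineqs, eqs, C=0.5, alpha=2.0):
    """Minimisation objective plus a penalty that stiffens as generation t grows."""
    return objective(x) + (C * t) ** alpha * constraint_violations(x, ineqs, eqs)

# Example: minimise x^2 subject to x >= 1, i.e. g(x) = 1 - x <= 0.
f = lambda x: x * x
g = [lambda x: 1.0 - x]
print(penalised_objective(0.5, t=10, objective=f, ineqs=g, eqs=[]))
```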

The cluster heads communicate the task settings and the optimal solution to/from the root processor (cf. Figure 3). The total communication overhead represents an insignificant portion of total system activity (cf. Östermark, 1999a). It should be noted that each processor, including the root, solves a local optimisation problem as declared and possibly modified during the problem-solving process in its task list.

2. The alternative prediction models
2.1 The individual representations
A variety of non-linear time series models have been tested using the SuperGHA algorithm. For example, in Östermark (2002), the back propagation neural network algorithm was integrated with the genetic search machinery to produce solutions to a difficult staircase problem, not directly solvable by a pure neural network. In this paper we consider the following models as predictors for SHAREX:


Figure 3. Cluster layout for SuperGHA. The nodes are denoted by n_ij, i = node id, j = cluster number

(1) The basic autoregressive conditional heteroskedastic (ARCH) model of Engle (1982).
(2) An adaptive non-linear perturbation of the mean equation in ARCH.
(3) The K-Nearest-Neighbour (KNN) model (Farmer and Sidorowich, 1987).
(4) A heuristic market indicator regression model.

The portfolio performance of the forecasting methods serves as their ultimate success criterion. We will compare both their individual performance and the performance of their application in a multiple representation framework. The economic performance of the forecasting methods is tested in the following settings:

(1) In single representations, where each forecasting method is used as the sole predictor for each series.
(2) In multiple representations, where the best forecast is selected for each data series at each time point of the recursive test interval.
(3) In linear and fuzzy non-linear combination of the multiple predictor representations and selection of the best forecast at each time point among the individual representations and their combination.

Engle (1982) introduced the Autoregressive Conditional Heteroskedasticity (ARCH) model that soon gained wide popularity in theoretical and empirical time series research. A direct extension of the ARCH-model is the Generalized ARCH (GARCH) model of Bollerslev (1986). A particularly important development has been the class of ARCH-models that can capture the correlation between the endogenous variable and conditional variance, i.e. the asymmetric volatility models (cf., e.g. Schwert, 1989; Pagan and Schwert, 1990). This class of models has proved to be valuable in studying the dynamics of conditional variance in financial time series. The Exponential Generalized ARCH (EGARCH) model of Nelson (1991) is one of the earliest asymmetric models in this field (cf. Östermark and Höglund, 1997, for an application in the multivariate setting).



However, the EGARCH-model is difficult to estimate and has been outperformed by other representative models of the asymmetric class in several empirical tests. Engle and Ng (1993) proposed new diagnostic tests for adequacy of the chosen asymmetric volatility model and recommended the concept of news impact curve as a measure of asymmetry. The authors studied the following (asymmetric) ARCH-models in Monte Carlo runs and empirical tests with Japanese stock returns: the non-linear ARCH model (Engle and Bollerslev, 1986), the multiplicative ARCH (Milhøj, 1987; Geweke, 1986; Pantula, 1991), the GJR model (Glosten et al., 1989), EGARCH (Nelson, 1991), the autoregressive standard deviation model (Schwert, 1989), the (non-linear) asymmetric GARCH model, and the VGARCH model. The results indicated that, for extreme shocks, the standard deviation of the EGARCH estimated conditional variance is higher than that of the squared residual itself. Of the variance parametric models, the GJR was the best at parsimoniously capturing asymmetric volatility. Hagerud (1997) investigated the asymmetric volatility in a set of Nordic stock returns, using several parametric GARCH-models. The author concluded that the GJR, the Generalized Quadratic GARCH (GQARCH) (Sentana, 1995) and the Threshold GARCH (TGARCH) model of Zakoïan (1994) performed at least as well as the more complicated models. In several of the more complicated models (e.g. the logistic smooth transition GARCH(1,1) model of Hagerud (1997) and González-Rivera (1996)), either a negative conditional variance or non-convergence was reported.

In the ARCH-representation we will assume that the mean equation follows an AR(p)-process

y_t = \phi_0 + \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t                    (2.1)

where \varepsilon_t denotes a discrete time stochastic process of the form \varepsilon_t = \eta_t h_t^{1/2}, \eta_t \sim nid(0, 1). The path breaking model of Engle (1982) is a qth order autoregressive conditional heteroskedasticity model, the ARCH(q):

h_t = \alpha_0 + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2                    (2.2)

where \{\alpha_i; i = 0, 1, \ldots, q\} are constant parameters. Normally, a distributed lag effect of the type \alpha_i < \alpha_j for i > j is expected, i.e. older innovations are less significant than newer with respect to current volatility. Bollerslev (1986) generalized the ARCH(q) model to a GARCH(q, r). Some alternative [X]ARCH representations have been extensively tested in Östermark (1999d). In the present study, analysis is focused on the original ARCH-model on the one hand and ARCH with a non-linear perturbation of the mean equation on the other. The non-linear modification was inspired by the Adaptive Filtering (AF) scheme (Gustafsson, 2000). In comparison, Kalman Filtering requires knowledge of the system matrices not usually available when forecasting stock returns on a daily basis (Kodogiannis, 2000; Östermark, 1991).

The simplicity and flexibility of the AF-procedure makes it an attractive candidate for non-linear perturbation of linear predictions.

The KNN-method is characterized in the literature as a non-linear deterministic approach to modelling a time series as a trajectory of a discrete dynamic system. It has been frequently used in forecasting chaotic maps. The forecast of y_t is calculated by ordinary least squares (OLS), where the regressors are vectors of lagged values of the endogenous variable y_{t-d}, 0 < d < t − k, and k is the specified lag order. At each time point t, K lagged regressors (k < K < T) are selected based on the Euclidean distance between the first lagged vector y_{t-1} = [y_{t-1}, y_{t-2}, ..., y_{t-k}]' (automatically included in the set of regressors) and the other lagged vectors in the data, y_{t-d} = [y_{t-d}, y_{t-d-1}, ..., y_{t-d-k+1}]', t = k, ..., T. The KNN-model at time T is used to generate the out-of-sample prediction over the planning horizon H.

As the last model we investigate a heuristic TREND-model, where the individual stock index series are connected to the stock market index using ordinary least squares, after suitable transformations. The TREND-model consists of two main steps. Firstly, the market index is estimated by an autoregressive model. Secondly, each stock series (or its transformation) is regressed on the market index. The regression coefficients are used to produce an H-step prediction for the stock price from that of the market. We note that, instead of predicting levels or returns, there is some recent evidence in support of predicting the direction of price movements (Leung et al., 2000). However, for the time being there is no evidence related to detailed mathematical programming solutions derived from such directional information.

The KNN- and TREND-models are tested in two configurations. In the first, the pure models are estimated as such. In the second, the m > 0 most important frequencies of a Fourier decomposition of the endogenous vector are added to the set of regressors (cf. Bloomfield, 1976). For the KNN-method this means a separate Fourier analysis for each time point in the estimation interval. The prediction models {ARCH, AFARCH (Adaptive Filtering perturbed ARCH), KNN and TREND (the last two with/without the Fourier frequencies)} are combined using both simple averaging (SA) and the fuzzy Gaussian combination technique (FGC, Fiordaliso, 1998). At each time point, the system is pruned to select the best available forecast {ARCH, AFARCH, KNN/Fourier, TREND/Fourier, SA/FGC}, based on the log likelihood function of the prediction model. The underlying stochastic processes are assumed to be white noise. We note that the normality assumption is usually made in applied research. However, some recent studies (Booth and Koutmos, 1998) imply that other distributions like the Generalized Error Distribution (GED) might have potential in modeling stock prices/returns. Currently, no efficient procedure exists for detecting the best distributional form for a time series ex ante. The alternative would be to estimate the statistical models under competing distributional forms and likelihood functions. Since different likelihood functions are generally non-comparable, however, the best models would have to be selected using some suitable external criteria. This is a genuine multiple criteria optimisation problem (cf. Östermark and Höglund, 1991).
This important topic – especially calling for extensive empirical studies with both simulated and observed data – is left for future research.
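To make the KNN predictor described above concrete, the sketch below selects the K lagged vectors closest (in Euclidean distance) to the most recent lagged vector and fits an OLS regression on them to produce a one-step forecast. It is a minimal illustration of the idea only; the lag order, the value of K and all function names are assumptions rather than the exact SHAREX implementation.

```python
import numpy as np

def knn_forecast(y, k=3, K=10):
    """One-step KNN forecast: OLS on the K lagged vectors nearest to the latest one."""
    T = len(y)
    # Lagged regressor vectors [y_{t-1}, ..., y_{t-k}] and their targets y_t.
    X = np.array([y[t - k:t][::-1] for t in range(k, T)])
    targets = np.array(y[k:T])
    query = np.array(y[T - k:T][::-1])           # most recent lagged vector
    dist = np.linalg.norm(X - query, axis=1)     # Euclidean distances to the query
    idx = np.argsort(dist)[:K]                   # K nearest neighbours
    A = np.column_stack([np.ones(K), X[idx]])    # OLS design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, targets[idx], rcond=None)
    return float(np.concatenate(([1.0], query)) @ beta)

# Toy usage on a noisy random-walk-like series.
rng = np.random.default_rng(0)
series = list(np.cumsum(rng.normal(size=200)) * 0.01 + 10)
print(knn_forecast(series, k=3, K=10))
```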




2.2 Forecast combination
We will subsequently apply two particular forecast combination techniques, representing two extremes of complexity:

(1) A simple average of the individual representations, where each method has an equal weight in the set of forecasts.
(2) A fuzzy non-linear (Gaussian) weighting of the individual forecasts. The weights of the fuzzy non-linear membership functions are determined using geno-mathematical techniques – genetic search procedures combined with classical optimisation techniques.

For each stock series and at each forecast interval, the best predictor is selected over the multiple representations and their combination. The idea of a fuzzy non-linear forecast combination – considered as a function approximation problem – was introduced by Fiordaliso (1998). Fiordaliso applies the Takagi and Sugeno (1985) first order fuzzy inference system (if y is A, then z = f(y)). The fuzzy combination operator is derived as follows. Let

p = number of individual representations (i.e. forecast models)
r = number of fuzzy membership functions (i.e. rules in the Takagi-Sugeno fuzzy system)
\hat{y}_{jt} = the forecast of a time series y_t produced by representation j at time t, j = 1, ..., p
b_{jk} = linear combination weight of rule k = 1, 2, ..., r for representation j = 1, 2, ..., p (Fiordaliso (1998) adds a linear weight representing the intercept term, which is discarded here)
\mu_{A_k}(x) = G(\|x - m_k\|^2_{S_k}) = the kth p-dimensional generalized Gaussian membership function for coding of the individual forecast representations, where G(x) = \exp(-x) and \|\cdot\| is the weighted norm of x, defined by \|x\|^2_{S_k} = x' S_k^T S_k x, where S_k is a square matrix.

When S_k is the identity matrix, the norm reduces to the Euclidean, i.e. the set of points having the same norm are located at the same distance (radius) from the center m_k. When S_k is diagonal, the set of points having the same norm form an ellipsoid. By allowing S_k to have non-diagonal entries, the resulting Gaussian can rotate around its center (Fiordaliso, 1998). Since, in contrast to Fiordaliso, we will carry out extensive recursive modelling with a large number of stock series, we decided to allow only diagonal entries in S_k. For the same reason, the number of potential membership functions has been constrained to no more than 3. At each time point, the individual linear/non-linear time series models and the fuzzy combination model are estimated for each time series, using genetic programming and classical optimisation methodology jointly. The obtained best forecasts for each stock will enter the mathematical programming problem to obtain a multi-period portfolio strategy for the current planning horizon.

The linear combination of the p individual forecasts is written as

B_k(\hat{y}_t) = b_{1k}\hat{y}_{1t} + b_{2k}\hat{y}_{2t} + \cdots + b_{pk}\hat{y}_{pt}.

The fuzzy non-linear forecast combination F(\hat{y}_t) can then be written as follows:

F(\hat{y}_t) = \frac{\sum_{k=1}^{r} \mu_{A_k}(\hat{y}_t)\, B_k(\hat{y}_t)}{\sum_{k=1}^{r} \mu_{A_k}(\hat{y}_t)} = \frac{\sum_{k=1}^{r} \mu_{A_k}(\hat{y}_t)\,(b_{1k}\hat{y}_{1t} + b_{2k}\hat{y}_{2t} + \cdots + b_{pk}\hat{y}_{pt})}{\sum_{k=1}^{r} \mu_{A_k}(\hat{y}_t)}

The Gaussian combiner has some desirable properties, such as nonzero values guaranteed in the denominator for any input (Fiordaliso, 1998). The classical Takagi-Sugeno model is supplemented by a real-valued amplitude function controlling the intensity (importance) of each rule. For this, we use the non-decreasing function

h(r_k) = (1 - \exp(-r_k)) \in [0, 1], \quad k = 1, 2, \ldots, r

which is slightly modified from the one used by Fiordaliso (1998). The intensity function assists in pruning the set of Gaussian membership functions, a low intensity implying that the corresponding rule can be deleted without too much impact on system output. The intensity controlled fuzzy combiner then becomes:

F(\hat{y}_t) = \frac{\sum_{k=1}^{r} h(r_k)\,\mu_{A_k}(\hat{y}_t)\, B_k(\hat{y}_t)}{\sum_{k=1}^{r} h(r_k)\,\mu_{A_k}(\hat{y}_t)}
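A minimal Python sketch of the intensity-controlled Gaussian combiner above is given below, assuming diagonal S_k matrices as stated in the text. The parameter values, array shapes and function names are illustrative assumptions, since the actual weights in SHAREX are found by the geno-mathematical search rather than set by hand.

```python
import numpy as np

def fuzzy_combine(y_hat, centers, scales, b, rho):
    """Intensity-controlled fuzzy Gaussian combination of p individual forecasts.

    y_hat   : (p,)   individual forecasts at time t
    centers : (r, p) Gaussian centers m_k
    scales  : (r, p) diagonal entries of S_k (one diagonal matrix per rule)
    b       : (r, p) linear combination weights b_jk
    rho     : (r,)   intensity parameters r_k
    """
    # Generalized Gaussian memberships mu_k = exp(-||S_k (y_hat - m_k)||^2).
    mu = np.exp(-np.sum((scales * (y_hat - centers)) ** 2, axis=1))
    B = b @ y_hat                     # rule outputs B_k(y_hat)
    h = 1.0 - np.exp(-rho)            # intensities h(r_k)
    w = h * mu
    return float(np.dot(w, B) / np.sum(w))

# Toy usage: combine p = 2 forecasts with r = 3 rules.
y_hat = np.array([10.2, 9.8])
centers = np.array([[10.0, 10.0], [10.5, 9.5], [9.5, 10.5]])
scales = np.ones((3, 2))
b = np.array([[0.5, 0.5], [0.7, 0.3], [0.3, 0.7]])
rho = np.array([1.0, 0.5, 2.0])
print(fuzzy_combine(y_hat, centers, scales, b, rho))
```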

Fiordaliso estimated the membership functions in a two-step procedure, aiming at eliminating redundant Gaussians. In the present study, we use geno-mathematical programming technology to solve the joint problem of minimal representation and parameter estimation. This is one of the first attempts to simultaneously solve the forecast combination problem and its parameterisation. The features of the Takagi-Sugeno approach are illustrated in Figures 4-7 for two competing models, ARCH and AFARCH. Figures 4-6 represent the solution to the unreduced fuzzy non-linear combination problem with three Gaussians for the Finnish firm Tietoenator Ltd, iteration 18 out of 435 (planning horizon 02/24/1999-02/29/1999, including two weekend days). Figure 7 shows the reduced solution to the same combination problem, where the elimination of superfluous membership functions has been achieved through binary (0/1) coding of the parameter vectors for the Gaussians. In other words, genetic operators are used on a binary string of same length as the parameter vector to form reduced representations. Position i of the binary string corresponds to weight number i in the parameter vector. The reduced parameter vector is subjected to ordinary genetic and classical mathematical programming operations to maximise the predictability of the system. By genetic search we will eventually converge to an acceptable set of fuzzy membership functions. The solutions to the unreduced sample problem (Figures 4-6) illustrate that the membership functions are



Figure 4. First Gaussian of full sample solution

Figure 5. Second Gaussian of full sample solution

tuned by the system to account for distinct properties of the combination problem. The first Gaussian is fairly neutral with respect to both ARCH- and AFARCH-predictions, the second Gaussian tends to give higher membership values for ARCH than AFARCH. The third Gaussian, finally, isolates a combination property where the AFARCH-prediction can be affected (improved) by the ARCH-prediction, but not vice versa. This last property is the most important when considering system complexity (Figure 7).

3. The recursive portfolio management problem
The recursive portfolio management system was introduced in Östermark (1991).


Figure 6. Third Gaussian of full sample solution

Figure 7. Gaussian of reduced sample solution

The main idea is to generate out-of-sample price forecasts over the planning horizon for the stocks in the reference portfolio and then to use these predictions as parameters in a mathematical programming problem. The approach is recursive in that statistical forecasts and subsequent portfolio optimisation are repeated at each time point of the study interval, using the previous day position as the initial inventory (of cash, debt and individual stock holdings) for the current day. The planning horizon represents a moving target, where the transactions for the current day only will be implemented at each time point. The mathematical programming formulation recognizes the transaction costs.



The predicted prices are scaled by the batch size of each individual stock. The transactions are represented in integer batches, to increase tradability. The tests are conducted using daily stock price series. In Östermark and Aaltonen (1992), the portfolio efficiency of the recursive portfolio management system was presented in extensive tests on Swedish and Finnish daily stock data. The forecasts were generated using the vector-valued state space algorithm of Aoki (1988). In the present study, univariate non-linear (heteroskedastic) models are used as the main predictor engine. An investigation of non-linear heteroskedastic vector-models is left for future research. The system has been extended to allow alternative risk formulations and changing the risk formulation during system evolution. For example, given that the investor has achieved a target level of returns and nominal asset value, it may be that he wants to switch to a more cautious strategy, for example by increasing the liquidity of the portfolio.

The portfolio management problem at time point t is to maximize the objective function subject to four sets of constraints: liquidity, inventory, risk and minimum/fixed transactions costs. The stocks are assigned to pre-specified risk categories, based on an initial sorting of the stocks in increasing level of risk. The risk is measured by the stock's return beta, return variance or the ratio of the return standard deviation over its mean. These measures are frequently used as competing risk measures (cf., e.g. Fama and French, 1998; Hsia et al., 2000, for conditions for using beta). By decomposing the stocks into risk groups, we can effectively control both the level of risk and the diversity in the portfolio over time. The fixed transactions costs can be recognized in either a bilinear or a mixed integer linear form. Let

\tau, f = variable and fixed transactions cost rates
I = interest rate
p_it = the stock price of asset i at time point t
T = total number of observations in the current data set (the data set is moving forward by one day in each run)
H, N = planning horizon and size of reference portfolio, respectively
x_t, y_t \in R^N = sales and purchase volumes at time t (discrete batch sizes)
z_t \in R^N = binary-valued transactions cost switches
d_it = slack variable for diversity deviation in risk group i at time t; active only in those risk formulations where diversity control is imposed
d^m_it = slack variable for minimum transactions costs pertaining to asset i at time t
e = unit vector of conformable size
D_t = amount of debt at time t
DD_t = amount of excess debt at time t
\bar{p}_i = average price of asset i, computed over the planning horizon
\lambda = target risk of the investor (in units of stock return beta, variance or the ratio between standard deviation and mean of the stock price)
G = number of risk categories; in order to guarantee diversification, the stocks are rank-ordered by the level of risk and assigned to one of G risk groups in the mathematical programming formulation
C_t = end of period cash at time t
I_t = end of period value of risky assets at time t, evaluated at the predicted prices
w_t, w_D, w_DD, w_C, w_I = penalty/reward coefficients for transactions, debt, excess debt, liquidity and risky assets; note that the penalty on transactions – both sales and purchases – is non-positive.

The portfolio optimisation problem (POP) is stated below.

POP: maximize

f = \sum_{t=T}^{T+H} \left\{ \sum_{i=1}^{N} \left[ -w_t \tau p_{it}(x_{it} + y_{it}) - \frac{\lambda \bar{p}_i d_{it}}{G} \right] - w_D D_t - w_{DD} DD_t - (f z_t + d^m_t)' e \right\} + w_C C_{T+H} + w_I I_{T+H}

s.t. \; A_T w_T \le b_T \quad (liquidity, risk, inventory positions, minimum transactions costs)                    (3.1)

where A_T, b_T are adjusted by the prices and asset positions, respectively, over the time interval [T, T+H], and

w_T = \left[ (y, x, d, D, DD, z, d^m)'_T, \; (y, x, d, D, DD, z, d^m)'_{T+1}, \; \ldots, \; (y, x, d, D, DD, z, d^m)'_{T+H} \right]' \ge 0
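To illustrate how POP is used recursively (forecast over [T, T+H], solve, implement only the current day's transactions and roll the window forward), here is a minimal sketch of the outer loop. Every component in it (the naive forecaster, the do-nothing solver and the trade application) is a hypothetical placeholder standing in for the corresponding SHAREX module, not the actual interface.

```python
import numpy as np

def make_forecasts(window, horizon):
    """Placeholder predictor: naive random-walk forecast (repeat the last price)."""
    return np.repeat(window[-1][None, :], horizon + 1, axis=0)

def solve_pop(forecasts, position, horizon):
    """Placeholder for the POP solver: here it simply plans no transactions."""
    n_assets = forecasts.shape[1]
    return [{"buy": np.zeros(n_assets), "sell": np.zeros(n_assets)}] * (horizon + 1)

def apply_trades(position, todays_plan, todays_prices, tau=0.001):
    """Implement only the current day's transactions, cf. the cash balance (3.3)."""
    buy, sell = todays_plan["buy"], todays_plan["sell"]
    cash_flow = np.sum(todays_prices * ((1 - tau) * sell - (1 + tau) * buy))
    return {"cash": position["cash"] + cash_flow,
            "holdings": position["holdings"] + buy - sell}

def recursive_portfolio_run(prices, H, first_day, last_day, initial_cash):
    position = {"cash": initial_cash, "holdings": np.zeros(prices.shape[1])}
    for T in range(first_day, last_day):
        forecasts = make_forecasts(prices[:T + 1], horizon=H)   # data known at T
        plan = solve_pop(forecasts, position, horizon=H)        # multi-period plan
        position = apply_trades(position, plan[0], prices[T])   # day T trades only
    return position

prices = np.abs(np.random.default_rng(1).normal(10, 1, size=(60, 4)))
print(recursive_portfolio_run(prices, H=2, first_day=40, last_day=60, initial_cash=16818.79))
```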

The fixed transactions costs are recognized either by a single bilinear non-convex restriction or by a set of N(H+1) linear constraints with N(H+1) additional integer (0/1) variables. The bilinear constraint is written as:

\sum_{t=0}^{H} \sum_{i=1}^{N} (x_{it} + y_{it})(1 - z_{it}) = 0 \quad (fixed transactions costs)                    (3.2a)



Alternatively, the fixed transactions costs can be controlled by linear constraints as follows:

\frac{x_{it} + y_{it}}{L} \le z_{it}, \quad i = 1, 2, \ldots, N; \; t = 0, 1, \ldots, H \quad (fixed transactions costs)                    (3.2b)

where L is a large number. POP is solved with an H-period horizon, each time using the latest predictions of asset prices and the current holdings of risky assets and cash (Östermark, 1991). The decision variables can be treated as continuous, with ex post rounding to conform to the discrete batch size requirement. Alternatively, all variables except for d_t can be treated as discrete, with the effect of having a large MILP to solve even for moderately sized reference portfolios. The size of POP (3.1) depends on H, N and the method applied to quantify risk. The size of the optimisation problem increases rapidly with these parameters. For example, with H = 2, N = 36, no fixed transactions costs and the basic risk formulation with G = 3 risk groups (3.6 below), the MILP would have m = (N + G + 3) × (H + 1) = 126 constraints and n = (3 × N + 4) × (H + 1) = 336 variables, of which 2 × (H + 1) × N = 216 are integers (purchases and sales). When introducing fixed transactions costs through (3.2b), the problem size would increase by (H + 1) × N = 108 constraints and integer variables, up to m = 234, n = 444. The portfolio problem will quickly turn insurmountable as a genuine MILP even with moderately sized H and N. However, fairly large problems can be tackled by a new powerful and flexible branch and bound algorithm (MILP-machine) developed for GHA by the author. The performance of the MILP-machine for GHA is shown in Table I and Figures 8-10 for different sized portfolio problems. The branch and bound algorithm makes full use of dynamic pointers and efficient memory usage through a recursive programming approach. The package allows regulation of the desired depth of the tree. Local LP-problems can be solved by any commercial or non-commercial LP-solver. The MILP-machine currently uses the ddlprs-solver of IMSL. We demonstrate that large-scale MILP-problems can be solved by the algorithm quickly.

3.1 The constraint set
3.1.1 The liquidity and inventory constraints. Non-negativity of cash within the planning horizon is controlled by the following relation:

\sum_{t=T}^{T+H} \left[ \sum_{i=1}^{N} p_{it}\big((1 + \tau)x_{it} - (1 - \tau)y_{it}\big) + C_t - D_t - C_{t-1} - D_{t-1} \right] = 0                    (3.3)

New debt is allowed up to a limit specified in the debt constraint. Non-negativity of inventories – excluding short selling – is defined as:

I_{it} \ge 0, \quad \forall i, \; t \in (T, T+H] \quad (non-negativity of inventories)                    (3.4)

3.1.2 Upper limit on debt.

(1 + r/365)\, D_t - DD_t \le UB_D, \quad \forall i, t                    (3.5)

A non-zero debt limit (UB_D) provides opportunities to finance extra purchases in cases

Case Small Cplex Minlp_machine Zone2 Depth first D. first þ SGC(0.1) D. first þ SGC(0.05) D. first þ SGC(20.05) Zone1 þ SGC(0.05) Zone1 þ SGC(0.1) Zone2 þ SGC(0.1) Zone3 þ SGC(0.1) Zone3 þ SGC(0.1) Zone6 þ SGC(0.1) Zone6 þ SGC(-0.1) Zone3 þ SGC(-0.1) Zone4 þ SGC(0.1) Zone5 þ SGC(0.1) Zone2 Zone3 Zone40 Zone6 Zone5 Medium Depth first Cplex Minlp_machine Zone1 Zone2 Zone3 Zone4 Zone5 Big Depth first Cplex Minlp_machine Zone1 Zone2 Zone3

Tree size {m, n, n_i, n_x, n_d} ¼ {27 48 24 24 0} Relaxed LP-solution Cplex-solution {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 0, 0} Pure depth first using maximum fraction

{R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 7} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 7} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 7}a {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 3} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 2} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 7} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 3} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 2} {m, n, n_i, n_x, n_d} ¼ {126 336 216 120 0} Relaxed lp-solution Pure depth first using maximum fraction Cplex-solution {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 5}a {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 1, 5}a {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 3} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 2} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 2} {m, n, n_i, n_x, n_d} ¼ {451 1232 720, 512 0} Relaxed LP-solution Pure depth first using maximum fraction Cplex-solution {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 5} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {1, 2, 4} {R_SRCH, Z_SRCH, Z_DEPTH} ¼ {0, 2, 0}

Optimal F

– 106

21.99376 21.99228

3,872 3,869 2,217 1,821 1,561 64 58 49 30 17 17 13 20 5 3 37 32 18 17 7

21.99228 21.99228 21.99228 21.99228 21.99228 21.99228 21.99228 21.99228 21.99228 21.98895 21.98895 21.98737 21.98737 21.98506 21.98506 21.96036 21.96036 21.96036 21.96036 21.96036

– 1,133 32

22.50568 22.50322 22.50290

48 54 18 12 7

22.50239 22.50239 22.45722 22.45722 22.45722

– 64,978 1,357

21.37743 21.37012 21.37010

85 25 7

21.36920 21.36851 21.35306

Notes: aThe parameters {R_SRCH, Z_SRCH, Z_DEPTH} refer, respectively, to the 0/1 switches ROOT_SEARCH and the depth of the tree (ZONE_DEPTH) at which a discrete search zone is imposed by the milp solver. For example, {R_SRCH, Z_SRCH, Z_DEPTH}={1, 1, 5} imposes a discrete zone around the relaxed LP-solution at the root of the tree, followed by a discrete zone at depth 5 of the tree. If the parameters are zero, a pure depth first search is carried out


Table I. Total number of LP-solutions (tree size) and optimal solution in the MINLP-machine of GHA vs Cplex in portfolio problems


Figure 8. Trade-off between number of LP-solutions and optimal solution in the small problem

Figure 9. Trade-off between number of LP-solutions and optimal solution in the medium problem

Figure 10. Trade-off between number of LP-solutions and optimal solution in the big problem

where cash is insufficient and asset price predictions do not motivate financing by internal portfolio revenues.

3.1.3 Portfolio risk. SHAREX includes some of the most popular linear representations of risk (e.g. MAD of Konno and Yamazaki (1991)). The model based on the specification in Östermark (1991) is stated as follows:

\sum_{i=G_j}^{G_{j+1}} p_{I(i)t}\, v_{I(i)t} \left[ w_{I(i)} - \lambda \frac{\bar{w}_{I(G_j)} + dw_{I(G_j)}}{G} \right] - \lambda C_t \frac{\bar{w}_{I(G_j)} + dw_{I(G_j)}}{G} \le 0, \quad j = \{0, 1, \ldots, G-1\}, \; \forall t                    (3.6)

where G is the number of risk groups; I(·), the array of indexes of reference assets ordered by the risk measure; G_j, the element number in the array I referring to the first asset I(G_j) belonging to risk group j; p_{I(i)t}, the price of asset I(i) at time t; v_{I(i)t}, the inventory volume of asset I(i); w_{I(i)}, the risk coefficient of asset I(i); \bar{w}_{I(G_j)}, the average risk coefficient of assets belonging to risk group j; dw_{I(G_j)}, the spread of risk coefficients of assets belonging to risk group j; \lambda, the nonnegative risk level specified by the user; and C_t is the end of period cash position at time t.

The assets in the reference portfolio – those among which portfolio selection is made – are organized according to some criterion (variance, standard deviation, beta or one based on a higher moment) in risk groups with cardinality G_j, j = \{0, 1, \ldots, G\}. The portfolio risk is computed as a linear combination of the risk levels of the individual assets. In equation (3.6) cash is used as a compensatory element in the diversity requirement. The amount invested in the particular assets provides the weights of the assets in the portfolio. Some risk measures (for example beta) may recognize the correlation (covariance) between the assets, whereas others (for example, variance) do not. If the chosen risk measure reflects the risk of the underlying asset in the particular financial market properly, the portfolio risk is rarely understated in equation (3.6). The representation may be considered a cautious approximation of the true portfolio risk. The larger the number of risk groups G, the smaller the proportion of total investment available for an individual asset and the smaller the compensation from the cash position. Intuitively, when G increases, risk is dampened through diversification, whereby cash becomes less important a risk moderator. Diversity control can be added to the basic risk constraint at t \in (T, T+H] as follows:

\sum_{i \in R_k} \sum_{j=T}^{T+t} p_{it} I_{ij} + \frac{C_t + d_{it}}{G} \ge \frac{1}{4G} \left( \sum_{i=1}^{N} \sum_{j=T}^{T+t} p_{ij} + C_t \right)                    (3.7)
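As a concrete illustration of the grouping mechanism behind constraints (3.6)-(3.7), the short sketch below rank-orders assets by a chosen risk measure and splits them into G consecutive groups; the function name and the near-equal-cardinality split are illustrative assumptions, not the exact SHAREX grouping rule.

```python
import numpy as np

def assign_risk_groups(risk, G=3):
    """Rank assets by increasing risk and split them into G consecutive groups.

    risk : (N,) array of risk coefficients (e.g. return variance or beta).
    Returns a list of G index arrays, from the lowest-risk to the highest-risk group.
    """
    order = np.argsort(risk)           # assets sorted by increasing risk
    return np.array_split(order, G)    # G consecutive groups of near-equal size

# Toy usage: 8 assets, 3 risk groups.
variances = np.array([0.02, 0.15, 0.07, 0.01, 0.30, 0.05, 0.11, 0.22])
for j, group in enumerate(assign_risk_groups(variances, G=3)):
    print(f"group {j}: assets {group.tolist()}, mean risk {variances[group].mean():.3f}")
```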

As alternatives to the basic risk constraints, we consider linear min-max risk constraints inspired by the mean absolute deviation (MAD) model (cf. Speranza, 1993; Feinstein and Thapa, 1993). In MAD, risk is measured by the mean absolute deviation of the stock price (return) from the mean (target). In the risk-adjusted


min-max-formulation, the deviational slack variables (d_it) are weighted by the mean risk of the current stock.

\sum_{i \in R_k} \sum_{j=T}^{T+t} \bar{r}_i (\bar{p}_{ij} - p_{ij}) I_{ij} - \frac{1}{4G} C_t - \frac{1}{4G} d_{it} \le 0, \quad t \in (T, T+H), \; k \in \{1, 2, \ldots, G\} \quad (MINMAX)                    (3.8)

\sum_{i \in R_k} \sum_{j=T}^{T+t} \bar{r}_i (\bar{p}_{ij} - p_{ij}) I_{ij} - \frac{1}{4G} C_t - \frac{b}{4G} d_{it} \le 0, \quad t \in [T, T+H], \; k \in \{1, 2, \ldots, G\} \quad (risk-adjusted MINMAX)                    (3.9)

The MAD-formulation focuses on the downside risk, recognizing the plausible assumption that the investor worries more about portfolio under-performance than over-performance.

3.1.4 Fixed transactions costs. The fixed costs are captured by the single non-linear constraint (3.2a) or by N(H+1) linear constraints (3.2b) with N(H+1) additional integer-valued (0/1) variables. For the non-linear constraint (3.2a) we note the following remarks.

Remark 3.1. The on/off switches z_it in (3.2a) are automatically 0/1 binary valued: x_it > 0 ⇒ z_it = 1 and x_it = 0 ⇒ z_it = 0. In order to activate the fixed costs for a given transaction, say x_it (or y_it), the corresponding switch z_it must be set to 1 in the minimum transactions cost restriction: x_it(1 − z_it) = 0 iff x_it > 0 ⇒ z_it = 1. On the other hand, x_it = 0 ⇒ z_it = 0, since a non-zero cost switch would reduce the objective function value unnecessarily by the amount f.

Remark 3.2. The same on/off switch z_it can be used for both purchases (x_it) and sales (y_it) of the same asset at a given time point. By definition, x_it and y_it are linearly dependent, therefore never simultaneously included in the basis of a solution. By Remark 3.1, the switches are binary-valued. Thus, (x_it + y_it)(1 − z_it) = 0 ⇒ x_it(1 − z_it) = y_it(1 − z_it) = 0, since x_it y_it = 0, ∀i, t.

Remark 3.3. The on/off switches z_it can be included in a single constraint comprising all purchases (x_it) and sales (y_it) over the planning horizon. From the non-negativity of w and by Remarks 3.1 and 3.2,

\sum_{t=0}^{H} \sum_{i=1}^{N} (x_{it} + y_{it})(1 - z_{it}) = 0 \;\Rightarrow\; (y_{it} + x_{it})(1 - z_{it}) = 0 \quad \forall i, t.

Thus, the fixed cost is correctly activated for each transaction at each time point in [0,H ] through the on/off switches (zit) specified in the single bilinear equality restriction (3.2) and the objective function of POP. Remark 3.4. The bilinear equation (3.2a) is non-convex, hence a global solution to the non-linear programming problem cannot be guaranteed.

The Hessian formed from equation (3.2a) will have zeros on its main diagonal. Since the Hessian is symmetric by definition, all its principal minors cannot be positive as required by convexity (Pfaffenberger and Walker, 1976). In SuperGHA – the computational engine of SHAREX – genetic or random search around the feasible point returned by the non-linear solver can be done to improve the current solution or to ascertain that a satisfactory solution has been found. An initial feasible solution to the non-linear solver (FSQP) is determined as follows. Initially, (3.2a) and the minimum cost requirement (3.10) are deactivated and the corresponding problem is solved as an ordinary LP. Next, constraints (3.2a) and (3.10) are augmented to the formulation and the solution vector is adjusted accordingly (ex post recognition of fixed costs). In case of insufficient cash, a debt position is taken with due recognition of interest payments. Next, the feasible non-linear starting point is submitted to the non-linear solver. It turns out that FSQP usually returns a solution where the small transactions in the initial (LP-based) solution have been eliminated, precisely as desired. In a small fraction of cases FSQP has run into numerical problems. In these cases all transactions for the day are withdrawn and the position remains unchanged to the next market day.

For the linear constraints (3.2b) we note the following remarks.

Remark 3.5. The on/off switches z_it in (3.2b) are not automatically 0/1 binary valued. The LHS of the constraint set (3.2b) is scaled by L > 0. For L sufficiently large, LHS ∈ [0, 1) will hold. Any particular constraint of the set may be satisfied by z_it ≥ 0, since x_it, y_it ≥ 0 by definition. Thus, the 0/1 values can only be guaranteed by declaring the cost switches as binary-valued integers in the problem formulation. The presence of both x_it and y_it in (3.2b) is justified by Remark 3.2.

3.1.5 The minimum transactions cost requirement. The minimum transactions cost requirement in POP is stated as follows:

\tau p_{it}(y_{it} + x_{it}) + f z_{it} + d^m_{it} \ge m z_{it}, \quad \forall i, t                    (3.10)

Remark 3.6. The constraint set recognizes the variable and – through the cost switches – the fixed transactions costs separately for each asset at each time point in [0, H]. Assume that at some time point t a purchase of asset i is made, x_it > 0. Then y_it = 0 by Remark 3.2 and z_it = 1 by Remark 3.1. Constraint (3.10) ensures that the minimum transactions cost m is paid. Due to the penalization in the objective function, the minimum cost deviation d^m_it is nonzero only if necessary.

4. Empirical evidence
4.1 Data description
Our database consists of daily stock index and stock price series for the Finnish stock exchange over the time period 1 February 1997-16 September 2002. The indexes are corrected for stock splits and dividends (www.hexgroup.com). The effects of different transformations (differentiation, log-transformation, standardization) have been examined in various tests. Theoretically, different transformations may be optimal for different stock series and different time series models, yet we have used the same transformation setting for all series and all methods in a run. The important issue of specifying the optimal transformation – perhaps using artificial intelligence – is left for future research.

Diversification is a somewhat controversial issue in thin and infrequently traded stock markets, especially when considering different buy and hold strategies.




For example, Statman (1987) argued that at least 30 stocks are needed to achieve an optimal risk spread. When considering strategies allowing for daily rebalancing, as in the present study, a critical issue is the number of stocks in the reference set, i.e. the size of the universe in which portfolio selection is made (the reference stocks). In order to shed some light on this issue, we will subsequently compare the portfolio performance of the selected strategies in different sized universes U in the Helsinki stock exchange, |U| = {20, 36, 40}. The stocks are ranked in decreasing order of trading volume over the time period of available data, after which the top-most {20, 36 or 40} traded stocks are selected as reference stocks. At the same time, the infrequency bias of the Helsinki Stock Exchange will be controlled as far as possible. We note that, during the time period of the test, of all 135-140 stocks listed on the exchange, the top 20 most traded stocks represent 91.3 percent of the total trading volume on HSE. The corresponding figure for the 40 most traded stocks is 96.3 percent. Thus, the tradability of the portfolios formed in a universe of HSE-stocks larger than 40 could be unduly decreased. Some 20 of the stocks included in the tests were introduced in the exchange between 1 February 1999 and 21 February 2000, thus the size of the reference portfolio is gradually increased within this time interval to 36 and 40, respectively. SHAREX contains facilities for dynamically changing the size of the reference portfolio, e.g. due to introduction to the exchange, stock splits or spin-offs. By the end of the test period (16 September 2002), four of the firms originally included in the data set had withdrawn from the exchange. These firms were excluded from the data set pertaining to Table II. In the table, we summarize some statistical properties of the daily stock returns between 21 February 2000 and 16 September 2002, when all remaining 36 stocks are active (cf. Appendix, Table AI). The significant abnormality of the stock returns is obvious on this small exchange. HEX experienced an extremely strong upward surge beginning in early 2000. By the end of the same year, the technology bubble burst and the market drifted into a recession that prevailed throughout the rest of the study period. The heavy mis-pricing of technology firms has probably distorted the pricing of the most traded stocks of other branches as well.

4.2 Performance of the recursive portfolio management system
The performance of the recursive portfolio management system has been the subject of extensive testing over the past few years. In this section we will give a brief summary of the most promising results obtained with Finnish data using different reference portfolio sizes. We have used the following test periods:
. 1 February 1999-19 October 2000 (435 market days)
. 1 October 2000-31 March 2001 (109 market days)
. 1 February 1999-16 September 2002 (910 market days)
. 7 December 2001-28 June 2002 (135 market days).

The time intervals are motivated by the different economic conditions during these periods. A strong bull market was experienced on the stock exchange in the spring of 2000. The first and third time periods span the technology boom. In addition, the third time period also covers both the technology boom and the severe downturn period – due to heavily falling technology shares – beginning in the middle of 2000 and

0.0007 0.0001 0.0007 0.0006 0.0004 0.0003 0.0007 0.0007 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0004 0.0001 0.0001 0.0001 0.0003 0.0004 0.0002 0.0001 0.0001 0.0005 0.0002 0.0001 0.0003 0.0003 0.0006 0.0001 0.0001 0.0001 0.0001 0.0002

0.0264 0.0103 0.0264 0.0242 0.0212 0.0178 0.0262 0.0266 0.0087 0.0109 0.0076 0.0100 0.0090 0.0111 0.0113 0.0111 0.0106 0.0193 0.0116 0.0090 0.0121 0.0175 0.0193 0.0153 0.0101 0.0114 0.0226 0.0126 0.0120 0.0183 0.0171 0.0243 0.0102 0.0100 0.0100 0.0091 0.0138

20.0016 0.0001 20.0022 20.0024 20.0017 20.0015 20.0010 20.0021 0.0002 0.0004 0.0002 20.0002 20.0001 0.0001 20.0002 20.0001 20.0001 20.0009 20.0001 20.0001 20.0001 20.0009 20.0015 20.0006 20.0002 0.0001 20.0018 20.0001 20.0001 20.0014 20.0011 20.0033 0.0001 0.0000 0.0000 20.0002 20.0007

33.3269 3.3872 22.2305 16.0666 13.8115 1.9598 48.3962 6.8980 4.0199 2.8074 2.8526 1.2830 15.3205 37.7726 2.1684 6.1035 2.1383 3.0902 9.5411 6.9246 5.7218 97.5152 1.9715 39.1412 1.0285 5.4114 1.1897 2.0135 2.7398 26.0725 7.1824 2.0157 4.2000 1.6836 1.0963 26.8474 3.0113

Kurtosis 23.1398 20.1510 22.2184 21.1612 21.3668 20.0061 22.9156 20.6793 0.1556 0.2657 20.0652 20.2285 21.3645 21.2567 20.0793 20.5759 20.1857 20.2961 20.6239 20.7672 0.3251 28.1991 20.0610 2.6075 20.1602 0.0356 0.1299 20.4220 20.2152 22.2219 20.8726 0.2077 0.5869 0.1826 20.2002 22.4735 20.2523

Skewness

Notes: aCritical values: 10 percent – 4.605; 5 percent – 5.991; 1 percent – 9.21

Aldata Solution Oyj Amer-yhtymä Oyj A Comptel Oyj Elektrobit Group Oyj Eimo Oyj A Elisa Communications Elcoteq A F-Secure Oyj Fortum Oyj Oyj Hartwall Abp A Huhtamäki van Leer KCI Konecranes Inter Kesko Oyj B Kemira Oyj Metso Oyj Metsä-Serla Oyj B Nordic Baltic H FDR Nokia Oyj Nokian Renkaat Oyj Orion-yhtymä B Outokumpu Oyj Pohjola D Perlos Oyj Raisio Yhtymä Vaih-os Rautaruukki Oyj K Sampo A Sonera Oyj Stora Enso Oyj A Stora Enso Oyj R Teleste Oyj Tietoenator Oyj TJ Group Tamro Oyj Uponor Oyj UPM-Kymmene Oyj Wärtsilä Oyj Apb B Hex

Sample var.

Std. dev

Mean 0.3890 0.1096 0.3474 0.3131 0.2844 0.1512 0.4753 0.2720 0.0906 0.0898 0.0659 0.0715 0.1273 0.2281 0.0910 0.1247 0.0920 0.1990 0.1522 0.1018 0.1408 0.2842 0.1754 0.2822 0.0712 0.1344 0.1899 0.1039 0.1169 0.2918 0.1838 0.2005 0.1014 0.0823 0.0784 0.1493 0.1389

Range 20.3032 20.0577 20.2600 20.1988 20.2004 20.0934 20.3450 20.1877 20.0389 20.0380 20.0364 20.0388 20.0859 20.1265 20.0504 20.0804 20.0512 20.1129 20.0885 20.0664 20.0707 20.2299 20.0878 20.0950 20.0373 20.0750 20.0905 20.0632 20.0719 20.2052 20.1303 20.0928 20.0459 20.0393 20.0370 20.0974 20.0757

Minimum 0.0857 0.0519 0.0874 0.1142 0.0840 0.0578 0.1303 0.0844 0.0517 0.0518 0.0295 0.0327 0.0414 0.1016 0.0406 0.0443 0.0408 0.0862 0.0637 0.0354 0.0701 0.0544 0.0875 0.1872 0.0339 0.0594 0.0994 0.0407 0.0449 0.0866 0.0534 0.1078 0.0555 0.0430 0.0414 0.0519 0.0632

Maximum 30861.447 310.307 13789.107 7071.372 5319.185 103.067 63761.310 1326.337 436.214 219.058 218.813 49.776 6498.077 38454.533 126.840 1035.201 126.390 265.656 2484.497 1349.850 889.857 262379.727 104.696 41839.257 31.140 785.899 39.787 127.894 206.393 18770.539 1465.987 113.651 510.306 79.639 36.552 19997.648 250.161

Kiefer-Salmona *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***

Significance


Table II. Key statistics for the returns of the 36 assets in the reference database between 21 February 2000 and 16 September 2002 (644 observations)



extending over the time span of the available data. The second and fourth time periods were included in order to verify the performance of SHAREX during a recession, when entering the stock market at an unfavourable time point. The top 20 most traded stocks belong mainly to the technology branch, hence a test of SHAREX with these stocks is particularly interesting. The performance of the strategy of Table IV is not shown graphically, as the overall performance of the system resembles that for the fourth period. The portfolio performance during this test period is still quite acceptable, due to more favourable market conditions than those for the fourth test period (see the results in the latter part of Table IV). The historical data available at the decision time point is used for model estimation. In the estimation we will apply both fixed and growing estimation intervals. The alternative strategies are presented in Tables III-V. Whereas the growing window uses all data between 1 February 1997 and the current time point, the fixed window uses at most the

Table III. Strategies for the time period 1 February 1999-19 October 2000 (435 market days)

Strategy 1. Forecast model: Fuzzy combination of ARCH and AFARCH; Data transformation: Differenced and standardized; Risk model: Basic model, no diversity control; Risk measure: Variance; Target risk level: 1.4; Window length: Fixed 200; MAPE limit: 0.1; Portfolio size: 40; Group size (G): 3; Group mode: Grouping by risk level; Forecast horizon (H): 2; Objective function weights: 10, 0.9, 0.1; Terminal assets: 100142.07; Terminal ROA (percent): 102.75; Terminal diversity: 10.

Strategy 2. Forecast model: Average of KNN[3]-Fourier[3], KNN[5]-Fourier[2] and TREND; Data transformation: Logarithmed and standardized; Risk model: Basic model, no diversity control; Risk measure: Variance; Target risk level: 2; Window length: Fixed 80; MAPE limit: 0.6; Portfolio size: 40; Group size (G): 5; Group mode: Equal risk levels; Forecast horizon (H): 2; Objective function weights: 10, 0.9, 0.1; Terminal assets: 98586.20; Terminal ROA (percent): 97.04; Terminal diversity: 2.

Strategy 3. Forecast model: Average of GARCH and AFGARCH; Data transformation: Differenced and standardized; Risk model: Basic model, no diversity control; Risk measure: Variance; Target risk level: 1.4; Window length: Fixed 200; MAPE limit: 0.1; Portfolio size: 40; Group size (G): 3; Group mode: Grouping by risk level; Forecast horizon (H): 2; Objective function weights: 10, 0.9, 0.1; Terminal assets: 98139.27; Terminal ROA (percent): 95.77; Terminal diversity: 9.

Notes: A fixed window containing 200 and 80 observations, respectively, is used in the above test. These sizes are based on preliminary tests. H denotes the forecast horizon (in days) and G the number of risk groups in which the reference stocks are categorized in the optimisation stage. The initial cash investment and debt limit in all tests are 16818.79 € (FIM 100000) and 1681.88 € (FIM 10000), respectively. If the debt limit is exceeded, the excess debt penalty will be activated. MAPE limit is the critical level of prediction accuracy. If MAPE exceeds this level for a stock, then the predictions produced by the model will be substituted by the current observation. The objective function weights are used to weigh the individual components of the wealth objective (terminal inventory value, terminal cash and profits after transaction costs in each period). Due to the huge computational effort involved, SuperGHA is activated with a low number of iterations for each stock series and each forecasting model. The local optimiser used is similar to the BHHH-algorithm of Berndt et al. (1974). The number of individuals in the genetic population is limited to 1 only. Therefore, only occasional mutations are possible (cf. Östermark, 2000c)

Table IV. Strategies for the time period 1 October 2000-31 March 2001 (109 market days)

Strategy 1. Forecast model: Average of GQARCH-M[211] and GJR-M[211]; Data transformation: Differenced and standardized; Risk model: Basic model, no diversity control; Risk measure: Variance; Target risk level: 0.6; Window length: Fixed 200; MAPE limit: 0.6; Portfolio size: 40; Group size (G): 3; Group mode: Equal risk levels; Forecast horizon (H): 2; Objective function weights: 10, 0.9, 0.1; Terminal assets: 20588.01; Terminal ROA (percent): 46.10; Terminal diversity: 4.

Strategy 2. Forecast model: Average of GQARCH-M[211] and GJR-M[211]; Data transformation: Differenced and standardized; Risk model: Basic model, no diversity control; Risk measure: Variance; Target risk level: 0.7; Window length: Fixed 200; MAPE limit: 0.6; Portfolio size: 40; Group size (G): 3; Group mode: Grouping by risk level; Forecast horizon (H): 2; Objective function weights: 10, 0.6, 0.4; Terminal assets: 20112.534; Terminal ROA (percent): 40.77; Terminal diversity: 4.

Notes: A fixed window containing 200 and 80 observations, respectively, is used in the above test. These sizes are based on preliminary tests. H denotes the forecast horizon (in days) and G the number of risk groups in which the reference stocks are categorized in the optimisation stage. The initial cash investment and debt limit in all tests are 16818.79 € (FIM 100000) and 1681.88 € (FIM 10000), respectively. If the debt limit is exceeded, the excess debt penalty will be activated. MAPE limit is the critical level of prediction accuracy. If MAPE exceeds this level for a stock, then the predictions produced by the model will be substituted by the current observation. The objective function weights are used to weigh the individual components of the wealth objective (terminal inventory value, terminal cash and profits after transaction costs in each period). Due to the huge computational effort involved, SuperGHA is activated with a low number of iterations for each stock series and each forecasting model. The local optimiser used is similar to the BHHH-algorithm of Berndt et al. (1974). The number of individuals in the genetic population is limited to 1 only. Therefore, only occasional mutations are possible (cf. Östermark, 2000c)

Table V. Strategies for the time periods 1 February 1999-16 September 2002 (910 market days) and 7 December 2001-28 June 2002 (135 market days)

Common settings. Forecast model: OLS-regression of stock returns on market return; Data transformation: Differenced and standardized; Risk model: Basic model with diversity control activated (equation 3.7); Risk measure: mean return/standard deviation; Window length: Fixed 80; MAPE limit: 0.3; Group size: 3; Group mode: Grouping by risk level; Forecast horizon: 2; Objective function weights: 10, 0.9, 0.1.

Time period 1 February 1999-16 September 2002. Target risk level: 2.3; Portfolio size: 36; Terminal assets: 52895.57; Terminal ROA (percent): 31.33; Terminal diversity: 4.

Time period 7 December 2001-28 June 2002. Target risk level: 1.7; Portfolio size: 20; Terminal assets: 14224.06; Terminal ROA (percent): −29.70; Terminal diversity: 4.

data points specified by the window. The number of observations, like all issues directly related to time series modelling, is a non-trivial problem. When widening the window, the risk for various persisting disturbances (regime shifts, outliers, changes in model order, model type, etc.) increases as well. When narrowing the window, the risk for biased



estimators and imprecise estimates increases. The fact remains that, with a window of reasonable length, say between 50 and 200 data points (a fairly long time span in daily business), the exclusion of statistical estimation problems (observed through non-white noise) cannot be guaranteed (cf. Table II), yet the purely numerical estimation problems can be effectively circumvented. We are left with time series models of not necessarily top statistical quality, but which nevertheless can be expected to render economically valuable hints of the future time path of the data, especially in stable or rising market conditions.

Portfolio performance is measured by return on assets (ROA) and the nominal value of assets at the current time point t. Return on assets is defined as follows:

ROA = \left( \frac{P_t}{P_0} \right)^{1/t} - 1

where P0 is the initial (cash) investment, t is the current time point (iteration) and Pt is the nominal value of the portfolio at time t. In the strategies of Table V the heuristic TREND model was used in order to isolate the effect of fixed transactions costs in rising vs falling markets. Preliminary tests indicated that ex ante recognition of fixed costs is not critical to the performance of SHAREX in stable or rising market conditions. Therefore, in all runs pertaining to Tables III-IV and the long test in Table V (period 1 February 1999-16 September 2002), fixed transactions costs were deducted ex post from cash prior to the next iteration. This procedure may occasionally lead to minor loans taken. The correctness of the procedure was checked against three financial institutions as explained below. In falling markets, explicit recognition of fixed transactions costs in a bilinear continuous or linear mixed-integer optimisation formulation becomes critical (cf. Figures 19 and 20), implying a significant increase in computational overhead. The test for the downturn period (7 December 2001-28 June 2002, Table V) was conducted interactively with three Finnish financial institutions, using the full non-linear programming formulation (3.1-3.2a, b). Non-convexity of the bilinear formulation does not seem to distort

Figure 11. Performance of recursive strategies (size of reference portfolio = 20): return on assets


Figure 12. Nominal asset values (Euro)

Figure 13. Diversity of fuzzy combination strategy

Figure 14. Diversity of best of ARCH and AFARCH strategy


Figure 15. Performance of recursive strategies (size of reference portfolio = 40): return on assets

Figure 16. Nominal asset values (Euro)

the solution process (cf. remark 3.4). Daily transactions were transmitted each market day electronically to EQOnline Ltd (a Finnish investment company) and two Finnish commercial banks – Ålandsbanken Ltd and Nordea Ltd – each having different fixed/variable transactions cost rates. Each institution maintained its own virtual portfolio, which was periodically matched against those maintained by SHAREX. Figures 11-18 illustrate the portfolio performance of the strategies in Table III for the ARCH, AFARCH and Fuzzy Combination forecasts obtained with 20 and 40 reference stocks, respectively, and the buy-and-hold (BH) strategy for the HEX-portfolio. We tested various alternative time series models over both periods. Only the best ones are reported here. Note that, with 20 reference stocks, the


Figure 17. Diversity of fuzzy combination strategy

Figure 18. Diversity of best of ARCH and AFARCH strategy

non-combination strategy selecting the best of ARCH and AFARCH is superior. With 40 reference stocks, again, the fuzzy combination approach is superior to the competing strategies. In both cases, the recursive models outperform the buy-and-hold strategy. Interestingly enough, the level of diversity seems to be linked to the nominal value of assets in a rather constant manner: when the nominal wealth of the portfolio increases, the level of diversity is increased. This is an appealing pattern of the system. The KNN- and TREND-performance is not shown graphically for the above strategies; however, these models clearly outperform the market, even though they are inferior to the more profound time series models discussed above. Figures 19-22 show the portfolio performance of the strategies in Table V. The results indicate that, even with a simple forecast model, the performance of SHAREX is acceptable. The effect of an explicit recognition of fixed transactions costs through the bilinear constraint (3.2a) is indicated by the FSQP curve in Figures 19 and 20. When recognizing the costs ex post, the performance of the recursive system is inferior to that of the market. When activating the bilinear constraint (3.2a) and the minimum cost


Figure 19. Performance of recursive strategies (size of reference portfolio = 36): return on assets

Figure 20. Nominal asset values (Euro)

constraint (3.10), the recursive strategy outperforms the market even under such unfavourable market conditions. The main conclusion of the tests is that, irrespective of the starting point of investment activity, the recursive portfolio management system seems to outperform the market index.

5. Conclusions
In this paper we have presented a recursive multi-period portfolio management system (SHAREX) and tested its performance on empirical data. We demonstrated that


Figure 21. Performance of recursive strategies (size of reference portfolio = 36): return on assets

Figure 22. Nominal asset values (Euro)

SHAREX seems to outperform the buy-and-hold strategy on the Finnish stock market in both rising and falling stock price conditions. Naturally, the potential of a technical portfolio system is best exploitable under favourable market conditions. The existence of several alternative parameterisations that can outperform the buy-and-hold strategy further increases confidence in the recursive system. The results call for extensive empirical tests in future research with the aim of isolating robust parameterisations for various market conditions. Monte Carlo studies should be made in order to determine


the effect of different entry/exit points on portfolio performance and to ascertain the results obtained here. The tests should be extended to other stock markets on a global scale. The SuperGHA algorithm was used for the numerical computations. A number of robust engines for matrix algebra, mathematical programming and numerical calculus have been integrated with SuperGHA. The engines expand its scope as a general-purpose algorithm for mathematical programming. An important direction for future research is to exploit the potential of fuzzy set theory in both forecasting and portfolio optimisation. Fuzzy set theory may contribute to time series modelling on at least three levels: (1) through providing a platform for judgmental corrections of forecasts; (2) by a deeper investigation of alternative fuzzy non-linear forecast combination methods; and finally (3) through new fuzzy non-linear time series models geared at capturing the imprecision that usually distorts classical time series approaches severely. Another important topic is the empirical testing of alternative MILP-formulations of the portfolio optimisation problems. References Adcock, C.J. and Meade, N. (1994), “A simple algorithm to incorporate transactions costs in quadratic optimisation”, European Journal of Operational Research, Vol. 79, pp. 85-94. Alexander, S.S. (1964), “Price movements in speculative markets: trends or random walks”, The Random Character of Stock Market Prices, MIT Press, Vol. 2, pp. 338-72. Aoki, M. (1988), “State space models for vector-valued time series with random walk components”, paper presented at the The Eighth International Symposium on Forecasting, Amsterdam, June. Berndt, E.B., Hall, R., Hall, R. and Hausman, J. (1974), “Estimation and inference in non-linear structural models”, Annals of Economic and Social Measurement, Vol. 3, pp. 653-65. Bloomfield, P. (1976), Fourier Analysis of Time Series. An Introduction, Wiley, New York, NY. Bollerslev, T. (1986), “Generalized autoregressive conditional heteroskedasticity”, Journal of Econometrics, Vol. 31, pp. 307-27. Booth, G.G. and Koutmos, G. (1998), “Volatility and autocorrelation in major European stock markets”, The European Journal of Finance, Vol. 4, pp. 61-74. Chang, T-J., Meade, N., Beasley, J.E. and Sharaiha, Y.M. (2000), “Heuristics for cardinality constrained portfolio optimisation”, Computers and Operations Research, Vol. 27 No. 13, pp. 1271-302. Engle, R.F. (1982), “Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation”, Econometrica, Vol. 50, pp. 987-1007. Engle, R.F. and Ng, V.K. (1993), “Measuring and testing the impact of news on volatility”, Journal of Finance, Vol. 48, pp. 1749-78. Engle, R.F. and Bollerslev, T.P. (1986), “Modeling the persistence of conditional variances”, Econometric Review, Vol. 5, pp. 1-50. Fama, E.F. and Blume, M.E. (1966), “Filter rules and stock market trading”, Journal of Business, Vol. 39, pp. 226-41.

Fama, E.F. and French, K.R. (1998), “Value versus growth: the international evidence”, Journal of Finance, Vol. 53, pp. 1975-98. Farmer, J.D. and Sidorowich, J.J. (1987), “Predicting chaotic time series”, Physical Review Letters, Vol. 59, pp. 845-8. Feinstein, C.D. and Thapa, M.N. (1993), “A reformulation of a mean-absolute deviation portfolio optimisation model”, Management Science, Vol. 39, pp. 1552-3. Feiring, B.R., Wong, W.L., Poon, M. and Chan, Y.C. (1994), “Portfolio selection in downside risk optimisation approach – application to the Hong Kong stock market”, International Journal of Systems Science, Vol. 25, pp. 1921-9. Fiordaliso, A. (1998), “A non-linear forecasts combination method based on takagi-sugeno fuzzy systems”, International Journal of Forecasting, Vol. 14, pp. 367-79. Geweke, J. (1986), “Exact inference in the equality constrained normal linear regression model”, Journal of Applied Econometrics, Vol. 1, pp. 127-41. Glosten, L.R., Jagannathan, R. and Runkle, D. (1989), “On the relation between expected value and the volatility of the nominal excess return on stocks”, Journal of Finance, Vol. 48, pp. 1779-801. Glover, F. (1977), “Heuristics for integer programming using surrogate constraints”, Decision Sciences, Vol. 8 No. 1, pp. 156-66. Glover, F. (1997), A Template for Scatter Search and Path Relinking, in Hao, J.K., Lutton, E., Ronald, E., Schoenauer, M. and Snyers, D. (Eds), Lecture Notes in Computer Science, pp. 1-50. Goldfeld, S.M. and Quandt, R.E. (1972), Nonlinear Methods in Econometrics, North Holland Publ. Co, Amsterdam, pp. 5-9. Gustafsson, F. (2000), Adaptive Filtering and Change Detection, Wiley Interscience, New York, NY. Hagerud, G.E. (1997), “Specification tests for asymmetric GARCH”, working paper, Stockholm School of Economics. Holland, J.H. (1975), Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann Arbor, MI. Hsia, C-C., Fuller, B.R. and Chen, B.Y.J. (2000), “Is beta dead or alive?”, Journal of Business Finance and Accounting, Vol. 27 Nos 3/4, pp. 283-311. IMSL STAT/Library (1987), FORTRAN Subroutines for Statistical Analysis, IMSL Inc. Joines, J. and Houck, C. (1994), “On the use of non-stationary penalty functions to solve non-linear constrained optimisation functions with GA’s”, Proceedings of the Evolutionary Computation Conference, Poster Sessions, IEEE World Congress on Computational Intelligence. Orlando, FL, pp. 579-84. Kodogiannis, V.S. (2000), “Comparison of advanced learning algorithms for short term load forecasting”, Journal of Intelligent and Fuzzy Systems, Vol. 8, pp. 243-89. Konno, H. and Yamazaki, H. (1991), “Mean-absolute deviation portfolio optimisation model and its applications to Tokyo stock market”, Management Science, Vol. 37, pp. 519-31. Konno, H., Shirakawa, H. and Yamazaki, H. (1993), “A mean-absolute deviation-skewness portfolio optimisation model”, Annals of Operations Research, Vol. 45, pp. 205-20. Lawrence, C., Zhou, J. and Tits, A. (1997), User’s Guide for CFSQP Version 2.5: A Code for Solving (Large Scale) Constrained Non-linear (Minimal) optimisation Problems, Generating Iterations Satisfying All Inequality Constraints, Institute for Systems Research, university of Maryland, College Park, MD.


Leung, M.T., Daouk, H. and Chen, A-S. (2000), “Forecasting stock indices: a comparison of classification and level estimation models”, International Journal of Forecasting, Vol. 16, pp. 173-90.
Manderick, B., de Weger, M. and Spiessens, P. (1991), “The genetic algorithm and the structure of the fitness landscape”, Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann, La Jolla, CA.
Markowitz, H. (1952), “Portfolio selection”, Journal of Finance, Vol. 7, pp. 77-91.
Marney, J.P., Fyfe, C., Tarbert, H. and Miller, D. (2001), “Risk adjusted returns to technical trading rules: a genetic programming approach”, Computing in Economics and Finance, paper no. 147.
Milhøj, A. (1987), “A conditional variance model for daily deviations of an exchange rate”, Journal of Business and Economic Statistics, Vol. 5, pp. 99-103.
Mulvey, J.M. and Vladimirou, H. (1992), “Stochastic network programming for financial planning problems”, Management Science, Vol. 38, pp. 1642-64.
Neeley, C.P. and Weller, P. (1999), “Technical trading rules in the European monetary system”, Journal of International Money and Finance, Vol. 18.
Nelson, D.B. (1991), “Conditional heteroskedasticity in asset returns: a new approach”, Econometrica, Vol. 59, pp. 347-70.
Östermark, R. (1991), “Vector forecasting and dynamic portfolio selection. Empirical efficiency of recursive multiperiod strategies”, European Journal of Operational Research, Vol. 55, pp. 46-56.
Östermark, R. (1999a), “A multipurpose parallel genetic hybrid algorithm for non-linear nonconvex programming problems”, Theory of Stochastic Processes, Vol. 5 Nos 1/2, pp. 1-2, Proceedings of the Second International School on Actuarial and Financial Mathematics, Kiev, 8-12 June.
Östermark, R. (1999b), “Solving irregular econometric and mathematical optimisation problems with a genetic hybrid algorithm”, Computational Economics, Vol. 13 No. 2, pp. 103-15.
Östermark, R. (1999c), “Solving a non-linear nonconvex trim loss problem with a genetic hybrid algorithm”, Computers & Operations Research, Vol. 26, pp. 623-35.
Östermark, R. (1999d), Empirical Tests on Global Asset Returns with Parallel Geno-mathematical Programming, Åbo Akademi University, Åbo.
Östermark, R. (2000a), “A hybrid genetic fuzzy neural network algorithm designed for classification problems involving several groups”, Fuzzy Sets and Systems, Vol. 114 No. 2, pp. 311-24.
Östermark, R. (2000b), “A flexible genetic hybrid algorithm for non-linear mixed-integer programming problems”, Evolutionary Optimisation, Vol. 1 No. 1, pp. 41-52.
Östermark, R. (2002), “Designing a superstructure for parametric search for optimal search spaces in non-trivial optimisation problems”, Kybernetes, Vol. 31 No. 2, pp. 255-81.
Östermark, R. and Höglund, R. (1991), “Automatic ARIMA modelling by the Cartesian search algorithm”, The Journal of Forecasting, Vol. 10, pp. 465-76.
Östermark, R. and Aaltonen, J. (1992), “Recursive portfolio management. Large-scale evidence from two Scandinavian stock markets”, Computer Science in Economics and Management, Vol. 5, pp. 81-103, presented at NOAS 91 (Nordic Conference on Operations Research), Åbo, 22-23 August.
Östermark, R. and Höglund, R. (1997), “Multivariate EGARCHX-modelling of the international asset return signal response mechanism”, International Journal of Finance & Economics, Vol. 2 No. 3, pp. 249-62.

Östermark, R. and Saarinen, M. (1996), “A multiprocessor interior point algorithm”, Kybernetes, Vol. 25 No. 4, pp. 84-100.
Östermark, R., Westerlund, T. and Skrifvars, H. (2000), “A non-linear mixed-integer multi-period firm model”, International Journal of Production Economics, Vol. 67, pp. 188-99.
Pagan, A.R. and Schwert, G.W. (1990), “Alternative models for conditional stock volatility”, Journal of Econometrics, Vol. 45, pp. 267-90.
Pantula, S.G. (1991), “Asymptotic distributions of unit-root tests when the process is nearly stationary”, Journal of Business and Economic Statistics, Vol. 10, pp. 229-35.
Pettersson, F. (1994), “Mixed integer non-linear programming applied on pump configurations”, Dissertation, Process Design Laboratory, Faculty of Chemical Engineering, Åbo Akademi University, Åbo.
Pfaffenberger, R. and Walker, D. (1976), Mathematical Programming for Economics and Business, Iowa State University Press.
Potter, M.A. and De Jong, K.A. (1995), “Evolving neural networks with collaborative species”, Proceedings of the 1995 Summer Computer Simulation Conference, Ottawa, 24-26 July.
Schwert, G.W. (1989), “Tests for unit roots: a Monte Carlo investigation”, Journal of Business and Economic Statistics, Vol. 7, pp. 147-60.
Sentana, E. (1995), “Quadratic ARCH models”, Review of Economic Studies, Vol. 62, pp. 639-61.
Simaan, Y. (1997), “Estimation risk in portfolio selection: the mean variance model versus the mean absolute deviation model”, Management Science, Vol. 43, pp. 1437-46.
Speranza, M.G. (1993), “Linear programming models for portfolio optimisation”, Finance, Vol. 14, pp. 107-23.
Statman, M. (1987), “How many stocks make a diversified portfolio?”, Journal of Financial and Quantitative Analysis, Vol. 1 No. 1, pp. 41-52.
Takagi, T. and Sugeno, M. (1985), “Fuzzy identification of systems and its application to modelling and control”, IEEE Transactions on Systems, Man and Cybernetics, Vol. 15, pp. 116-32.
Westerlund, T., Pettersson, F. and Grossmann, I.E. (1994), “Optimisation of pump configurations as a MILP problem”, Computers and Chemical Engineering, Vol. 18 No. 9, pp. 845-58.
Westerlund, T., Skrifvars, H., Harjunkoski, I. and Pörn, R. (1998), “An extended cutting plane method for a class of non-convex MILP problems”, Computers and Chemical Engineering, Vol. 22, pp. 357-68.
Yoshimoto, A. (1996), “The mean-variance approach to portfolio optimisation subject to transactions costs”, Journal of the Operations Research Society of Japan, Vol. 39, pp. 99-117.
Zakoïan, J.M. (1994), “Threshold heteroskedastic models”, Journal of Economic Dynamics and Control, Vol. 18, pp. 931-55.

Further reading
Östermark, R. (2000), “A flexible multi-computer algorithm for elementary matrix operations”, Computers & Operations Research, Vol. 27, pp. 245-68.
Östermark, R. (1992), “Solving a linear multi-period portfolio problem by interior point methodology”, Computer Science in Economics and Management, Vol. 5, pp. 283-302.
Orchard-Hays, W. (1968), Advanced Linear Programming Computing Techniques, McGraw-Hill, New York, NY.


Appendix

Table AI.

Stock                    Lot size   Code
Aldata Solution Oyj          100    ALD1V
Amer-yhtymä Oyj A            200    AMEAS
Comptel Oyj                   25    CTL1V
Elektrobit Group Oyj         100    EBG1V
Eimo Oyj A                   100    EIMAV
Elisa Communications          50    HPHAV
Elcoteq A                     50    ELQAV
F-Secure Oyj                  50    FSC1V
Fortum Oyj                   200    FUM1V
Oyj Hartwall Abp A            50    HARAS
Huhtamäki van Leer            50    HVL1V
KCI Konecranes Inter         100    KCI1V
Kesko Oyj B                  100    KESBV
Kemira Oyj                   500    KRA1V
Metso Oyj                    100    MEO1V
Metsä-Serla Oyj B            500    MESBS
Nordic Baltic H FDR          500    NBH1V
Nokia Oyj                     50    NOK1V
Nokian Renkaat Oyj           200    NOR1V
Orion-yhtymä B               100    ORIBS
Outokumpu Oyj                200    OUT1V
Pohjola D                    200    POHDV
Perlos Oyj                    50    POS1V
Raisio Yhtymä Vaih-os        500    RAIVV
Rautaruukki Oyj K            500    RTRKS
Sampo A                      100    SAMAS
Sonera Oyj                    50    SRA1V
Stora Enso Oyj A             100    STEAV
Stora Enso Oyj R             100    STERV
Teleste Oyj                   50    TEL1V
Tietoenator Oyj               20    TIE1V
TJ Group                     100    TJT1V
Tamro Oyj                    500    TRO1V
Uponor Oyj                   100    UNR1V
UPM-Kymmene Oyj              100    UPM1V
Wärtsilä Oyj Abp B           100    WRTBS


Enterprise resource planning competence centres: a case study


Annika Granebring and Péter Révay
Informatics/Computer Science, School of Business, Mälardalen University, Västerås, Sweden


Abstract
Purpose – To describe the establishment of a Swedish enterprise resource planning (ERP) competence centre (CC), with assistance from Finnish colleagues and modelled on their CC.
Design/methodology/approach – By outlining the input of the Finnish colleagues and using the concept of the Finnish ERP CC, the paper develops the history of the establishment of the Swedish model during 2001-2003.
Findings – Following a European IT consultancy company as it builds a new Baan CC in Sweden brings new lessons from the field. The reasons for failure were shown to be a dropping market, high entry barriers and an underestimation of the resources needed for building a new ERP CC. Barriers preventing success in the ERP business were identified.
Practical implications – Corporate templates and migrated sites, combined with aggressive competitors and demanding, price-conscious customers, have changed the ERP maintenance business and made the organisation hierarchical. New entrants face difficulties in the ERP service business, where customers pick and change suppliers.
Originality/value – Presents a unique case study that outlines and discusses the mentorship between Swedish and Finnish ERP CC initiatives.
Keywords Resource management, Sweden, Cybernetics, Business environment, Manufacturing resource planning
Paper type Case study

1. Introduction
To extract value out of existing enterprise resource planning (ERP) implementations that are distributed all over the world, there is a need for a post-implementation service for operation and maintenance. This should come from a stable organization with staff of various skills to maintain the ERP and make users calm and assured of getting the support they need (Bergvall, 1995; Davenport, 2000; Rönnbäck-Öhrwall, 2002). The decision of companies to buy – not build – ERP maintenance attracts IT firms to enter this highly competitive and complex market. ERP maintenance has become marketable. This paper uses the term ERP competence centre (CC) to denote a serving organization that provides skilled professionals, industrial experience and proven tools that deliver measured and appropriate solutions during the ERP system life cycle (implementation and post-implementation). Large ERP vendors, like SAP or Baan, have hundreds of professional service providers supporting their solutions (Hoch et al., 1999, p. 227) and have CC programs like the Target Enterprise Competence Centre for Baan (Perreault and Vlasic, 1998). Navision Solution Centres, for example, are available for serving customers of the Microsoft ERP system Navision. Annual ERP maintenance activities cost approximately 25 per cent of the initial ERP implementation costs, and upgrades cost about 30 per cent of the initial ERP implementation costs (Glass and Vessey, 1999; Carlino et al., 2000). New sales of


ERP systems are modest (Kalling, 2003). Every ERP-implementation is financially important, opening a post-implementation market for ERP service suppliers. A travesty of Davenport’s classic article, “Putting the Enterprise into the Enterprise System” (1998) is “Putting the mass-customized Enterprise System into an interchangeable ERP Competence Centre”.


The customer vs ERP service supplier’s gap was, in the 1990s, expensive for primary customers. Lack of standardization is a general way to limit competition and a problem for users (Porter, 1980). Standardization of ERP models as base for common way of working gives flexibility for organizational changes. Company-wide business and ERP templates bring about standardization which makes these gaps less vendor profitable and opens for customers to change vendor and outsource ERP maintenance. In about 20 per cent of IT outsourcing contracts, suppliers cannot make a reasonable profit and so they calculate for additional services and revenues not covered by the contract, resulting in hidden costs to the purchaser. Suppliers face significant costs and research speak of relational trauma for both parties (Kern et al., 2002, p. 61). To be commercially successful in supporting customers’ business and their ERP, as external player, focus has to be on “survival issues”. 1.1 Building a Swedish Baan CC The case study is a description of a computer consulting company, TietoEnator (TE), building a Swedish Baan CC, run as a project. Traditionally Baan business is strong in Finland with several key customers. TE is the market leader of Baan business in Finland with 80 Baan consultants. The Baan Business Solution Centre in Sweden was established in the summer of 2001. One new Baan consultant was hired the others were TE employees that should get their Baan education within the project. All Swedish consultants and managers were placed in the present seven Finnish teams. The teams became mixed with consultants from: Finland and Sweden both; Management Consulting (3 Finnish (Fi) consultants), Baan Consulting (10 Fi: 4 Swedish(s)), Baan Service Delivery1 SD1 Metso others (11Fi:3S), Baan SD2 Partek others (18Fi:3S), Baan Value Added Solutions (2Fi 1S), Value Added Solutions, Sweden (7S), Sales (2Fi,1S), and Administration&Marketing (3Fi,1S)

The objective for TE was to be the leading partner in Baan business in the Nordic countries, with plans for later starting a Norwegian Baan Business Solution Centre in 2003. The main purpose was to build up Baan competence in Sweden with help from Finnish specialists. Potential TE Baan customers for the Swedish CC included the forestry industry, with Finnish-owned companies located in Sweden, concentrating on the existing TE Finnish customers Metso[1] and Partek[2]. Additional customers had to be taken from strong Baan competitors like IBM or RKB Consulting. There were efforts for Swedish Baan staff to have a workload in Baan business and to be able to work in Baan environments separately and/or with the help of Finnish mentors.
1.1.1 Background of project “Sweden 2002”. The background of project “Sweden 2002” is to build a Baan Business Solution Centre geographically located in Sweden (Västerås, Örebro, Karlskoga) within a network including the present Finnish CC (Helsinki, Espoo, Turku, Vaasa, Tampere, Jyväskylä). The project organisation presents a steering committee with the manager of the Baan Business Solution Centre (Finland), the site manager (Sweden), the project manager (Finland), the key consultant

of the Baan Business Solution Centre, the sales manager, the key contact person regarding Metso Karlstad, and the key consultant regarding new customers, all in Sweden. The implementation plan includes the following project phases: (1) Building up the foundation – training (basic Baan, manufacturing and distribution, finance, tools); environments (Metso Karlstad, Partek, own development environments); personal development plans (interviews, plans, audits, actions). (2) Metso Paper Ab Karlstad – post-implementation project, Metso Template. (3) Partek – taking part in baseline-service tasks in Finland, building up the baseline service to Partek sites in Sweden. (4) Network between Finland and Sweden – mentors, group meetings, groupware tools (PODO), cooperation with technical staff, and changing the official language to English. Every phase/activity is assigned a responsible person and a time. Metso Template training was, for instance, one activity. The Metso Paper Ab Karlstad template is a kind of company business ERP policy, not really software, which also includes TE delivery management. In the network phase the naming of personal mentors is essential: every Swedish consultant gets a Finnish consultant to work together with and to work individually with help from this personal mentor. The project was measured on: (1) completion of tasks included in the project; (2) Baan-related workload of Swedish Baan group members (calendar time and customer invoicing); and (3) chargeable Baan workload for Swedish staff of the Baan group; the target was 50 per cent. Finnish consultants did valuable work putting their efforts into tutor work, i.e. work with real customers' cases with the help of tutors in Finland. The help of team leaders, continuous training and self-studies increase the theoretical possibilities to work. In the twentieth week of 2002, the first auditing review – or knowledge transfer – took place, resulting in a required report on the Baan knowledge level of the Swedish part and the development areas.
1.1.2 The beginning and the bringing to an end. Every Swedish consultant had a plan with activities. These are some of the activities, some individual and some common, for example for the finance Baan consultant, employed 1 December 2001: 2001-12-10 – 12, finance application consulting job at Ljungby; Baan Tools training I course in Tampere, 2002-02-11 – 13, with an experienced tutor as main trainer; Baan consulting team introduction to meet the team leader, the personal mentor in finance and the project leader, all three located at TietoEnator in Jyväskylä, 19-20 February 2002 (the programme included a check of Baan installations in Finland, a study of previous tasks with Baan in the financial area, and also making the acquaintance of persons whom the consultant will contact in the future); Tools II training, 2002-03-06 – 7, in Tampere; Metso Template training, 2002-04-09 – 10, in Karlskoga.


Other activities were self-study with Swedish colleagues, working with real cases in customer environments, and an external project leader course.

Initially, in Sweden it was difficult. Swedish wages were frozen whereas Finnish employees got their bonus wages. New common SWE/FIN projects did not happen. The euro currency switch meant that the Baan business in Finland had to manage the EMU changeover, including the necessary conversion work. The discussed baseline services and new implementation projects (Metso, Partek, Lexel, Avesta Polarit, Svedala, Fundia, ABB) did not happen, causing a certain desperation amongst the Swedish project members. The need to get chargeable customers for the project was getting intense. At the same time as the task of building the foundation continued, with some improvement in the real job situation of the Swedish part of the project, employees were dismissed in Sweden. In April 2002, Finnish management decided to cut down in Sweden. At a meeting (2002-04-02), the Swedish site manager announced that 8 of 16 consultants working in Karlskoga, Örebro and Västerås were dismissed due to lack of work. In May 2002, the Swedish Baan department consisted of a site manager, a sales manager, two consultants in Västerås, two consultants in Karlskoga and one in Örebro (not working with Baan). In February 2003, the remaining eight consultants were all relieved of their duties.
1.2 The ERP system
ERP software is a recent technological innovation. Most firms that have adopted ERP are still in the midst of the innovation infusion process (Stratman and Roth, 2002, p. 622). In-house development of order, warehouse and invoicing systems still occurred. An example is the in-house standard system Pegasus at ABB Infosystems AB, with in-house application development methods (Wedlund, 1997). In 2001 all Pegasus systems were replaced by RaSU[3] at ABB. It is a common strategy to handle ERP complexity by going standard and towards central, fewer and bigger ERP installation sites. Technical implementations and off-customizing projects minimize customization (Granebring and Révay, 2004, 2005). The Baan Enterprise system supports complex manufacturing industries. Baan works on different platforms, using the operating system (Unix or Microsoft) and the database system (Oracle, Informix, Sybase, DB2 or Microsoft SQL Server) independently. Intentia's Movex has to run under OS/400 on the IBM server AS/400. IFS must have Oracle as its database. The Swedish ERP vendors Intentia, IFS and IBS do not allow competitors: in-house consultants exclusively handle their post-implementation market. Baan applications are manufacturing, project, finance, service, process, distribution and transportation, customer enterprise, and interactive selling solutions. The Baan finance application was weaker but has improved. Baan has an acknowledged fine product configurator. Baan Tools is the 4GL development environment. The report and ad hoc part is solved through mirrored databases with report generators (Safari, Gentia, Crystal Report, Business Object, etc.). Special business processes like connection, production or subscription typically need communication with other standardized applications. For integration with other applications, XML web services and the software BizTalk[4] can be used to communicate loosely in and out of the ERP. Source code is the information base for enhancements and bug fixes, and is important in respect to system maintenance. The ability to reproduce a problem is fundamental to fixing it. “Source code is king” (Singer, 2002). An ABC-agreement with Baan gives partners access to the source code.

2. Method This paper is based upon a case study. The method is characterized ad hoc participant-observation followed by reflection and analysis. The use of case studies is preferable in situations where processes and changes are in focus. It is common to gather information in different ways when using case studies (Patel and Davidsson, 2003). Yin (2003) recommends three principles to be followed to maximize the benefit of data collection. The first principle is that multiple sources of evidence should be used. This case study uses participant-observation, semi-structured interviews conducted with a number of Swedish and Finnish operational and strategically staff, documentation study of documentation created during the project (business documents, project plan, fortnightly reports of the project situation, the final project report evaluation reports, audits, specifications, mail correspondence) ranging from 2000 to 2002. The second principle is to create a case study database. We have done this by collecting information in bonders. The third principle is to maintain a chain of evidence to ensure quality control. Here the information is sorted in chronological order. The study contains elements of action research where the role of the researcher lies between the role of researcher and consultant (Avison et al., 2001, Gummesson, 2000). One of the authors of this paper has twice experienced the building of new Baan CCs and has also worked in two other Baan CCs. Two of those are now outsourced to an IBM operation and Cap Gemini consulting (ABB), and the CSC (Bombardier Transportation), one has stopped (the story of this paper), and one is still organized as an in-house ERP Profit Centre. By performing a case study, a theory can both be developed and tested, the selection of units that are to be investigated and can be made in different ways and both qualitative and quantitative information can be used. Deductive researchers hope to find data to match a theory, and inductive researchers hope to find a theory that explains their data (Merriam, 1998). The issue about objectivity is important like the choice regarding selection of information (Holme and Solvang, 1997). The researchers’ life stories, their own experiences, and preunderstanding are some factors that influence the questions that are asked and also the results that are generated (Alvesson and Deetz, 2000). This means that there do not exist any research that is neutral or devoid of value judgements. The researcher must keep a certain distance in the research process (Holme and Solvang, 1997). We have kept this in mind during the analysis process and we are of that opinion that we have been as objective as possible during the analysis of the process. The innovative theoretical approach has turned out to be very successful as a foundation for this paper. Primary and secondary data are operationalised on Rogers’s (1995 p. 207) innovation theory with five adaption characteristics applied on the over time process of establishing a ERP CC. (1) Relative advantage, expressed as an economic profitability in establishing a Baan CC in Sweden with support from Finnish specialist is compared to starting stand alone from scratch. (2) Compatibility issues in network building between CC: s in Finland and Sweden, despite differences in management, language and culture. (3) Strategies to put ERP service Complexity to order with tools and methods to increase customer orientation at a lower cost.


(4) Trialability defined as customers’ potential acceptance of efforts by novel service providers. (5) Observability of ERP service organisation types and how ERP maintenance business has matured over time. 3. Resources and processes What skills and knowledge are asked for in network ERP CC: s operating in different ERP implementations and environments? Knowing the ERP itself, knowing computing technology, knowing business practice and having social skills are needed for CC staff. Maintaining and upgrading ERPs demand a wider range of expertise than in-house software (Chang, 2004; Stratman and Roth, 2002, p. 603; Ng et al., 2002, p. 90). Bergvall (1995) and Ro¨nnba¨ck-O¨hrwall (2002) are used as references in the maintenance area. This case study and our own experience show that the following resources (e.g. profiles, training) need to be covered: (1) System Administrator; (2) Database Administrator; (3) Project Manager; (4) Logistic, Finance and DEM[5] consultant (UML, customer templates); (5) Developer (Tools, Crystal Reports, data dictionary, RUP); (6) Integrator (XML Services, BizTalk, Support in release III, IV, V etc. Tasks are implementated; (a) upgrade release or service pack are run as technical implementation project; (b) bug fixes patches, new objects and customizing – off-customization[6], integration within and with bolt-on functionality preferable in standardized manner, Tools – reports, menus, sessions, conversions, training – regular and special for various user groups, and documentation. The ERP CC use tools and follow specific rules as exemplified in the following four formal processes (e.g. help desk, change control). (1) The urgent problems process contains three levels; first-line, second-line and third-line. First-line: the end-user contacts key user who contact[7] single-point-of-contact (SPOC) – one telephone number. Staff of generalists decides if it is a problem or an issue and register errands into groupware system. Second-line: with authorized, certified personnel. Third-line: experts available from the ERP vendor international organization, e.g. Baan Central CC. Delic and Hoellmer (2000) argue that costs doubles four times per service level. An errad that costs 1,000 SEK in first-line costs 4,000 SEK in second-line and 16,000 SEK in third-line. Since the costs for solving problems increase per passed level it is important to solve errands early and track when errands pass service levels. Major part of business solved in first-line is optimal. (2) The Work Authorization Process applies to requested new work for application request, acquisition, and requested projects. If work is not properly authorized in writing, the provider is at risk to lose revenue by performing work outside contractual specifications. Some work is pre-authorized like corrective work, user support, anything in progress at cutover. For work taking more than 20 h an errand form is created, it is logged in the service request (SR) log,

budget allocations is performed, and decide whether in scope (baseline) or out of scope which requires a contract amendment. There should be information about planned. (3) Interruption in service in week and weekends by critical systems, major systems, and also interrupts that affects 10-50 users or affects the whole site. (4) Daily Service Review (DSR) Status Report Forum is a daily telephone conference review the delivery of service, which depends of the two parts: service exceptions and change control. The DSR-meeting is led by a Swedish employee and one representant per city and “stand-in” is appointed. 4. Analysis – an innovation theory perspective on the ERP CC building process 4.1 Relative advantage Establishing a Swedish ERP CC with help from a Finnish CC was expected to result in a number of large scale service production advantages. Providing local Baan competence for local customers in Sweden is not that important to customers as before. Big “off shore” global players located on another continent provide service around the clock imply that geographical close is not as critical. A more effective handling of errands and increased quality regarding the information due to formal logging of errands, which opens for statistics over incoming complains which makes them more noticeable. Professional handling increase efficiency, availability, flexibility, and quality. 4.2 Compatibility How to transfer information (e.g. meetings, courses) and real customer work (tutoring) tasks is important when building working teams with members from both Finland and Sweden (Hultman and Sobel, 2002). Swedish staff needed real work tasks to execute with the support of a tutor to improve their level of Baan knowledge and to get the required experience. Management prioritized the workload for Karlskoga staff executing real work tasks for Metso Karlstad. Team leaders stressed that work tasks were shared with Sweden and that Sweden got support from Finland when needed. There were no real work tasks from Partek, mainly because of low overall order level from Partek. Help from tutors in Finland was obtained, but Finnish consultants prioritised customer work over tutoring Swedish consultants. Still Finnish consultants acted as Change agents with bravura in the tools area. However, they felt they had to set their own customer jobs first. A new CC needed long-term help from the existing CC within the network. Support and acknowledgement by top-management to resource Swedish group with its most experienced Finnish Consultants in Sweden for a period of months would have been a clear signal to build trust and cooperation between CC: s. The Swedish CC faced simultaneous building and withdrawal and the Finnish CC were in the busy operation phase of their ERP CC lifecycle. The collaboration of expert support staff is the basis of authority when starting an ERP CC. There were initially no ready real Baan customers. There were no resources in Finland to help Swedish colleagues – because of the workload. Finnish management and hierarchic organization differ from the Swedish drive for consensus through discussions, meetings, and projects. It took months before the project leader was located in Sweden and before all staff in Sweden met their Finnish team members or the team leaders.


This kind of meeting is necessary when building cross-national teams. Effective distant tutoring concerned jobs programming in Baan Tools. Inner-training were done by tutors in Finland connecting to consultants PC with NetMeeting and IP-telephony. Application consulting tutoring is not possible from a distance. 4.3 Complexibility The essential IT support tool used by the Finnish CC is a groupware system called PODO (Perfect Office’s Daily Organizer) where the support errands are registered. Incoming calls are handled in an errand form which also acts as the interface between personnel, ERP and customers. The errand pops up at a communicator in regard to who has the right to serve it. The errand form shows various basic information are has what kind of customer has made the call and what communication channel has been used to contact the centre. It also shows the present time in the queue and the history of the errand is shown, so that the communicator can be prepared for a possible irritated customer that has been waiting for a long time or has been forwarded a number of times. In PODO documents like report and session layouts were attached and consultants got their next case to solve together with Finnish consultants. Swedish staff also visited Finland for courses, and to get jobs to work on at home. There are challenges with different cultures, languages, authority reports, etc. In PODO, the workgroup tool at TE, the majority of errands are registered in Finnish and are unreadable for Swedish staff. Finnish customers use the errand system and you cannot tell customers to write in English. The PODO system contributes to a more effective handling of errands but also an increased quality regarding the information according to the persons interviewed. Errands are logged and used as feedback. Yet, the savings related to this phenomenon could only be realised if the resources are really set free when these calls are routed among the CC’s. In general, regulation seems to be the point that has changed the most by the outsourcing of IT departments, since a number of new rules and policies are established to enable this working method. This in turn has resulted in an increase in efficiency, and quality in comparison with the in-house IT-department. A disadvantage is that users at the start will experience a longer reply time on ordering an IS/IT service. Increased availability and flexibility for the customer was also one of the introductory goals with the centres. Until new routines and new computers and applications are settled, users must wait. This is due to new internal routines for both client and a new IT partner both of which has responsibility for new routines. When customers receive a warm welcome and if they know what is going on behind “the curtains”, e.g. follow errands in the errand tools – they feel comfortable although they have to wait. 4.4 Triability The time to market implies that developing the skills internally is too time-consuming. Customers do not wait or pay for competence building. ERP experienced champions are needed. Lack of tutor help and the risk of being unsuccessful are harmful in processing the first tasks of new customers. Resourceful global competitors that aggressively protect territories are always a risk. To achieve large orders, the firms need to be able to act big, build an operative platform with tools, methods and skilled management. One Swedish finance consultant (employed 1 December 2001) with

previous experience in BaanIII was sent to a Nummi job in Ljungby, Sma˚land. The customer running BaanIV needed finance consulting in banking processes, authority reports, etc. The customer did not expect to participate in discussions or tests but just to get the job done. Different expectations on how to work led to the customer switching to IBM. Sending a novel consultant to a new application consulting job when customers expects quick solutions without their own participating is a great risk. This case study shows that a better way to start is by doing tools jobs like changing reports, sessions, menus with tutor help via net-meeting. Application consulting assumes an experienced consultant at the start and one with technical, social and business process skills. Facing physical problems such as passing firewalls, contact customers Baan Helpdesk Database at customers’ sites is frustrating. There are technical problems with opening firewalls for the four Swedish consultants in Va¨stera˚s and getting into clients different environments from the security point of view. There are also authority aspects (who decides, who to contact) when accessing a client’s development environment. Baan connection works well for Metso Karlstad’s environments. Karlskoga staff has super user rights and management databases of Metso Karlstad. On 23 January 2002, finally, there was a connection for test environment and production environment (information on company numbers) for Baan Metso’s and Partek’s environments. There was still no access to Partek Baan Helpdesk database. Only four novel Baan developers – three Baan consultants less than in the beginning of February 2002 – meant that there was no pure consultancy role for Swedish staff. To guarantee at least a reasonable knowledge of Baan as an application, a 2-day basic logistics course in Va¨stera˚s was arranged. 4.5 Observability Porter’s (1980, 1985) theories applied on ERP, e.g. ERP creates value operationalized on innovation theory with mutual adjustment between ERP and ERP service business. When handling the change in business organisation, service providers’ organisation, ERP life-cycle is the key. Lately ERP users act on achieving a promised lower degree of maintenance costs. The overview (own working) on ERP maintenance phases with new organization forms for support shows that professional roles change due to the recent marketed ERP service culture (Ng et al., 2003; Granebring and Re´vay, 2004). 5. ERP development 1990-2004 The historical development of ERP between 1990 and 2004 was the following. (I) ERP Introduction 1990 – IT department as cost center serving a decentralized business. Different configurated ERPs with teething troubles were bottom-up implemented in decentralised organisations. Strong division management caused suboptimation when each business unit was allowed its own ERP (Davenport, 2000, p. 19, Granebring and Re´vay, 2004). Violence on standard customization resulted in status-quo business change since old legacy systems were imitated. Free of charge in-house help led to escalating demands and considerable resources. The ad hoc support with no pressure to be effective, i.e. unstructured and informal specification and diagnosing of issues. Fire emergency call response to requests and problems. In this “family oriented atmosphere” the IT department – despite their immature ERP competence – dominated. The IT department had to make decisions – business


management had insufficient ERP knowledged. Responsibility and the powers did not agree. (II) ERP Growth 1999 – autonomous IT business function as Profit Center may find it uneconomical to cooperate and share resources corporately (Prahalad and Hamel, 1990). Off-customization ERP projects are legio (Granebring and Re´vay, 2004). Effectivness by built-in process view (best practice) through prioritizing and sharing. Firms install ERP in a technical, quick, no-risk way with cut and paste roll-outs. The profit centre strives for external customers. Formal communication with customer’s functional representatives with purchaser competence. Users specify errands. (III) ERP Maturity 2004 þ – outsourced IT function serve centralized businesses with standardized ERPs. The IT function monopoly status ends opening for intense fights for outsourcing contracts where ERP only is one part. Effectiveness of ERP CC rank larger and with more resourceful competitors. References from clients around the world are essential. Assistance to users and technical handling of the ERP is considered as a cost covered by fixed baseline amount. In negotiations, if an issue is basic support; it must be noted that slim budgets permit no extra hours. A template based approach is used with corporate business vision (ABB RASU, Baan Metso Templet) and with overall reuse focus. Standardization wing clips the ERP when functionality is blocked to suit top-down implementations with fewer, migrated, central sites with more users. A cost haunting business model services low-costs. Impersonal relationships with hierarchic decision ways are used. Tools remote control, allows users to have tools to solve their own problems. All users to have same PC, platforms. 6. Conclusions and future work This paper, about the potentials and entry barriers when establishing an ERP CC from an innovation theory perspective, argues that netting is the key. Network building takes time. Team-building activities – like Finnish consultants taking the mentor role for novel Swedish consultants is essential. ERP consulting skills is hard to coagulate down to small chunks of knowledge. Problem reporting databases are important warehouses of information. The rapid ERP lifecycle makes an ERP implementation consultant competence at half-time to be three years. Ownership of source code is important. The essential Baan ABC-licence was for the case-company but it was never accomplished. Customer demands increase the demands on actors supplying ERP service. ERP CC: s staff providing competence must conquer new technical, marketing and human platforms. Consultants continuously answering up to this are overwhelmed. From the start, ready existing long-term key customers, access to customer environments and certified staff are critical. Building customer relationships, ERP competence and a technical environment takes time. An ERP CC is not strictly limited to Baan related products but could cover other areas: integration with other ERPs, customer relation management (CRM), supply chain management (SCM), and business portals. Activate sales processes and consulting services that do not cause any confusion because of the CC’s different kind of solutions/products/competence. Fewer consultants are a risk presenting as a trustful ERP CC towards customer. A future article which gives the client’s perspective where the customer position gets stronger and customers take time to develop wiser ERP strategies, is planned.

Notes
1. Metso Paper is a Finnish paper manufacturer with business in Finland, Sweden (Karlstad and Sundsvall) and the USA.
2. Partek (Ljungby, Kalmar, Hudiksvall) is a manufacturer of trucks, etc.
3. RaSU = Rent-a-SAP-User, a “cut & paste” company standard solution with a business template and a software template.
4. BizTalk web site: www.microsoft.com/biztalk
5. DEM = dynamic enterprise modeling, a framework to keep the enterprise application aligned with the organization's changing processes and business model. Some of this is at PowerPoint level for marketing and not used in real business. DEM is part of Target Orgware, a graphical tool and methodologies for Baan implementation.
6. Off-customization = replacement where changes done to standard code are overwritten or reprogrammed.
7. Use of several communication channels: phone call, web, fax, or walk-up.

References
Alvesson, M. and Deetz, S. (2000), Kritisk samhällsvetenskaplig metod, Studentlitteratur, Lund.
Avison, D., Baskerville, R. and Myers, M. (2001), “Controlling action research projects”, Information Technology & People, Vol. 14 No. 1, pp. 28-45.
Bergvall, M. (1995), Systemförvaltning i praktiken – en kvalitativ studie avseende centrala begrepp, aktiviteter och ansvarsroller, licentiate thesis, Linköping University.
Carlino, Nelson and Smith (2000), “AMR research study reveals SAP R/3 upgrade cost users 25 to 33 percent of initial investment”, AMR Research, available at: www.amrresearch.com/pressroom/Files/
Chang, S-I. (2004), “ERP life cycle implementation, management and support; implication for practice and research”, paper presented at the 37th Hawaii International Conference on System Science.
Davenport, T. (2000), Mission Critical: Realizing the Promise of Enterprise System, Harvard Business School Press, Boston, MA.
Delic, K.A. and Hoellmer, B. (2000), “Knowledge-based support in help-desk environments”, IT Professional, pp. 44-8.
Glass and Vessey (1999), “Enterprise resource planning systems: can they handle the enhancement most enterprise require?”, The Software Practitioner, Vol. 9 No. 5, pp. 1-12.
Granebring, A. and Révay, P. (2004), “Managing the logistic vs finance integration gap in ERP systems”, paper presented at the MicroCAD-2004 International Scientific Conference, University of Miskolc, Hungary.
Granebring, A. and Révay, P. (2005), “On ERP migration strategy”, paper presented at the MicroCAD-2005 International Scientific Conference, University of Miskolc, Hungary.
Gummesson, E. (2000), Qualitative Methods in Management Research, Sage, Thousand Oaks, CA.
Hoch, Roeding, Purkert, Lindner and Müller (1999), Secrets of Software Success, Harvard Business School Press, Boston, MA.
Holme, I.M. and Solvang, B.K. (1997), Forskningsmetodik – om kvalitativa och kvantitativa metoder, Studentlitteratur, Lund.
Hultman, J. and Sobel, L. (2002), Mentorn – En Praktisk Vägledning, Bokförlaget Natur och Kultur, Sweden.


Kalling, T. (2003), ERP Systems: Strategic and Organisational Processes, Lund University, Lund.
Kern, T., Willcocks, L. and van Heck, E. (2002), “The winner's curse in IT outsourcing: strategies for avoiding relational trauma”, California Management Review, Vol. 44, pp. 47-69.
Merriam, S.B. (1998), Fallstudien som forskningsmetod, Studentlitteratur, Lund.
Ng, C., Gable, G. and Chan, T. (2002), “An ERP-client benefit-oriented maintenance taxonomy”, The Journal of Systems and Software, Vol. 64, pp. 87-109.
Ng, C., Gable, G. and Chan, T. (2003), “An ERP maintenance model”, paper presented at the Hawaii International Conference on System Science.
Patel, R. and Davidsson, B. (2003), Forskningsmetodikens grunder – att planera, genomföra och rapportera en undersökning, Studentlitteratur, Lund.
Perreault, Y. and Vlasic, T. (1998), Implementing Baan IV, QUE, Indianapolis, IN.
Porter, M.E. (1980), Competitive Strategy, Free Press, New York, NY.
Porter, M.E. (1985), Competitive Advantage, Free Press, New York, NY.
Prahalad, C.K. and Hamel, G. (1990), “The core competence of the corporation”, Harvard Business Review, May-June, Vol. 68, pp. 79-91.
Rogers, E.M. (1995), The Diffusion of Innovation, 4th ed., Free Press, New York, NY.
Rönnbäck-Öhrwall, A. (2002), “Interorganizational IT support for collaborative product development”, doctoral dissertation, Linköping University, Linköping.
Singer, J. (2002), Practice of Software Maintenance, National Research Council, Canada.
Stratman, J. and Roth, A. (2002), “ERP competence constructs: two-stage multi-item scale development and validation”, Decision Sciences, Vol. 33 No. 4, pp. 501-628.
Wedlund, T. (1997), “Att skapa en företagsanpassad systemutvecklingsmodell – genom rekonstruktion, värdering och vidareutveckling i T50-bolag inom ABB”, licentiate thesis, Linköpings universitet.
Yin, R.K. (2003), Case Study Research: Design and Methods, 3rd ed., Sage, Thousand Oaks, CA.

Further reading
Bergvall, M. and Welander, T. (1996), Affärsmässig systemförvaltning, Studentlitteratur, Lund.
Davenport, T. (1998), “Putting the enterprise in the enterprise system”, Harvard Business Review, Vol. 76, pp. 121-31.
Litterer, J.A. (1973), The Analysis of an Organisation, Wiley, New York, NY.



Time and systems
Robert Vallée
World Organisation of Systems and Cybernetics, Earley, Reading, UK


Abstract
Purpose – To give a mathematical expression of what could be called the internal time of a dynamical system, a time which is different from the external or reference time.
Design/methodology/approach – The paper introduces a general mathematical definition of internal duration and so of internal time. Then we consider the case of an explosion followed by an implosion, which we apply to cosmology and physiology. The case of diffusion is also presented.
Findings – The internal time is generally different from the reference time. In certain cases an infinite internal duration may correspond to a finite reference duration.
Research limitations/implications – Our formulations may help to understand certain aspects of cosmology, physiology and, more generally, of the evolution of dynamical systems.
Practical implications – For example, the physiology of ageing.
Originality/value – The consideration of the square of the speed of evolution, at instant t, of a dynamical system for measuring the internal duration of the interval (t, t + dt) is original, as well as its consequences.
Keywords Cosmology, Thermal diffusion, Dynamics, Time measurement
Paper type Research paper

1. Introduction
Our purpose is to propose a definition of the internal time or intrinsic time (Vallée, 1981, 1986, 1991) of a dynamical system evolving independently of any environment. The notion of internal time $u$ is opposed to that of the external time, or reference time $t$, taken for granted and used in the evolution equation. The basic idea is that the internal time does not elapse if the state of the system does not vary, a conception close to that of Aristotle, for whom time ceases to be known when the "soul" does not change. So if $y(t)$, belonging to $R^N$ (or $C^N$), is the state of the system at reference instant $t$, any positive and increasing function, null for 0, of a norm of $dy(t)/dt$ is a measure of the state of movement of the system at instant $t$. We make the most simple choice, that of the square of the Euclidean norm, represented by $\|dy(t)/dt\|^2$. So we define the internal duration $d(t_1, t_2)$ of the interval $(t_1, t_2)$, of reference duration $t_2 - t_1$, by (Vallée, 1996, 2001)

$$d(t_1, t_2) = \int_{t_1}^{t_2} \left\| dy(s)/ds \right\|^2 ds \qquad (1)$$

So if $\|dy(t)/dt\|^2$ is equal to 0 on the interval, the internal duration is 0, and if $\|dy(t)/dt\|^2$ is equal to 1, the internal duration is equal to the reference duration. In short, the higher the values of $\|dy(t)/dt\|^2$ on the interval, the longer the internal duration. So $\|dy(t)/dt\|^2$ represents the weight of reference instant $t$. This work is an invited paper of the 5th European Congress on Systems Science, Hersonissos, Greece, 16-19 October 2002.



We can now define the internal time $u(t)$ by
$$u(t) = d(t_0, t) = \int_{t_0}^{t} \left\| dy(s)/ds \right\|^2 ds, \qquad (2)$$

where $t_0$ is any reference instant of the life of the system. Of course we have


$$d(t_1, t_2) = u(t_2) - u(t_1), \qquad t_1 < t_2. \qquad (3)$$
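As a quick illustration of definitions (1)-(3), the following minimal Python sketch (not part of the original paper; function names and sample trajectories are ours) computes the internal duration of a sampled trajectory and checks the two limiting cases stated above: a motionless state accumulates no internal duration, while a state moving at unit speed accumulates an internal duration equal to the reference duration.

```python
import numpy as np

def internal_duration(t, y):
    """Internal duration of (t[0], t[-1]) in the sense of equation (1):
    the integral of ||dy/dt||^2 dt for a trajectory y(t) sampled in R^N."""
    dy_dt = np.gradient(y, t, axis=0)      # finite-difference estimate of the speed vector
    weight = np.sum(dy_dt ** 2, axis=1)    # squared Euclidean norm at each reference instant
    return np.trapz(weight, t)

t = np.linspace(0.0, 5.0, 1001)
still = np.zeros((t.size, 2))                      # state does not vary
uniform = np.stack([t, np.zeros_like(t)], axis=1)  # state moves at unit speed
print(internal_duration(t, still))    # ~0.0: no internal duration elapses
print(internal_duration(t, uniform))  # ~5.0: equal to the reference duration
```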

2. Explosions and implosions
We call explosion the evolution of a system whose modulus of the state vector starts with value 0 at $t = 0$ then increases with $t$, and such that the modulus of its speed vector starts with value $+\infty$ at $t = 0$. The first instants of the evolution of the system have an exceptional importance since the weight $\|dy(t)/dt\|^2$ of instant $t$ tends to $+\infty$ when $t$ tends to $0^+$. We have here an idealisation, as well as in the case of what we call implosion, where $\|y(t)\|$ decreases with $t$ and attains value 0 at the final instant while $\|dy(t)/dt\|$ tends to $+\infty$. A system may be explosive at the beginning and implosive at the end; in that case we say that we have an explosion-implosion. For the sake of simplicity we shall suppose now that $y(t)$ is a mere scalar. We shall start with an explosion-implosion (Vallée, 1996, 2001) defined by the differential equation
$$dy(t)/dt = (q/p)\,\mathrm{sgn}(p - t)\,\big(q^2 - y^2(t)\big)^{1/2}/y(t), \quad y(0) = 0, \quad p, q > 0, \quad t \in [0, 2p], \qquad (4)$$

where $\mathrm{sgn}(p - t)$ is the sign of $p - t$. The solution of this equation is given by the function
$$y(t) = (q/p)\,\big(p^2 - (p - t)^2\big)^{1/2} \qquad (5)$$

whose graph is a half-ellipse of great axis $2p$ and small axis $2q$. We shall say that we have an elliptic explosion-implosion. When $t$ varies from 0 to $2p$, $y(t)$ increases from 0 to $q$ then decreases from $q$ to 0, with a speed of infinite absolute value at 0 and $2p$. The square of the speed of evolution is given by
$$(dy(t)/dt)^2 = (q^2/p^2)\,(p - t)^2/\big(t(2p - t)\big) = (q^2/2p)\,\big(1/t - 2/p + 1/(2p - t)\big),$$
which shows that the weight of instant $t$ is infinite at the beginning ($t = 0$) and at the end ($t = 2p$) of the life of the system. If we integrate $(dy(t)/dt)^2$ from $t_1$ to $t_2$ we obtain the internal duration of the reference time interval $(t_1, t_2)$:
$$d(t_1, t_2) = (q^2/2p)\,\Big(\mathrm{Log}(t_2/t_1) - 2(t_2 - t_1)/p - \mathrm{Log}\big((2p - t_2)/(2p - t_1)\big)\Big), \qquad (6)$$

and, remembering that an associated internal time $u(t)$ is defined up to an additive constant, we can choose, for the sake of simplification,
$$u(t) = d(p, t) - q^2/p = (q^2/2p)\,\big(\mathrm{Log}\, t - 2t/p - \mathrm{Log}(2p - t)\big). \qquad (7)$$
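Equations (6)-(7) can be checked numerically by integrating the squared speed of the elliptic explosion-implosion directly. The sketch below is our own illustration (the parameter values p, q, t1, t2 are arbitrary choices, not taken from the paper) and simply compares a trapezoidal estimate with the closed form of equation (6).

```python
import numpy as np

def d_elliptic_numeric(t1, t2, p, q, n=200_000):
    # trapezoidal integration of (dy/dt)^2 = (q^2/p^2)(p - t)^2 / (t(2p - t))
    t = np.linspace(t1, t2, n)
    speed_sq = (q**2 / p**2) * (p - t)**2 / (t * (2*p - t))
    return np.trapz(speed_sq, t)

def d_elliptic_closed(t1, t2, p, q):
    # equation (6)
    return q**2 / (2*p) * (np.log(t2/t1) - 2*(t2 - t1)/p
                           - np.log((2*p - t2) / (2*p - t1)))

p, q, t1, t2 = 2.0, 1.0, 0.5, 3.0      # arbitrary values with 0 < t1 < t2 < 2p
print(d_elliptic_numeric(t1, t2, p, q))
print(d_elliptic_closed(t1, t2, p, q))  # the two values agree to several decimals
```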

We see that when the reference time $t$ varies from 0 to $2p$, generating a finite reference duration equal to $2p$, the internal time $u$ varies from $-\infty$ to $+\infty$, generating an infinite internal duration. The initial instant 0 is pushed back to $-\infty$ and the final instant $2p$ is pushed forward to $+\infty$. The internal duration of any interval $(0, t)$ is infinite, as well as the internal duration of any interval $(t, 2p)$. We consider now the differential equation
$$dy(t)/dt = (q/p)\,\big(q^2 + y^2(t)\big)^{1/2}/y(t), \quad y(0) = 0, \quad p, q > 0, \quad t \in [0, +\infty). \qquad (8)$$


Its solution is given by the function
$$y(t) = (q/p)\,\big((p + t)^2 - p^2\big)^{1/2} \qquad (9)$$

whose graph is the right part of a half-hyperbola of great axis $2p$ and "small axis" $2q$. We shall say that we have a hyperbolic explosion: $y(t)$ increases from 0 to $+\infty$ with an infinite speed at instant 0 and, for great values of $t$, $y(t)$ behaves like $(q/p)(t + p)$. The square of the speed is given by
$$(dy(t)/dt)^2 = (q^2/p^2)\,(p + t)^2/\big(t(t + 2p)\big) = (q^2/2p)\,\big(1/t + 2/p - 1/(2p + t)\big).$$
Calculations, similar to those of the elliptic case, give the internal duration of interval $(t_1, t_2)$ and consequently an internal time
$$u(t) = (q^2/2p)\,\big(\mathrm{Log}\, t + 2t/p - \mathrm{Log}(2p + t)\big). \qquad (10)$$

When $t$ varies from 0 to $+\infty$, $u(t)$ varies from $-\infty$ to $+\infty$, the initial instant 0 being pushed back to $-\infty$. Any interval $(0, t)$, of reference duration $t$, has an infinite internal duration. Moreover $u(t)$ behaves as $(q^2/2p)\,\mathrm{Log}\, t$ for $t$ small and as $(q^2/p^2)\, t$ for great values of $t$. An intermediary case, which we call parabolic explosion (Vallée, 1996, 2001), is obtained when $p$ and $q$ tend to infinity while $q^2/p$ keeps a constant value $2h$. Starting indifferently from equation (4) or (8), we obtain
$$dy(t)/dt = 2h/y(t), \quad y(0) = 0, \quad h > 0. \qquad (11)$$

The solution is given by the function
$$y(t) = 2\,(ht)^{1/2}, \qquad (12)$$

whose graph is a half-parabola of parameter $h$. We see that $y(t)$ increases from 0 to $+\infty$ with an infinite initial speed. Then we have $(dy(t)/dt)^2 = h/t$; the weight of instant $t$ is infinite at the initial instant and tends to 0 when $t$ tends to $+\infty$. The calculation of internal duration generates an internal time
$$u(t) = h\,\mathrm{Log}\, t; \qquad (13)$$

the initial instant 0 is pushed back to $-\infty$ and any interval $(0, t)$ has an infinite internal duration.
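For the parabolic explosion the check is immediate, since the weight is $h/t$ and the internal duration of $(t_1, t_2)$ is $h\,\mathrm{Log}(t_2/t_1)$. The short sketch below (our own illustration, with an arbitrarily chosen h) also shows how this duration grows without bound as $t_1$ approaches 0, which is the "push back to $-\infty$" of the initial instant.

```python
import numpy as np

h, t2 = 0.7, 1.0
for t1 in (1e-3, 1e-6, 1e-9):
    t = np.geomspace(t1, t2, 200_000)   # log-spaced grid, dense near the singular instant t = 0
    numeric = np.trapz(h / t, t)        # integral of the weight h/t from equation (11)
    closed = h * np.log(t2 / t1)        # closed form h*Log(t2/t1) implied by equation (13)
    print(t1, numeric, closed)          # grows without bound as t1 -> 0
```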


3. Physiology
We can interpret an elliptic explosion-implosion as the evolution of a living being whose birth may be compared to a kind of explosion and the end of life as an involution, or a kind of implosion, more or less quick. In our model the implosive part is symmetrical with the explosive one. This is not very realistic, since it seems that the implosive part must be shorter. Nevertheless, if we consider only the qualitative aspect of the conclusions, we can conclude that, from the internal time point of view, the initial instant (conception) is pushed back to $-\infty$ and the final instant (death) is pushed forward to $+\infty$ (Vallée, 1991). The first part of this qualitative conclusion is in accordance with the natural feeling of a human being (if we consider this case) of not having any beginning. The second part is more controversial; nevertheless it has been more or less considered, with approaches different from ours, by some authors (Lévy, 1969; Lévi, 1975). We can also consider the case of a parabolic explosion limited to the instant of death. The initial instant is pushed back to $-\infty$ and the internal time is proportional to the logarithm of the elapsed reference time. It elapses slowly at the beginning and more and more quickly near the end. This is close to the ideas of Lecomte du Noüy (1936). For him the physiological duration of a usual time interval, of given length, is proportional to the speed of healing of wounds. This speed varying roughly as the inverse of age, a logarithmic physiological time is generated. Here a remark seems necessary in order to avoid apparent paradoxes: the internal time of a conscious being may be different from the perceived internal time.

4. Cosmology
We shall now interpret the notion of internal duration in cosmology. The cases of elliptic explosion-implosion, parabolic or hyperbolic explosion have common traits with cosmological models with primordial explosion followed by final implosion, or with primordial explosion only. Generally speaking, the differential equation giving the evolution of the universe, whose state at instant $t$ is described by the cosmological scale factor $R(t)$, representing in certain cases the radius of the universe, is, according to Lemaître, Friedmann and Robertson (Berry, 1989),
$$(dR(t)/dt)^2 = (8\pi G/3)\,\rho(t)\,R^2(t) - kc^2 + (\Lambda/3)\,R^2(t), \quad R(0) = 0, \qquad (14)$$

where $G$ is the gravitational constant, $c$ the speed of light, $k$ the index of curvature ($k = -1$: space with negative curvature; $k = 0$: flat space; $k = +1$: positive curvature, in which case $R(t)$ may be considered as the radius of the universe), $\Lambda$ the cosmic constant or cosmic repulsion term, and $\rho(t)$ the density of matter or its material equivalent in the case of pure radiation. In the material case $\rho(t) = a/R^3(t)$ and in the case of pure radiation it is equal to $b/R^4(t)$, $a$ and $b$ being constants. Equation (4), corresponding to an elliptic explosion-implosion, gives, if we take the square of its two members, $(dy(t)/dt)^2 = q^4/(p^2 y^2(t)) - q^2/p^2$. If we substitute $R(t)$ for $y(t)$, the above equation takes one of the possible forms of equation (14) if $\rho(t) = b/R^4(t)$, $k = +1$, $\Lambda = 0$. We have a case of pure radiation with positive curvature and null cosmic constant. More precisely, $q = cp$ and $p = (1/c^2)\,(8\pi G b/3)^{1/2}$. Then
$$(dR(t)/dt)^2 = (8\pi G/3)\, b/R^2(t) - c^2. \qquad (15)$$

The internal time of this system, which we propose to call generalized cosmological time (Vallée, 1995, 1996, 2001), is then given, according to equation (7), by
$$u(t) = (c^2 p/2)\,\big(\mathrm{Log}\, t - 2t/p - \mathrm{Log}(2p - t)\big). \qquad (16)$$

The initial reference instant $t = 0$ (big bang) is pushed back to $-\infty$ and the final reference instant $t = 2p$ (big crunch) is pushed forward to $+\infty$. But classically a cosmological model with pure radiation is accepted mainly as an approximation, valid when the density of matter ($a/R^3(t)$) is negligible compared to the (equivalent) density of matter of pure radiation ($b/R^4(t)$). This happens when $R(t)$ is small, so when $t$ is close to 0. In that case $-kc^2$ is negligible, as well as $(\Lambda/3)R^2(t)$, and so it is not even necessary to suppose that $\Lambda = 0$. We then have
$$(dR(t)/dt)^2 = (8\pi G/3)\, b/R^2(t). \qquad (17)$$

This equation is that of the radiation-dominated era, at the beginning of the universe, or that of an evolution with pure radiation, null cosmic constant and flat universe. It corresponds to a parabolic explosion whose differential equation, after taking the square of its two members, gives $(dy(t)/dt)^2 = 4h^2/y^2(t)$. We have just to substitute $R$ for $y$ and $(2\pi G b/3)^{1/2}$ for $h$. According to equation (13), the internal time of this universe is
$$u(t) = (2\pi G b/3)^{1/2}\,\mathrm{Log}\, t. \qquad (18)$$

The initial instant $t = 0$ (big bang) is pushed back to $-\infty$. We recognize here what Milne (1948) has called cosmological time. Leaving the cosmological interpretation of our elliptic explosive-implosive or parabolic explosive systems, we shall apply the concept of internal duration and internal time to a universe of a rather different type. We consider the case of a flat space ($k = 0$) with no cosmic repulsion term ($\Lambda = 0$) and only matter ($\rho(t) = a/R^3(t)$); we have, according to equation (14),
$$(dR(t)/dt)^2 = (8\pi G/3)\, a/R(t), \quad R(0) = 0,$$
or, after integration, $R(t) = (8\pi G a/3)^{1/3}\,(2/3)^{2/3}\, t^{2/3}$. The internal duration of interval $(t_1, t_2)$ is then given by integration of $(dR(t)/dt)^2$:
$$d(t_1, t_2) = 3\,(4\pi G a)^{2/3}\,\big((t_2)^{1/3} - (t_1)^{1/3}\big)$$


and so appears an internal time adapted to this universe:
$$u(t) = 3\,(4\pi G a)^{2/3}\, t^{1/3}. \qquad (19)$$
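The contrast between the radiation-dominated model ($R(t)$ growing like $t^{1/2}$, internal time pushed back to $-\infty$) and the matter-dominated model ($R(t)$ growing like $t^{2/3}$, no push back) can be illustrated numerically, keeping only the power-law behaviour and dropping all physical constants. The sketch below is our own illustration, not part of the paper, and anticipates the remark made in the next paragraph.

```python
import numpy as np

def power_law_internal_duration(alpha, t1, t2=1.0, n=200_000):
    # R(t) ~ t**alpha  =>  weight (dR/dt)^2 ~ alpha**2 * t**(2*alpha - 2)  (constants dropped)
    t = np.geomspace(t1, t2, n)
    return np.trapz(alpha**2 * t**(2*alpha - 2), t)

for t1 in (1e-3, 1e-6, 1e-9):
    radiation = power_law_internal_duration(0.5, t1)   # alpha = 1/2: diverges like (1/4)*log(1/t1)
    matter = power_law_internal_duration(2/3, t1)      # alpha = 2/3: converges to 4/3
    print(t1, radiation, matter)
```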

There is no push back to $-\infty$ of the initial instant $t = 0$, and the life of this universe goes from 0 to $+\infty$ in terms of the reference time $t$ as well as in terms of the internal time $u$. We must remark that, in all cases (cosmological or not), the push back, or non-push back, to $-\infty$ of the initial reference instant $t = 0$ depends on the behaviour of the state near the initial instant. In a similar way, the push forward to $+\infty$ of the final reference instant depends on the behaviour of the state near this final instant. More precisely, considering only the first case, it is easy to see that if the state behaves like $t^\alpha$ near the initial instant ($t = 0$), the push back to $-\infty$ happens only if $\alpha \in \,]0, 1/2]$: the weight then behaves like $\alpha^2 t^{2\alpha - 2}$, whose integral from 0 diverges precisely when $2\alpha - 2 \le -1$. In the cosmological models considered above, push back corresponded to $\alpha = 1/2$ and non-push back to $\alpha = 2/3$.

5. Diffusion of heat
Vector $y(t)$, representing the state of the system at reference instant $t$, does not necessarily belong to $R^N$ (or $C^N$), as is the case when $y$ is a solution of a differential equation. It may be a function, defined at instant $t$, of the point $x$ of space, $R^3$ for example. In other terms we may have, as a dynamical system, a space-time field $y(t, x)$ satisfying a partial differential equation. In this new case we must consider the square of the (Hermitian) norm of the partial derivative $\partial y(t, x)/\partial t$, considered as a function of $x$, that is to say
$$\int_{R^3} \left\| \partial y(t, x)/\partial t \right\|^2 dx. \qquad (20)$$

Its value, at instant $t$, is the weight of reference instant $t$. And so the internal duration of interval $(t_1, t_2)$ is given by
$$d(t_1, t_2) = \int_{t_1}^{t_2} \int_{R^3} \left\| \partial y(t, x)/\partial t \right\|^2 dx\, dt \qquad (21)$$

and the internal time by
$$u(t) = d(t_0, t).$$
We shall apply these definitions to a space-time field of temperatures, so to the diffusion of heat, considered, for the sake of simplicity, in the case of a space of dimension 1. The temperature at point $x$, at reference instant $t$, is $u(t, x)$. If at the initial reference instant $t = 0$ the repartition of temperature is given by the $\delta$ distribution $\delta(x)$ centred at $x = 0$ (in other terms, with a rather abusive simplification, if the temperature is everywhere equal to zero except at $x = 0$, where it is infinite), the repartition of temperature at instant $t$ is classically given by
$$u(t, x) = (4\pi t)^{-1/2} \exp(-x^2/4t), \qquad (22)$$

which is a Laplace-Gauss function. When $t$ tends to $+\infty$, this function of $x$ flattens and tends to what we call the epsilon distribution $\varepsilon(x)$ (Vallée, 1992). We have
$$(\partial u(t, x)/\partial t)^2 = (1/16\pi)\,\big(x^2/2t - 1\big)^2\, t^{-3} \exp(-x^2/2t)$$


and, after some calculations,
$$\int_{R} \big(\partial u(t, x)/\partial t\big)^2\, dx = \big(3(2\pi)^{1/2}/16\big)\, t^{-5/2}.$$

So the internal duration of the reference interval $(t_1, t_2)$ is
$$d(t_1, t_2) = \int_{t_1}^{t_2} \big(3(2\pi)^{1/2}/16\big)\, s^{-5/2}\, ds = \big((2\pi)^{1/2}/8\big)\,\big(t_1^{-3/2} - t_2^{-3/2}\big)$$

and an associated internal time is given by

$$u(t) = -\big((2\pi)^{1/2}/8\big)\, t^{-3/2}. \qquad (23)$$
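Since the overall constant in this diffusion example depends on the normalisation chosen for $u(t, x)$, a safe numerical check concerns the scaling law only: the weight of instant $t$ should decrease like $t^{-5/2}$. The sketch below is our own illustration (it assumes the one-dimensional heat kernel written in equation (22)) and verifies that doubling $t$ divides the weight by roughly $2^{5/2}$.

```python
import numpy as np

def du_dt(t, x):
    # time derivative of the heat kernel u(t, x) = (4*pi*t)**(-1/2) * exp(-x**2/(4*t))
    u = (4 * np.pi * t) ** -0.5 * np.exp(-x**2 / (4 * t))
    return u * (x**2 / (4 * t**2) - 1 / (2 * t))

def weight(t, half_width=50.0, n=200_001):
    # weight of reference instant t: integral over x of (du/dt)^2
    x = np.linspace(-half_width, half_width, n)
    return np.trapz(du_dt(t, x) ** 2, x)

w = [weight(t) for t in (1.0, 2.0, 4.0)]
print(w[1] / w[0], w[2] / w[1], 2 ** -2.5)   # both ratios are close to 2**(-5/2) ~ 0.177
```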

The initial reference instant $t = 0$ is pushed back to $-\infty$.

6. Conclusion
The results obtained about an internal time adapted to the evolution of a dynamical system depend on the hypotheses made. So their qualitative aspects are the most important. The anamorphosis which appears between the set of reference instants and the set of internal instants may push back to $-\infty$ the initial reference instant or push forward to $+\infty$ the final reference instant. The reference duration and the internal duration of the same interval may differ. These two qualitative results may be interpreted, as we have seen, in physiology, cosmology and field theory.

References
Berry, M. (1989), Principles of Cosmology and Gravitation, Institute of Physics Publishing, Bristol and Philadelphia.
Lecomte du Noüy, P. (1936), Le Temps et la Vie, Gallimard, Paris.
Lévi, R. (1975), L'En-deçà de la Mort, Vrin, Paris.
Lévy, J.-C. (1969), Le Temps Psychologique, Dunod, Paris.
Milne, E.A. (1948), Kinematic Relativity, Clarendon Press, Oxford.
Vallée, R. (1981), "Memorization in systems theory and perception of time", in Lasker, G.E. (Ed.), Applied Systems and Systems Research, Pergamon Press, New York, NY, Vol. 2, pp. 697-700.
Vallée, R. (1986), "Subjective perception of time and systems", in Trappl, R. (Ed.), Cybernetics and Systems '86, D. Reidel Publishing Company, Dordrecht, pp. 35-8.
Vallée, R. (1991), "Perception, memorisation and multidimensional time", Kybernetes, Vol. 20 No. 6, pp. 15-28.
Vallée, R. (1992), "The epsilon-distribution or the antithesis of Dirac's delta", in Trappl, R. (Ed.), Cybernetics and Systems Research, World Scientific, Singapore, pp. 97-102.
Vallée, R. (1995), Cognition et Système. Essai d'épistémo-praxéologie, L'Interdisciplinaire, Lyon-Limonest.
Vallée, R. (1996), "Temps propre d'un système dynamique, cas d'un système explosif-implosif", in Pessa, E. and Penna, M.P. (Eds), Actes du 3ème Congrès International de Systémique, Edizioni Kappa, Rome, pp. 967-70.
Vallée, R. (2001), "Time and dynamical systems", Systems Science, Vol. 27, pp. 97-100.


Systemic philosophy and the philosophy of social science Part II: the systemic position


Jon-Arild Johannessen Oslo School of Management and Bodø Graduate School of Business, Lillesand, Norway, and

Johan Olaisen
Norwegian School of Management, Lillesand, Norway
Abstract
Purpose – To discuss systemic thinking in relation to the naturalistic position in the philosophy of social science. To develop the theme in two parts: Part I: systemic thinking and the naturalistic position; and Part II: the systemic position.
Design/methodology/approach – A cybernetic approach is taken, and a discussion of what is the foundation of the philosophy of social science for systemic thinking and the systemic position is developed.
Findings – The findings of Part I have been given. Part II analyses the systemic position and considers the classical controversy in social science between methodological individualism and methodological collectivism (holism). The pre-condition on which the systemic position is based is given. The ideal requirements set up by the systemic position are presented under the headings: epistemology/methodology; ontology; axiology; and the ethical position.
Practical implications – Provides assistance to social scientists who study social systems from the systemic or cybernetic viewpoint and gives a practical analysis of the systemic position. Provides researchers and others working in this field with an investigation of the role and conduct of social scientists.
Originality/value – It positions systemic thinking in relation to the philosophy of social science.
Keywords Cybernetics, Philosophy, Sociology
Paper type Viewpoint


Introduction
The systemic position makes a distinction among the epistemological sphere (Bunge, 1985), the ontological sphere (Bunge, 1983a), the axiological sphere (Bunge, 1989, 1996) and the ethical sphere (Bunge, 1989). Examples from the epistemological sphere are: system, truth, knowledge, meaning theory, model, hypothesis, and causality. The epistemological sphere is in turn divided into the logical sphere, where we, among other things, investigate constructs; the semantic sphere, where meaning, sense and reference are investigated; and the methodological sphere, where the connections between facts, data, constructs, interpretation and testability are investigated. Examples from the logical sphere are utility and rationality. Here the basis for rationality is investigated, among other things. Examples from the ontological sphere are actions, events, processes, and artefacts. In the ontological sphere the nature of society is investigated.

“Axiology, or value theory, is the branch of philosophy that deals with the general concept of value and with the status of value judgements” (Bunge, 1996, p. 220). In the axiological sphere, the role of the observer observing a system, among other things, is investigated. Examples form the ethical sphere are: measure of social equality, freedom, wishes, needs, norms, moral codes, analysis of the link between ends and means, the context of solution (Johannessen, 1997b). Here the role of conduct of social scientists is investigated. Systemic thinking expresses explicitly that: “genuine freedom for all can only be attained together with good measure of social equality” (Bunge, 1996, p. 536). Epistemology The ultimate goal in all social science, viewed in a systemic perspective, is to find or uncover patterns conducive to explanations and “possibly also predict social facts” (Bunge, 1985, p. 157). But most social patterns are local, in the sense that they appear exclusively in societies of a special type. Universal and cross-cultural patterns are however to be found, along with the local ones. Whether social science contains statements on law, to a great extent depends on whether a restrictive or wide interpretation of social behaviour is used. What is totally uncontroversial is the fact that social science has brought forth a number of generalisations, and some of these are here regarded, according to a wide interpretation of the law concept, as law statements. E.g. “The law of deviating amplifying” (Maruyama, 1963), “The darkness principle” (Ashby, 1961), “The complementarity law” (Weinberg, 1975), “Redundancy of information theorem” (Shannon and Weaver, 1949), “Circular causality principle” (Ashby, 1961), “Feedback dominance theorem” (Beer, 1979), “Requisite variety law” (Ashby, 1961). A pattern is variables which are stable over a certain period of time. A social law is then created by an observer acquiring insight into this pattern. A social law is a pattern of a very special type: they are systemic, i.e. they are part of a knowledge field, and are not changed unless phenomena which they satisfy or obey, are changed into phenomena of an entirely different type, i.e. that the phenomenon undergoes a metamorphosis. By acquiring insight into a pattern of social behaviour, we can predict elements of social behaviour, at least roughly and in a short-term perspective. Social patterns are further linked to specific societies in time and space, but this also to a considerable extent applies to laws of nature, even if these have a longer time perspective and are of a more general nature than social “laws”. If we make a distinction in data between intention and behaviour in social science, the systemic approach regards the intention categories as having to be understood, whereas the behaviour categories can be explained. This distinction can be expressed in the following way: lifting the head is different from the head being lifted. Lifting the head is linked to intention, and that the head is lifted is linked to behaviour. By the distinction between intention and behaviour, the dualism between naturalism and anti-naturalism is transcended. Both angles of incidence become viable and complement each other in the study of social systems. The intention can further be linked to our dispositions to think and act, i.e. hidden knowledge. In order to understand an intention, we must study historical factors, the situation and the context, in addition to expectation mechanisms. Behaviour must be explained in





the context and situation in which it unfolds, in addition to the context of which it is part. What implication has then the distinction between intention and behaviour for the study of social systems? (1) The interpretation of meaning becomes an important part of the intention side in the distinction. (2) Explanation and predication becomes an important part of the behaviour side in the distinction. It is the link between interpretation and meaning and explanation and predication which gives social science practical strength. This link is constituted by, among other things, the development of constructs, e.g. concepts. The concepts in social science are not purely a representation of social systems, but instrumental in constituting social systems. With the distinction between intention and behaviour, social science become both interpreting and also an explanatory enterprise. For the anti-naturalistic or the humanistic school, there is neither room for, nor a need for theories in social science, as everything is about interpretation, and it is the inter-subjectivistic element, which is meant to compensate for social theories. But they can neither explain why, e.g. social changes appear and why various actions are performed. In order to explain those kind of phenomena, we need theories (i.e. systems of propositions). For the intention category we are not going to look for explanations, and neither social theories nor laws, but to try to understand and interpret the meaning. In the systemic research model, the mental (emic) does not precede the behavioural (etic), but constitute different knowledge domains to be studied, together or separately. Sometimes the one may be the case of the other, and, at other times, vice versa. Constructs from both domains are used on the condition that workable indicators can be developed. Further, it should be noted that according to the systemic approach, all adequate explanations in social science are pluralistic, i.e. they are related to the model of the human being and the social systems we use, and it is therefore only partial truths (Bunge, 1983b) we are introduced to. Much of the existing confusion in social science emanates according to systemic thinking, from a lack of distinction between intention and behaviour. What is then the problem if naturalists take care of theory development and look for social laws, while anti-naturalists look for interpretation of meaning? The problem is that “the naturalist spends a great deal of time talking about scientific theories, but they analyse scientific theories in general, and give little attention to the scientific problems of social theories” (Fay and Moon, 1994, p. 28). The problem in social science is that actors subject to study both have intentions, and display behaviour. But this is not necessarily a problem if we make the distinction between intention and behaviour. The anti-naturalistic school, here represented by Collingwood (1945), among others, argue that social behaviour can be explained through reconstruction of the actors understanding of themselves. But this angle of incidence does not make the distinction mentioned, and the intention and behaviour then easily become congruent entities. The underlying premise in such an angle of incidence is that the actors are rational individuals, with certain intentions, which are translated into practice through their behaviour. But the actors’ self-understanding in regard to intentions, motives and

notions or an observers interpretation of them can vary greatly from the actions performed. This means that there is no necessary one-to-one link between intention and behaviour. The crusades in the middle ages could be an example of this type. To seek an explanation for the crusades by studying intentions and motives, can easily lead to fallacies, since the self-understanding of the actors involved was incorrect. That is it is likely that there were other motives providing incentives for the actors to join the crusades, than the motives they may have expressed to an observer, even if they themselves believed in their motives when they were articulated. But we cannot resolve the conflict by means of the naturalist angle of incidence alone, since it is only concerned about behaviour and not the thought and action dispositions forming the basis for the behaviour. One way of providing deeper insight into social phenomena/problems, is in fact to regard the two schools as complementary. By at the same time, introducing the distinction between intention and behaviour we have made it possible to uncover partial instincts for each of the representatives for the two main schools, and to synthesise this new insight into new knowledge. This understanding is clearly expressed by Fay and Moon (1994). Methodology The goal of theoretical research, according to the systemic position, is the construction of systems, i.e. theories. When a system exists, we can split it up for analytical purposes, e.g. testing hypotheses against a practical problem. That is first the fish, then the filets, and not vice versa. The order in systemic research is then: theory, analysis and then synthesis. In the methodical sphere, the systemic position has maintained its main focus on inter-connections, both in terms of concrete things, ideas and knowledge, and therefore encourage interdisciplinary and multidisciplinary approaches to problems/phenomena. The conceptions held by a neutral observer on social systems would influence his acts, even if his conceptions are wrong or true. Systemic investigations therefore start “from individuals embedded in a society that pre-exists them and watch how their actions affect society and alter it” (Bunge, 1996, p. 241). The study of social systems from a systemic perspective for this reason always include the triad: actors, observers, social systems. The observer tries to disclose the objective composition, environment and structure (CES) of a social system, then the subjective notion the actors have of CES. Furthermore, we are interested in the mental models actors have of the social system, and the mental models we as observers have of the same system. It is then both subjective and objective aspects that need to be studied. When studying changes in the social system, which is the subject matter of cybernetics, we must from a systemic point of view investigate the social mechanisms influencing the changes. It is the internal and external social mechanisms that need to be disclosed within the political, economical, the cultural and the social partial system, in addition to the relations between the partial systems. Social changes can emerge in all four partial systems, and none of the partial systems take precedence in the survey of social changes. The first decision a researcher must take is to determine what is to be studied, i.e. the unit of analysis (i.e. individual, group, organisation, society). But any analysis is part of/or embedded in a larger system. 
Therefore, it is important in systemic thinking always to see the unit of analysis in the light of a larger system which it is part of, in order to understand the function, role, etc. it has in the larger system. Then it must be





investigated how the unit of analysis is embedded in the system level underneath, in order to understand which function, role, etc. the analysis has in relation to this system. We have further assumed that a social system is constituted by four partial systems: the economic, political, cultural and the social. The procedure will then be to see the unit of analysis in relation to these four partial systems. We have further disclosed four causal connections in the social system (Johannessen, 1997a): historical, functional, cybernetic, and patterns. The procedure in systemic thinking then will be to study the causal connections operative in relation to the problem/phenomenon to be analysed in the unit of analysis. The analysis form in systemic thinking is:
(1) The unit of analysis is investigated in relation to the macro level.
(2) The unit of analysis is investigated in relation to the micro level.
(3) The unit of analysis is investigated in relation to the four partial systems.
(4) The unit of analysis is investigated in relation to the causal processes.
Ontology
The systemic approach is based on a system-theoretical ontology, where the world is seen as a system consisting of subsystems, and an epistemology combining realism and rationalism. The aim of the systemic approach is to understand, predict, and control. The methods include analysis as well as synthesis, generalisation and systematisation. Emergence occurs when something new emerges which previously did not exist at a lower system level. The thought about emergents conflicts with radical reductionism and mechanical thinking. The concept of emergence is often linked to an unclear definition. Emergence is also the crux of the matter in the relation between micro and macro processes in sociological theory development (Turner, 1991). By emergent we mean the following. Let S be a system with composition A, i.e. the various components in addition to the way they are composed. If P is a property of S, P is emergent with regard to A if and only if no components in A possess P; otherwise P is to be regarded as a resulting property with regard to A (Bunge, 1977, p. 97). Systemic thinking is based on the premise that society is a concrete system of interrelated individuals, and that some properties are aggregates of individual properties, while others are "global" and emerge as a result of relations between the individuals. The emergent properties must be studied at different levels in a system, and the relations between the levels must also be studied. In social science we often use the term system. What does a social system mean in systemic thinking? A system can be conceptual or abstract. Theories and analytical models are examples of conceptual systems. Structure in a social system "is the collection of relations among the parts of the system as well as between these and the environmental items" (Bunge, 1996, p. XX). Structure is by this understanding an emergent property of a social system, and emergence is an ontological category (Bunge, 1996, p. 20). Social systems consist of four partial systems (Bunge, 1996, 1998), which interact and are mutually dependent on each other, and which through their interaction create an emergent entity, namely the social system as a whole. These partial systems are: The cultural, which has values as its mode of expression. The social, which has social

relations as its mode of expression. The political, which has power as its mode of expression. The economic, which has material resources as its mode of expression. Social systems are further "composed by artefacts" (Bunge, 1996, p. 21). Social systems are kept together (in systemic thinking) by dynamic social relations (e.g. emotions, notions, norms) and social actions (e.g. co-operation, solidarity, conflict, exchange, communication). None of the social actions takes precedence in systemic thinking about social systems, as, e.g. conflict does for the Marxists and solidarity for Durkheim. While individuals are concrete entities, organisations and social systems are constructs, i.e. social systems are emergents (Bunge, 1996, p. 45). Ontological problems always have an epistemological counterpart. E.g. the question "Are there social facts?" has an epistemological counterpart in the question "Are social facts constructions in the mind of the observer?"
Axiology
Axiology is Greek, from axios, valuable, and logos, doctrine. Another word for axiology is value philosophy, i.e. the school in philosophy that examines the common ground for various forms of evaluations, e.g. technical, judicial, esthetical, and moral assessments. It is the conceptual pairing good/bad which permeates axiology, e.g. good/bad works of art, actions, etc. Axiology has its origin in Greek philosophy (Plato). For some there is no distinction between ethics and axiology, e.g. the utilitarians, who think that axiology is a part of ethics. Whether social science should be value neutral has, among other things, been discussed by Max Weber, who was active in his pursuit of value-neutrality in social science. Myrdal (1944, 1969), on the other hand, argued vehemently in favour of a social scientist's obligation to make his own values explicit and argue in favour of them. Axiology, among other things, deals with the question discussed by Weber and Myrdal: what is the role of values in social science? Axiological statements follow the pattern:
• X is good (or bad), or X is better (or worse) than Y.
More concretely, axiological statements can be expressed as:
• X is good (or bad) for person A (or social system B), in context C and situation D, with regard to aim E (Bunge, 1996, p. 219).
What is good (or bad) can be measured in relation to A (B), C, D and E. Understood in this way, good/bad constitutes a relational expression, i.e. X has a value for some, in some situations and for a purpose. This means that "there are no values in themselves" (Bunge, 1996, p. 220). "The axiology of social science investigates the roles of values and social science. It asks: Can social science stay clear of value? What kind of value is more relevant to social science, objective, subjective or both?" (Bunge, 1996, p. 8).





The following are some examples of objective and subjective axiological statements:
(1) Objective (testable) axiological statements:
• Weapon race is bad for the whole economy.
• Birth control is good for the economy.
• Democracy is good for the integration of a society.
• Income distribution is good for the integration of a society.
• Inflation is bad for everyone.
(2) Subjective axiological statements:
• Suppression is socially unacceptable.
• Greed promotes abuse of power.
• Art is good for a society.
The main postulate in systemic axiology is that all humans strive for a state of well-being (varying with the person). A conclusion from this is that when a person attaches positive importance to something, he will normally try to make the value real for himself and/or others, and if he attaches negative importance to something, he will normally try to avoid what is associated with this value. Systemic axiology is based on the following suppositions:
• Human beings regard everything that can enhance their sense of well-being as positive, and everything that can diminish their well-being as negative.
• It is the needs and the desires to achieve a sense of well-being which constitute each individual's value system.
• Values are expressed by means of each individual's moral codes.
• The actions are performed to fulfill needs and/or desires, and thus impact our sense of well-being.
"Axiology is centrally concerned with the good, ethics with the right" (Bunge, 1989, p. 5), and for the systemic position the good is prior to the right (Bunge, 1989, p. 6). This is in contrast with the Kantians, who define the good in terms of what is regarded as right. For the systemic position axiology becomes a basis for ethics, in the same way as ethics is a basis for the moral codes. The moral codes in turn influence choices being made, based on certain preferences. The choices are indicators for things, systems or processes, i.e. the indicated. Preferences are in turn based on various degrees of security, ambiguity and ignorance in relation to the indicated (Figure 1). Axiology is not a clearly defined philosophical discipline, but has the characteristics of a multi-discipline, i.e. it borrows knowledge from other disciplines, e.g. psychology, sociology, epistemology. Axiology has as its defined goal to be of help to ethics and economics, in particular. It is the objective needs and legitimate subjective wishes which constitute the basis for systemic axiology. Both the person level and the social level can form the basis and aim for needs and wishes in systemic axiology. The value system focuses on what is good (well-being) for the individual (or a group of persons). Well-being is directly linked to the fulfillment of basic needs



Figure 1. Axiology and ethics

(biological, psychological and social) as an interactive system entity. A presumption included here is that the more active and useful people perceive themselves as in a social system, the greater their well-being. By useful is here meant the state of being of help to others. With this presumption, human well-being is linked to the mutual fulfillment of personal and other biological, psychological and social needs/wishes. In nature, which we are part of, excess is as harmful as shortage. It is therefore fair to assume that the needs and wants will vary within an upper and a lower limit. We refer to this as cylindrical needs/wants, analogous to Ashby’s (1981) concept cylindrical. The assumption here is that all needs and wants are cylindrical, i.e. within determined marginal values, and represent sources of positive values (biological, psychological, social). We further suppose that needs and wants which are not cylindrical are sources of negative values (biological, psychological, social). What is the upper and lower limit for needs and wants, will naturally vary according to culture, society and various stages in individual and social development. This does, however, only mean that every society must make explicit what is accepted as the upper and lower limits. It may seem as if there is more acceptance for establishing lower boundaries than upper marginal values, which means that a cylindrical state in reality will not be operating in the social system. Upper marginal values should however be established where the needs and wants of the individuals are detrimental to themselves and others. If a need or a wish can only be fulfilled by other persons or ecological systems are harmed, this is regarded as a negative value for society.



To translate the whole value system into a question of trading values and the price of certain goods and services, is not only a reductionistic procedure, but even more to lose sight of the fact that various value sources are distinct and incomparable. Putting down needs and wishes as the basis for well-being, as well as the constitution of the value system, as opposed to, e.g. supply and demand, is done for the very purpose of avoiding logical fallacies easily committed in relation to a value analysis. While biological and psychological values are linked to each individual, social values are linked to both the individual and social systems. This is what makes social values more complex than biological and psychological values. The starting point in terms of social values is that social values conducive to the integrity of social systems, are useful. Integrity is here understood as everything that in its consequence furthers the viability and survival of the system. It has to be underlined that what is socially useful is not necessarily linked to positive values for all members of a social system. E.g. it is regarded as useful for society to prohibit the sale of tobacco and alcohol to school children, but the sale could have tremendous value for the seller. Even if the value system is based on the well-being of the individual, and based on biological, psychological and social needs and wishes, and thus a subjective entity, it does not follow that the value system cannot be studied scientifically. There are no values according to our presumptions without a subject which can evaluate what well being is, and thus perform the necessary actions to fulfill desires and certain needs (the biological basis of the value system). The value system is not an absolute phenomenon for the individual, but subject to change in relation to internal and external conditions with the one who evaluates his needs and wishes (the relativistic basis of the value system). The value system reflects the society in which the subject exists (the social basis of the value system). Some values are cross-cultural and exist in all human societies (the common basis of the value system). It is thus both biological, mental, material, social (economic, cultural, and political) entities which serve as proponents of the values for the individual. This means that, e.g. material and social entities are not the same as well-being, but embody certain physical and social consequences regarded by the individual as conducive to well-being. Here we have expressed that well-being is a relativistic concept. Some values are subjective in the sense that they can be evaluated by no others than the ones who experience them. Other values are objective in the sense that they can be evaluated by others. E.g. “a type of positive well-being is pride and a negative type of well-being is shame and guilt”. An example of a positive objective sense of well-being is clean air and water. An example of a negative type of well being is the lack of food and water. The values can thus be positive and negative, in addition to being linked to social conditions and mental/psychological states. One major purpose of making the distinction between biological/psychological and social values is the comparison of distinctive values. It should be emphasised that even if the values in the biological/psychological and social domains are mutually dependent, they are distinctive, i.e. they cannot be compared to each other. E.g. 
a cultural experience can naturally lead to a mental state of well-being, but a cultural experience is not for that reason a mental state. A cultural experience and a mental state are distinctive domains. An ocean area (the Bird Island Fjord in Vigdel Gildeskål)

can likewise be evaluated on the basis of the existing resources, but also on aesthetic values. Neither of the entities can be reduced to the other, i.e. the same thing can be evaluated on the basis of a variety of values, and cannot be reduced to a common entity, e.g. money. In this way we have said that the biological/psychological and social values can be seen in relation to each other, but they cannot be compared and reduced in a hierarchy of values where, e.g. the economic basis for the fish in the Bird Island Fjord is ranked higher in the value hierarchy than, e.g. the aesthetic experience of the Bird Island Fjord and a poem emanating from this sense of well-being. They are distinctive values that cannot be subsumed in a hierarchy. The value system can thus not be reduced to a simple (or complex) value hierarchy, ranked and evaluated on the basis of a value function. The value system is a system, not a hierarchy, and what characterizes a system is the existence of relations among elements. The values are the properties existing in those relations. Values are not things or properties pertaining to things, but properties attached to things by us. The definition expresses that mental states, physical environments, physical needs, economic, political and cultural entities are not values, but carriers of values. By linking values to needs and wishes, the values can be studied scientifically. The presumption is that optimum human well-being is achieved through a balance among biological, psychological and social values. The axiological norm to be deduced from this presumption, is that to realize human dignity, biological, psychological and social values must be realized. Human values are values by virtue of the system relations they are part of. Clarification of instrumental values analysis (means-ends) is therefore important, as means and ends must be consistent and subject to value analysis. The consistency principle is based on the idea that humans develop habits, and if means and ends are not subject to continuous value analysis, goals will easily be shifted, as the means used do not correspond to the values we attach to the goals. This means that values which are related to our goals must be consistent with the means used by us to reach these goals. The end justifies the means, is therefore a statement of no interest in such a value theory. It is, on the contrary, the values related to the means, which should apply to the use of means. The means-ends relation is no dichotomy, for the simple reason that what is a means at stage 1 in most cases would be a goal in stage 2. Means and ends will continually alternate between being means and ends in a longer time sequence. Therefore it could be downright harmful to separate the relation between means-ends, and turn it into a dichotomy. That the inability to make a sharp distinction between ends and means can be unpleasant, and is another matter. When we are not in a position to make such a distinction, we are forced to base our use of means on the same evaluations, as on our goals. Ends and means are according to this understanding mutually dependent entities, in a circular causal link, and can therefore not be seen a one leading to the other, only with a time separation in-between. What in some situations are means, will in other situations be ends and vice versa. A scientific value theory must in its consequence be relative in relation to the historical situation of the evaluating subject, i.e. 
some values are local, while others are universal. E.g. peace can be thought of as a universal value, but the oppressed in dictatorships will not agree to such an absolute value. For a lot of them peace in the





historical situation in which they exist is tantamount to a relatively unhappy existence, an unpleasant life, or discomfort. Respect, responsibility and solidarity, on the other hand, must be regarded as a universal entity (Polanyi, 1957; Polanyi-Levitt, 1990; Mendel and Sale´e, 1991; Baum and Ellsberg, 1989; Baum, 1996). This leads to the following supposition: absolute values must be based on persons who at any time are most badly off in a social system. This presumption is in conflict with Pareto-optimality (Barro and Martin, 1995) and Rawls’ (1971) value theory. Neither Pareto nor Rawls use the most badly off as their basis in their value considerations, they only state that the latter are not worse off when the former are better off. Our basis is the opposite. The needs and the legitimate wishes of those most badly off is the foundation, regardless of whether relative or absolute, are fulfilled at the expense of a majority who are better off. This is of course historically and culturally relative in regard to what groups are at all times in a position where their dignity and well-being is at a lower level than that of the others, both globally, nationally, and locally. This is both relatively, idealistically, and ideologically. But if a value analysis should not take these three entities into consideration, it would not be a value theory, but a legitimacy theory, which would border on science as camouflage for power structures. In a value context, some values can be objective, while others can be described as subjective, even if most values are based on a relation between a subject and “something”. Something is objective if it exists independent of other knowing subjects. Starvation is, e.g. an objective entity, since it can be analysed by means of biological indicators. Values can further be personal or collective (local or global). An example of a subjective value is taste. An example of an objective value could be in the format: If X fulfills a primary need for Knut, X will have an objective value for Knut, even if Knut does not personally want X. Value statements are a cognitive operation referring back to the person making the statement, i.e. it is self-referring, but this does not prevent value statements from being tested empirically. The following value statements can, e.g. be tested empirically: the more people living below the poverty line in a country, the greater the probability of violence in society. Or the statement: the more uneven distribution of material resources in a country, the greater is the probability for violence in society. If a value is subjective, this does not mean that it cannot be tested scientifically, by means of scientific methods, and thus be tested by others. Some statements can be linked to biological needs or wants, others to psychological needs or wishes, and some to social needs or wants. Irrespective of which of these three sources the values are engrained in, the causal processes linked to subjective statements can be tested for their verity content. When subjective statements are to be tested (and objective ones too, for that matter), then a negative test in Poppers sense (falsification) will give an indication, but not a complete satisfactory indication. We also need to link the test to an empirically positive test, not only to what we know that we do not know (negative test). 
Having both a positive and a negative test on a subjective statement, in accordance with the statement, we have adequate knowledge to predicate actions, even if we cannot specify the underlying mechanisms that make them possible. In this way we can test subjective value statements scientifically against the empirical data.

Science need not overlook values, and if it does, it also overlooks a substantial part of its basis: to seek the truth, which could no doubt be an exemplary case of a general value statement (Figure 2). A value analysis can be concretised by asking two questions in relation to goal realisation for subjective wishes and objective needs for the individuals. The two questions are linked to the degree of desirability and intensity in the act itself, i.e. the act to reach the target.
• Question 1: To what extent is goal realisation desired? (Answer alternatives, e.g. 1-10: not desirable – very desirable.)
• Question 2: With what intensity are you prepared to pursue the goal? (Answer alternatives, e.g. 1-10: no intensity – great intensity.)



Desirability and intensity are meant to express something about value satisfaction for a need and/or wish to pursue actions carried out to reach certain targets. By using a degree of intensity in actions and a degree of wanted goal realisation, the moral component is incorporated. This means that the individual actor makes use of the consequences of the action for others when intensity and desirability are used as value parameters. Ethics Ethics is here defined as the study of morals, and “moral code” is a system of moral norms, or rules (Bunge, 1996, p. 226). Ethics according to this conception, relates to morals as history science relates to history. In every social system somebody needs help from someone during the course of their life in order to fulfill their needs and wishes. The ethical system has this

Figure 2. Systemic axiology: value analysis



presumption as its basis. Mutual help is a moral principle which includes others and oneself. Whereas values are linked to our own needs and wishes in the first instance, ethics is linked to helping others in fulfilling their needs and wishes in the first instance, and in the last instance the help is mutual and can thus be based on self-interest, but does not have to be. It is definitely not based on self-interest as a principle, since we do not help others in order, in the last instance, to be helped ourselves. History is full of examples of this statement. While needs and wishes are linked to something in each individual, rights and obligations are linked to something outside the individual. A viable society is based on the balance between self-help and help extended to others; otherwise the society would not have existed. A major point of a society, regardless of type, is the collective feeling of developing something. Therefore rights and obligations are factors holding the society together. It is a principle of balance between rights and obligations which is here regarded as an ethical assumption for a viable society. By viable society is here meant a society in a process of continuous integration and disintegration, where the integration in a time perspective must always take precedence in order for society to exist, even if disintegration as a leading process can be both desirable and necessary in certain periods of time. If a society attaches more importance to each individual's needs and wishes than to his rights and duties, a disintegration process, in its consequence socially destructive, will be initiated. This also applies if rights and obligations are more heavily emphasised than each individual's needs and wishes. This basic principle of balance is here seen as a moral axiom for viable social systems. Morality is here linked to each individual's well-being, while ethics is here linked to the viability of a society. But no individual can achieve a sense of well-being without the company of others. In this way ethics and morality become mutually independent entities. A principle of balance for the morally correct act can be expressed in the following way: a morally correct act is to seek personal well-being, while directly and indirectly helping the one/the ones who have their needs and wishes fulfilled to the least extent in the social system. The principle is based on the three magnitudes: respect, responsibility and human dignity (Benhabib, 1992). This principle of balance goes against the assumption that action strategies should be oriented towards the achievement of as much happiness, well-being, welfare, etc. as possible for as many persons as possible. One consequence of the moral principle of balance is that it will be morally wrong to act in order to maximise self-interest, if one does not belong to those who, in the social system, are among the least successful in fulfilling their needs and wishes. Another conclusion from the moral principle of balance is that to act in a morally neutral fashion, i.e. not to pay attention to the principle of balance in one's action strategies, would be to act incorrectly, morally speaking. Another consequence is that it is not morally incorrect to act in order to increase personal well-being, if this simultaneously means acting in the interest of the least privileged. There are degrees of what is morally incorrect, according to the moral principle of balance.
Each individual actor does, however, have a moral obligation to argue in favour of the contention that his or her acts are morally sound. A morally sound act, according to the moral balance principle, is judged by the consequences of the act (or the lack of action), not by intentions.

Good intentions can bring about negative consequences for those intended to benefit from the balance principle, and bad intentions can, in their consequences, lead to positive results for those who are the least privileged. Both categories of intention carry an inherent learning element, capable of generating altered actions. The moral soundness of an act, according to the principle of balance, lies in any case in the consequences the act has for the other(s), regardless of intention. A consequence orientation of this kind compels the one who decides to act morally to:
(1) analyse his own abilities;
(2) analyse the situation;
(3) analyse possible consequences;
(4) act; and
(5) learn from possible negative consequences of good intentions.
Points 1-5, in addition to the balance principle, indicate that a morally sound act does not remain at the level of ideas, or at an abstract level beyond analysis, but is suited for empirical testing against the consequences of the action; i.e. according to the principle of balance a moral act is either good or bad, and can be tested scientifically. Concretely, a morally sound act is linked to increased well-being for those who suffer most in a social system, which is an objective category, not necessarily related to the facial expression of the other(s), which must be regarded as a subjective category. The well-being of the other(s) is linked to his or her legitimate needs/wishes. A morally sound act and the ethical norms are linked to consequences. This means that solutions to moral/ethical problems presuppose knowledge about the case in question, and some moral/ethical problems can be solved by acquiring more factually based knowledge, either at the individual level (information gathering) or at the societal level (knowledge development about the context in question). Moral behaviour is a part of social behaviour, and basically all social behaviour is learnt, not innate. So what we teach others is what we in turn (possibly after one or more generations) receive back as social integration or disintegration. If we assume that there are three spheres of moral codes (private, professional and political/public), then at least one moral code is common to the three spheres: never to take advantage of the weak who are in a dependent relationship to oneself or to the system one represents. One supposition is that all action strategies are made to sustain human dignity and self-respect. Human dignity and self-respect are related to the following factors:
(1) Action strategies are linked to the needs/wishes of the other person(s).
(2) Action strategies are linked to the notion of not taking advantage of others to reach personal goals.
(3) Action strategies are linked to the notion of boosting the other person's self-respect.
Concluding comments about the systemic position
The systemic position tries to bridge the classical controversy in social science between methodological individualism and methodological collectivism (holism) (Bunge, 1996).


The preconditions on which the systemic position is based are as follows:
. Social facts exist, and they can be disclosed, even if some social facts can only partly and gradually be made visible.
. Intuition is necessary in social science, but not sufficient in order to understand social systems. The sufficient element includes theories, models, rigorous methods and tests against facts (giving data).
. Analysis and synthesis are complementary activities, where the synthesis is the goal, i.e. the disclosure of patterns in social systems.
. Observation of social systems must be based on theory. If there is no theory, theory development (grand, medium, local) must be the objective of the study of social systems. If this does not happen, observations become a mere collection of data, and rigorous knowledge about social systems is made impossible.
. Morals and ethics constitute an important part of the study of social systems.
. To understand, explain and predict problems and phenomena in social systems is an important purpose of the study of social systems. This is done through the disclosure of plausible social mechanisms with influence on problems and phenomena.
. Even if social science makes use of adjacent social disciplines, it cannot be and should not be reduced to them.
. Reflection, action and learning are a constant circular process in the study of social systems.
The philosophy of social science is meant to give guidance to the researcher in a specific discipline, e.g. information science, sociology, anthropology, psychology, economics, etc. Bunge (1996, p. 11) points out certain requirements for the philosophy of social science. Four of these requirements are listed here. (Bunge operates with five more, which we hold to be subsumed in the following four.)
(1) The relevancy requirement: is it relevant, i.e. does it deal with topical problems encountered by the specific discipline?
(2) Comprehensibility: can it be comprehended by a bright student of the subject?
(3) Internal consistency: is it possible to refine the concepts and propositions used in a way that generates greater clarity?
(4) External consistency: is it in accordance with existing knowledge in the specific discipline?
Conclusion
The ideal requirements set up by the systemic position are:
(1) Epistemology: moderate reductionism, based on a combination of realism and rationalism.
. Methodology: there is a clear distinction between intention and behaviour. The intention is to be interpreted and understood. The behaviour should be explained and, if possible, predicted. Explanations of social phenomena must deal with both the individual and the group level, as well as their interactions.

(2) Ontology: emergent ontology, based on systems theory.
(3) Axiology: means-end systems must be subject to a systemic value analysis.
(4) Ethical position: the systemic ethical principle of balance, i.e. the well-being of those most subject to suffering in the social system should be given first priority, even though the well-being of the majority may deteriorate as a result of this priority.
References
Ashby, W.R. (1961), An Introduction to Cybernetics, Chapman & Hall, New York, NY.
Ashby, W.R. (1981), "Constraint analysis of many dimensional relations", in Conant, R. (Ed.), Mechanisms of Intelligence, Intersystems, Seaside, CA.
Barro, R.J. and Sala-i-Martin, X. (1995), Economic Growth, McGraw-Hill, New York, NY.
Baum, G. (1996), Karl Polanyi: On Ethics and Economics, McGill-Queen's University Press, Montreal.
Baum, G. and Ellsberg, R. (1989), The Logic of Solidarity, Orbis Books, Maryknoll, NY.
Beer, S. (1979), The Heart of Enterprise, Wiley, New York, NY.
Benhabib, S. (1992), Autonomi och gemenskap, Daidalos.
Bunge, M. (1977), The Furniture of the World, Reidel, Dordrecht.
Bunge, M. (1983a), Exploring the World, Reidel, Dordrecht.
Bunge, M. (1983b), Understanding the World, Reidel, Dordrecht.
Bunge, M. (1985), Philosophy of Science and Technology, Part I, Reidel, Dordrecht.
Bunge, M. (1989), Treatise on Basic Philosophy, Vol. 8: Ethics: The Good and the Right, Reidel, Dordrecht.
Bunge, M. (1996), Finding Philosophy in Social Science, Yale University Press, London.
Bunge, M. (1998), Social Science under Debate: A Philosophical Perspective, University of Toronto Press, Toronto.
Collingwood, R.G. (1945), The Idea of History, Clarendon Press, Oxford.
Fay, B. and Moon, D. (1994), "What would an adequate philosophy of social science look like?", in Martin, M. and McIntyre, L.C. (Eds), Readings in the Philosophy of Social Science, MIT Press, Cambridge, MA, pp. 21-37 (adapted from Philosophy of Social Science, Vol. 7, 1977, pp. 209-27).
Johannessen, J-A. (1997a), "Aspects of causal processes", Kybernetes, Vol. 26 No. 1, pp. 30-52.
Johannessen, J-A. (1997b), "Philosophical problems with the design and use of information systems", Kybernetes, Vol. 26 No. 3, pp. 30-48.
Maruyama, M. (1963), "The second cybernetics: deviation-amplifying mutual causal processes", American Scientist, Vol. 51, pp. 164-79.
Mendell, M. and Salée, D. (1991), The Legacy of Karl Polanyi, St Martin's Press, New York, NY.
Myrdal, G. (1944), An American Dilemma: The Negro Problem and Modern Democracy, Harper, New York, NY.
Myrdal, G. (1969), Objectivity in Social Research, Pantheon Books, New York, NY.
Polanyi, K. (1957), The Great Transformation, Beacon Press, Boston, MA.
Polanyi-Levitt, K. (1990), The Life and Work of Karl Polanyi, Black Rose Books, Montreal.
Rawls, J. (1971), A Theory of Justice, Belknap Press, Cambridge, MA.


Shannon, C.E. and Weaver, W. (1949), The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL.
Turner, J. (1991), The Structure of Sociological Theory, Wadsworth Publishing, Belmont, CA.
Weinberg, G. (1975), An Introduction to General Systems Thinking, Wiley-Interscience, New York, NY.


Further reading
Churchman, C.W. (1979), The Systems Approach and its Enemies, Basic Books, New York, NY.
Churchman, C.W. (1981), Thought and Wisdom, Intersystems, Seaside, CA.
Gadamer, H. (1975), Truth and Method, Sheed and Ward, London.
Maturana, H. (1987), "Everything is said by an observer", in Thompson, W.I. (Ed.), Gaia: A Way of Knowing, Lindisfarne Press, CA.
Polanyi, M. (1962), Personal Knowledge, Routledge & Kegan Paul, London.
Ulrich, W. (1988), "Systems thinking, systems practice, and practical philosophy: a program for research", Systems Practice, Vol. 1, pp. 137-63.
Wiener, N. (1948), Cybernetics: Or Control and Communication in the Animal and the Machine, Cambridge, MA.
Winch, P. (1958), The Idea of a Social Science and its Relation to Philosophy, Routledge and Kegan Paul, London.
Wittgenstein, L. (1953), Philosophical Investigations, Macmillan, New York, NY.
Wolin, S. (1972), "Political theory as a vocation", in Fleisher, M. (Ed.), Machiavelli and the Nature of Political Thought, Atheneum, New York, NY.


Image labelling in real conditions


Juan Manuel García Chamizo, Andrés Fuster Guilló and Jorge Azorín López
Department of IT and Computation, Alicante University, Alicante, Spain
Abstract

Revised May 2003

Purpose – Motivated by the problems of visual perception, we propose a model for the processing of vision in adverse situations of illumination, scale, etc. In this paper, a model for the segmentation and labelling of images obtained in real conditions at different scales is proposed.
Design/methodology/approach – The model is based on the texture identification of the scene's objects by means of comparison with a database that stores series of each texture perceived with successive optic parameter values. As a basis for the model, self-organising maps have been used in several phases of the labelling process.
Findings – The model has been conceived to systematically deal with the different causes that make vision difficult and allows it to be applied in a wide range of real situations. The results show high success rates in the labelling of scenes captured in different scale conditions, using very simple descriptors, such as different histograms of textures.
Research limitations/implications – Our interest is directed towards systematising the proposal and experimenting on the influence of the other variables of vision. We will also tackle the implementation of the classifier module so that the different causes can be dealt with by the reconfiguration of the same hardware (using reconfigurable hardware).
Originality/value – This research approaches a very advanced angle of vision problems: visual perception under adverse conditions. In order to deal with this problem, a model formulated with a general purpose is proposed. Our objective is to present an approach for conceiving universal architectures (in the sense of being valid independently of the magnitudes implied).
Keywords Vision, Cybernetics, Neural nets, Image sensors
Paper type Research paper

Introduction
During the last few years, considerable advances have been made in vision techniques. However, there are still very few studies aimed at dealing with situations in natural environments taking the scene's realism into account, where natural light is changeable or is not uniform, the scene's different planes become unfocused, the scale of an object's perception can change according to its distance, etc. (Flusser and Tomáš, 1998; Biemond et al., 1990; Moik, 1980). The majority of studies have solved these problems with generally specific pre-processing methods, catalogued as enhancement or restoration methods (Rosenfeld and Kak, 1982; González and Woods, 1992; Sonka et al., 1998). When the aim is the segmentation and interpretation of the objects in a real scene, the task can be made easier if the configuration at surface texture level is known (Won, 2000; Bhalerao and Wilson, 2000; Campbell et al., 1997; Shanahan et al., 1999). Many techniques highlight the classification capacity of the characterizers extracted from images (Haralick, 1979; Tamura et al., 1978), searching for properties that are invariant or tolerant to the variation of optic parameters (Cohen et al., 1991; Leow and Lai, 2000; Sim et al., 2000; Teuner et al., 1997).

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1587-1597 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614830


In the context of the research project Vision system for autonomous navigation (this work was partly supported by the CICYT TAP1998-0333-C03-03), one of the study's aims is the development of a light and realistic autonomous vision device (Pujol et al., 2001). To this effect, this paper proposes a general segmentation and labelling model for scenes acquired in real conditions, which could be systematically used in a wide range of real situations (illumination, scale, focus variations, etc.). The proposal tackles the problem by means of the texture identification of the scene's objects, highlighting the simplicity of the characterizers used in the model, which results in low computing costs and the possibility of formulating iterative algorithms aimed at solving real-time problems. Texture classification in real scenarios is carried out by querying databases that store series of each surface captured with successive optical parameter values: the collection of each texture perceived at successive distances, the collection with different light intensities, etc. This approach implies the handling of large volumes of images. Consequently, the use of self-organising maps (Fritzke, 1997; Kohonen, 1995) to organise the knowledge bases enables their discrimination capacity to be exploited and spatiotemporal costs to be reduced. On the other hand, the possibilities for hardware implementation of the self-organising maps (Hammerstrom and Nguyen, 1991) will permit the model's systematic application by means of hardware reconfigurations.

Problem formulation
A given device, with a given calibration and in given environmental conditions, has sensitivity around a value of the variable on which it operates, the so-called calibration point. The function that describes the device's behaviour acquires values in an interval around the calibration point. Generally speaking, we can assume that for another calibration point (and even for another device), the calibration function is different. For each device, there will be a calibration point that generates an optimum calibration chart. If $C$ is the value of an input magnitude of a sensor (and, generally speaking, of a system) and $L_a = L({}_a x_j)$ is the function that represents the calibration of the values ${}_a x_j$ of the $n$ variables that characterise the sensor (environmental conditions or the system's characteristics), the sensor (system) output can be expressed as:

$$ {}^{a,C}f = f(C, L_a) \qquad \forall\, {}_{\inf}x_j \le x_j \le {}_{\sup}x_j,\; j = 1,\ldots,n \tag{1} $$

For another calibration, the system output will be:

$$ {}^{b,C}f = f(C, L_b) \qquad \forall\, {}_{\inf}x_j \le x_j \le {}_{\sup}x_j,\; j = 1,\ldots,n \tag{2} $$

With an input $C$ and the system output ${}^{a,C}f$ for one of the known calibrations, the output ${}^{b,C}f$ for another calibration could be synthesised:

$$ {}^{b,C}f = T_s\big(L_b,\, {}^{a,C}f\big) \tag{3} $$

The interest of this research consists of proposing a general method that carries out the transformation $T_s$, independently of the variables $x_j$ of the calibration function $L({}_a x_j)$ studied. This approach enables us to achieve our aim of proposing a general model for image treatment. The arguments could reflect, for example, the lighting conditions of the acquisition: given an image ${}^{a,C}f$ captured with deficient lighting $L_a = L({}_a\mathrm{illumination})$, the method allows us to synthesise a new image ${}^{b,C}f$ with improved lighting conditions $L_b$. The same can be said for other variables, such as image resolution, which is the case in question in this paper. Other transformation models could be dealt with, such as the estimate of the calibration function value $L_b$ that generates the image ${}^{b,C}f$ for an input $C$, which we will call $T_L$:

$$ L_b = T_L\big({}^{b,C}f\big) \tag{4} $$

Another transformation model $T_C$ of fundamental interest consists of obtaining the region labelling function ${}^{C}u$ of the image ${}^{b,C}f$, acquired with values of the calibration function $L_b$:

$$ {}^{C}u = T_C\big({}^{b,C}f\big) \tag{5} $$

The region labelling function ${}^{C}u$ must be independent of the calibration function values; i.e. invariant to the context:

$$ {}^{C}u = u(C) = {}^{C}u\big({}^{b,C}f\big) = {}^{C}u\big({}^{a,C}f\big) \tag{6} $$

In this paper, we will focus on the use of the transformations expressed in (4) and (5) to deal with the segmentation and labelling of a scene from an image.

Proposed solution
The formulation of the problem carried out in the previous section is open and, depending on the characteristics of the transformation functions $T$ (equations (3)-(5)) and on the knowledge we have of these functions, different methods can be proposed to solve the problem. In the simplest cases, the result could be obtained analytically if the functional expressions for $T$ are known. In the specific case of image treatment, as these functional expressions are not known, we have to work in explicit terms, resorting to databases that contain the magnitude values. To be more specific, this work is based on the use of textures. Labelling is obtained by comparing the descriptor of an unknown texture with the descriptors previously stored in a database for different materials and different calibrations. Consequently, the proposed general model of transformation $T$ uses knowledge bases to infer the calibration function values $L_a$ (equation (4)) or to provide the region labelling function ${}^{C}u$ (equation (5)). In any case, the inference is made from the image ${}^{b,C}f$, with the image ${}^{a,C}f$ being known for different calibration values $L_a$. We will call these databases $DB({}^{a,C}f, L_a)$. Consequently, we can formulate the expressions thus:

$$ L_b = T^{L}_{DB}\big({}^{b,C}f,\; DB({}^{a,C}f, L_a)\big) \tag{7} $$

$$ {}^{C}u = T^{C}_{DB}\big({}^{b,C}f,\; DB({}^{a,C}f, L_a)\big) \tag{8} $$
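To make the explicit query of equations (7) and (8) concrete, the following minimal Python sketch performs it as a nearest-neighbour search over stored (descriptor, material, calibration) records. The in-memory list, the Euclidean distance criterion and every name in it are illustrative assumptions of ours, not part of the original model; the paper itself realises this comparison with self-organising maps, introduced below.

```python
import numpy as np

# Hypothetical in-memory stand-in for DB(a,C f, La): one record per stored
# texture patch, holding its feature vector, its material label C and the
# calibration value La under which it was captured (entries are invented).
database = [
    (np.array([0.10, 0.40, 0.30, 0.20]), "cork",     2.91),
    (np.array([0.05, 0.25, 0.45, 0.25]), "cork",     4.55),
    (np.array([0.60, 0.20, 0.15, 0.05]), "terrazzo", 2.91),
    (np.array([0.50, 0.30, 0.15, 0.05]), "terrazzo", 4.55),
]

def query_database(descriptor, records):
    """Explicit counterpart of the T_DB transformations in equations (7) and
    (8): compare the unknown descriptor against every stored one and return
    the material and calibration of the best match."""
    distances = [np.linalg.norm(descriptor - d) for d, _, _ in records]
    best = int(np.argmin(distances))
    _, material, calibration = records[best]
    return material, calibration

unknown = np.array([0.08, 0.35, 0.35, 0.22])
print(query_database(unknown, database))   # closest material and stored scale
```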


The database queries in equations (7) and (8) can be simplified by previously estimating the values $L_a$ or $C$ and subsequently querying the partial view of the databases for the known values of $L_a$ or $C$, which we will call $DB_{L_a}({}^{a,C}f, L_a)$ or $DB_{C}({}^{a,C}f, L_a)$. In the proposal, the prior estimate of $L_a$ is carried out for several reasons: we assume that the calibration function values $L_a$ present low spatial dispersion with regard to the dispersion of the function ${}^{C}u$; on the other hand, the main aim of this study is to obtain the region labelling and, to a lesser extent, the estimate of the calibration parameters. Consequently, the calibration function $L_a$ is previously estimated by means of equation (7), which enables us to obtain more precisely the region labelling function ${}^{C}u$ by querying the partial view of the databases $DB_{L_a}({}^{a,C}f, L_a)$ (equation (9)):

$$ {}^{C}u = T^{C}_{DB,L}\big({}^{b,C}f,\; L_b,\; DB_{L_a}({}^{a,C}f, L_a)\big) \tag{9} $$

The steps are shown in Figure 1.

Pre-processing: calibration estimate (equation (7))
. Scan the unknown image with a window and classify each of the image's elements of the size of the window according to the best match found in the database; label the region with the calibration of the matched database element $DB_{L_a}({}^{a,C}f, L_a)$.
. Using the elemental calibrations, estimate the image calibration. Assumptions can be made on calibration uniformity in the whole image or in some of its parts. The general calibration, or the calibration of the parts, can be estimated as a statistical parameter of the elemental calibrations obtained for each position of the scan window. Other heuristics can also be used according to the knowledge and nature of the problem to be tackled; for example, the support of a complementary segmentation technique.

Processing: image labelling (equation (9))
Scan the unknown image again with a window and classify each image element of the window size according to the best match found in the view of the specific database for that calibration, $DB_{L_a}({}^{a,C}f, L_a)$.
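A compact sketch of the two scans just described is given below. It assumes two hypothetical classifiers (one returning an elemental calibration per window, one returning a material label for a given calibration, e.g. built on look-ups like the one sketched earlier), takes the global calibration as the mode of the elemental estimates, in line with the low-dispersion assumption above, and uses an illustrative non-overlapping 80-pixel window; all helper names are our own.

```python
import numpy as np
from collections import Counter

WIN = 80  # scan window size in pixels, matching the 80 x 80 database samples

def scan_windows(image, win=WIN):
    """Yield (row, col, window) over a non-overlapping scan of the image."""
    h, w = image.shape[:2]
    for r in range(0, h - win + 1, win):
        for c in range(0, w - win + 1, win):
            yield r, c, image[r:r + win, c:c + win]

def label_scene(image, classify_calibration, classify_material):
    """Two-pass labelling of a scene.

    Pre-processing (equation (7)): estimate an elemental calibration for each
    window, then take the mode as the scene calibration (low spatial
    dispersion of the calibration is assumed).
    Processing (equation (9)): label each window against the partial view of
    the database for that calibration."""
    elemental = [classify_calibration(w) for _, _, w in scan_windows(image)]
    calibration = Counter(elemental).most_common(1)[0][0]
    labels = {(r, c): classify_material(w, calibration)
              for r, c, w in scan_windows(image)}
    return calibration, labels

# Illustrative call with trivial stand-in classifiers.
if __name__ == "__main__":
    scene = np.random.rand(160, 160)
    cal, labels = label_scene(scene,
                              classify_calibration=lambda w: 4.55,
                              classify_material=lambda w, cal: "cork")
    print(cal, len(labels))   # 4.55 and 4 windows for a 160 x 160 scene
```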

Figure 1. Model for image labelling in real conditions

Implementation using SOM
In order to tackle the task of classifying the scan windows of the unknown image by comparison with the different images stored in the databases, self-organising maps have been used, due to their discriminating capacity and to the high degree of parallelism inherent in connectionist methodologies. These self-organising maps enable the discriminating capacity of different features extracted from the images to be evaluated; i.e. their suitability for grouping the unknown images together in accordance with different classification criteria, such as the region label or the calibration value. On the other hand, these self-organising maps will serve as the basis for a general model of the vision system in realistic conditions. The model will be general and will enable the problems of realism introduced by the different calibration variables to be dealt with by means of simple reconfigurations of the self-organising map neurons. Our aim in this paper is to propose a general model and not to evaluate the advantages or disadvantages of self-organising maps against other classification methods.

The self-organising maps have been constructed from features $t({}^{a,C}f)$ extracted from the images of the database $DB({}^{a,C}f, L_a)$ (different materials ${}^{C}u$ for different calibration function values $L_a$). According to the classification criterion applied to this set of features $t({}^{a,C}f)$, different self-organising maps are obtained; they are classified according to material, ${}^{C}u\,\mathrm{SOM}(t({}^{a,C}f), L_a)$, or to calibration function values, ${}^{L_a}\mathrm{SOM}(t({}^{a,C}f), L_a)$. The labelling of self-organising maps per surface, ${}^{C}u\,\mathrm{SOM}(t({}^{a,C}f), L_a)$, may provide success levels that indicate the suitability, in certain cases, of the central part of the processing to carry out the region labelling ${}^{C}u$:

$$ {}^{C}u = T^{C}_{\mathrm{SOM}}\big({}^{b,C}f,\; {}^{C}u\,\mathrm{SOM}(t({}^{a,C}f), L_a)\big) \tag{10} $$

As previously mentioned, database queries can be simplified by the prior estimate of the calibration values $L_a$ and the subsequent query of the partial view of the databases $DB_{L_a}({}^{a,C}f, L_a)$. These database partial views will be classified per material, ${}^{C}u\,\mathrm{SOM}_{L_a}(t({}^{a,C}f), L_a)$. Once the calibration value $L_b$ has been estimated, the map corresponding to this value is activated, as expressed in equation (11). These partial maps, separated by calibration levels $L_b$, do away with the overlapping of some patterns and thus offer better results:

$$ {}^{C}u = T^{C}_{\mathrm{SOM},L}\big({}^{b,C}f,\; L_b,\; {}^{C}u\,\mathrm{SOM}_{L_a}(t({}^{a,C}f), L_a)\big) \tag{11} $$

In the pre-processing phase, the calibration $L_b$ is also estimated by means of queries to the database $DB({}^{a,C}f, L_a)$. Here we also use self-organising maps, labelled according to calibration values, ${}^{L_a}\mathrm{SOM}(t({}^{a,C}f), L_a)$, as seen in equation (12):

$$ L_b = T^{L}_{\mathrm{SOM}}\big({}^{b,C}f,\; {}^{L_a}\mathrm{SOM}(t({}^{a,C}f), L_a)\big) \tag{12} $$

As previously mentioned, database queries can be simplified by the prior estimate, in this case, of the values $C$ and the subsequent query of the partial view of the database for the known values of $C$, which we will call $DB_{C}({}^{a,C}f, L_a)$. On the other hand, not all materials have the same suitability for the calibration estimate, so the complete database is used to carry out a prior region labelling ${}^{C}u$ (equation (10)), after which the surfaces suitable for estimating $L_b$ are selected. By separating the databases, in this case per surface, $DB_{C}({}^{a,C}f, L_a)$, we can label the maps per calibration and select the ones that offer a higher degree of success (equation (13)).


In Figure 2, the diagram showing the whole model for image labelling can be seen.

$$ L_b = T^{L}_{\mathrm{SOM},C}\big({}^{b,C}f,\; {}^{C}u,\; \mathrm{SOM}_{C}(t({}^{a,C}f), L_a)\big) \tag{13} $$
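For illustration, a from-scratch sketch of the kind of self-organising map that equations (10)-(13) rely on is given below: a small rectangular map is trained on descriptor vectors, each unit is then labelled by majority vote over the training patterns it wins, and an unknown descriptor receives the label (material or calibration value) of its best matching unit. The map size and the learning-rate and radius schedules are illustrative assumptions, not the configuration used by the authors.

```python
import numpy as np
from collections import Counter

def train_som(data, labels, rows=10, cols=10, epochs=20,
              lr0=0.5, radius0=3.0, seed=0):
    """Train a small rectangular SOM and label each unit by majority vote."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1).astype(float)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            lr = lr0 * (1.0 - step / total)
            radius = max(radius0 * (1.0 - step / total), 0.5)
            dist = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # Gaussian neighbourhood update around the best matching unit.
            g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=-1)
                       / (2.0 * radius ** 2))
            weights += lr * g[..., None] * (x - weights)
            step += 1
    # Label every unit with the majority label of the patterns it wins.
    votes = [[Counter() for _ in range(cols)] for _ in range(rows)]
    for x, y in zip(data, labels):
        dist = np.linalg.norm(weights - x, axis=-1)
        r, c = np.unravel_index(np.argmin(dist), dist.shape)
        votes[r][c][y] += 1
    unit_labels = [[v.most_common(1)[0][0] if v else None for v in row]
                   for row in votes]
    return weights, unit_labels

def som_classify(x, weights, unit_labels):
    """Label an unknown descriptor with the label of its best matching unit."""
    dist = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=-1)
    r, c = np.unravel_index(np.argmin(dist), dist.shape)
    return unit_labels[r][c]
```

Training the same routine once with material labels and once with calibration values would yield, respectively, surface-labelled and calibration-labelled maps of the kind referred to in equations (10) and (12).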

Model application in scale treatment
The model presented in the previous sections is general and could be used for the treatment of different calibration parameters such as lighting, scale, focusing conditions, etc. In this section, its application is specified for the analysis of the scaling level $L_b$. An extensive collection of images capturing 14 materials (six fabrics, two woods, marble, cork, two earthenwares, terrazzo) with 150 scale values has been created; i.e. 2,100 captured images. In order to obtain these images, a programmable calibration capture system has been developed using a motorised Computar M10Z1118 optic and a high-resolution Hitachi KPF100 1300 × 1030 camera. These data were obtained while maintaining a stable environment for the rest of the optical parameters. The number of pixels of each of the images containing the materials depends on the real-world area in the scene. That is, the real-world area captured for the construction of the database is the same for all the images at all the scales, which means that more distant images have fewer pixels (86 × 86) than the nearest ones (783 × 783). After the previous cutting out of the material surfaces of the 2,100 images, these images have been cut into smaller samples of 80 × 80 pixels each, for reasons of coherence with the scan window size of the labelling algorithms (see Figure 3). The number of samples per image ranges from 1 (the image with the smallest scale) to 81. The total number of samples is 29,890.

Self-organising maps
The use of self-organising maps has been proposed to classify features extracted from the images in the database previously described. We will describe the properties of each of the maps used in the process below (characterizers used, number of neurons, success rates, etc.). We have previously mentioned that one of the advantages of the model is the use of simple characterizers, which permit its iterative implementation as a basis for a generic model aimed at tackling problems with real-time restrictions. The instances of the function $t({}^{a,C}f)$ were different according to the labelling aims. For surface labelling with the functions (10) and (11), a generic characterizer such as the brightness histogram was used.
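A minimal sketch of this simplest characterizer, the brightness histogram of a scan window, might look as follows; the bin count and the 0-255 grey-level range are illustrative assumptions.

```python
import numpy as np

def brightness_histogram(window, bins=32):
    """Normalised grey-level histogram of a scan window (levels 0-255).

    Dividing by the number of pixels keeps windows of different sizes
    comparable, which matters because more distant captures contain fewer
    pixels of a given real-world surface."""
    hist, _ = np.histogram(window, bins=bins, range=(0, 255))
    return hist / window.size

# Illustrative use on a synthetic 80 x 80 grey-level window.
window = np.random.randint(0, 256, size=(80, 80))
descriptor = brightness_histogram(window)
print(descriptor.shape)   # (32,) feature vector, ready to feed a SOM
```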

Figure 2. The diagram shows the whole model for image labelling in real conditions based on reconfigurable SOMs


Figure 3. Some of the database samples of different materials for different scale values

With regard to scale value labelling (equations (12) and (13)), the characterizer used is the morphological coefficient histogram (García and Ibarra, 1995).
(1) Classification rate of the self-organising map used to classify the complete database according to materials (expression (10)). A classification rate of over 85 percent was obtained; i.e. 85 percent of the patterns of the database (29,890) were correctly classified, while the remaining 15 percent activated neurons linked to several materials. This enables us to approach region labelling exclusively with this part of the process in applications with relaxed requirements. The number of neurons was 40 × 40.
(2) Classification rates of the SOMs for partial views of the database per scale value, labelled per surface (expression (11)). The brightness histogram was used again for these databases. We observed improved results with regard to the classification of the complete database, as a result of dividing the problem (Table I). The 150 scale values are grouped into 15.

Table I. Classification rates of self-organising maps for expression (11)

Scale group (pix./cm)    Classification rate (percent)
0  (2.91)                100
1  (3.18)                100
2  (3.57)                100
3  (4.03)                100
4  (4.55)                100
5  (5.18)                100
6  (5.86)                100
7  (6.75)                100
8  (7.80)                100
9  (9.11)                 92
10 (10.78)                96
11 (12.88)                90
12 (15.57)                95
13 (18.95)                93
14 (23.21)                87


The high scale values, from 9 to 14, were seen to offer slightly lower results, because insufficient areas of the materials are reflected.
(3) Classification rates of the SOM for the classification per scale value of the whole database (equation (12)). Of the features studied, the best results regarding the labelling of this map correspond to the use of morphological coefficient histograms, although the total database does not offer sufficiently interesting results (20 percent classification rate). The characterizers studied, which are not specified in this paper, have sought the size of geometrical shapes in the materials' textures. The dimension of these shapes depends on the scale as well as on the material studied, which makes scale classification difficult without prior knowledge of the nature of the material (equation (12)).
(4) Classification rates of the self-organising maps for the classification per scale value of the database separated according to surface (equation (13)). We found that the separation of the database according to surface improved image classification with regard to scaling. Morphological coefficient histograms were also used for these maps. The classification rate of the SOMs of expression (13), grouped into 15 scale values, ranges from 68 percent for terrazzo to 48 percent for black wood. Assuming a low spatial dispersion of the scale in the whole scene with regard to the dispersion of the materials, the scale's general value can be estimated as a statistical parameter of the elemental scale values obtained for each position of the scan window. The use of these maps in the more precise construction of scene depth maps requires the search for characterizers that provide better classification rates; such characterizers are not included in this study. However, as a low dispersion of scale in the scene is assumed, a high success rate is obtained by applying the mode as the statistical parameter.

Scenario labelling
Once the classification capacities of the process's different maps have been reviewed, a series of tests with scenarios has been designed, based on the composition of real images not included in the database. Each of these scenarios contains the different surfaces corresponding to one of the scale values (from 2.91 to 23.21 pix./cm). In Figure 4 the success rate of the model with and without pre-processing can be seen. In Figure 5 one of the scenarios used as a benchmark is shown. We can see (Figure 6) the good results (90.1 percent success rate) of the model's application in a real scenario with a scale value of 4.55 pix./cm. The same scenario has been labelled without pre-processing, obtaining a lower success rate (71.5 percent). These results depend on the stability, across the scene, of the rest of the calibration variables not dealt with in the application.
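For reference, a per-window success rate of the kind quoted above could be computed against a hand-labelled ground truth as in the following sketch; the dictionary-of-window-positions representation follows the earlier scanning sketch and is an assumption of ours.

```python
def success_rate(predicted, ground_truth):
    """Fraction of scan windows whose predicted label matches the truth.

    Both arguments are dictionaries keyed by window position (row, col),
    as produced by the label_scene sketch given earlier."""
    hits = sum(label == ground_truth.get(pos)
               for pos, label in predicted.items())
    return hits / len(predicted)

# e.g. success_rate(labels, truth) returns 0.901 when 90.1 percent of the
# windows in a scenario are labelled with the correct material.
```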


Figure 4. Success rates of the use of the complete model as opposed to the model without pre-processing

Figure 5. One of the scenarios used as a benchmark with its labelling results

Figure 6. One of the real scenarios with its labelling results

Conclusions
This work offers a model for the segmentation and labelling of images acquired in realistic environmental conditions. The model has been conceived to systematically deal with the different causes that make vision difficult, which allows it to be applied in a wide range of real situations: changes in lighting, changes in scale, faulty focusing, etc. The proposal is based on texture classification using self-organising maps for the organisation of databases that store series of each texture perceived with successive optical parameter values. More specifically, the results of the model's application to the labelling and segmentation of real scenes, perceived with different scale values, are reported. The results show high success rates in the labelling of real scenes captured in different scale conditions, using very simple descriptors, such as different histograms of textures. This fact shows that self-organising maps are suitable for solving this problem and can be used as a basis for a general, robust vision architecture. From the results obtained, our interest is directed towards systematising the proposal and experimenting on the influence of the other variables of the vision process that have yet to be tested. Subsequently, we will propose an integral system for robust artificial vision that jointly considers all the parameters that can present difficulties for artificial


visual perception. We will also tackle the implementation of the classifier module so that the different causes can be dealt with by the reconfiguration of the same hardware. To do this, we will use reconfigurable hardware, which has the ideal features for the proposed requirements, providing low-level implementations with a high degree of parallelism and with reconfiguration capacity.


References
Bhalerao, A. and Wilson, R. (2000), "Unsupervised image segmentation combining region and boundary estimation", Image and Vision Computing, Vol. 19 No. 6, pp. 353-68.
Biemond, J., Lagendijk, R.L. and Mersereau, R.M. (1990), "Iterative methods for image deblurring", Proceedings of the IEEE, Vol. 78, pp. 856-83.
Campbell, N.W., Mackeown, W.P.J., Thomas, B.T. and Troscianko, T. (1997), "Interpreting image databases by region classification", Pattern Recognition, Vol. 30 No. 4, pp. 555-63.
Cohen, F.S., Fan, Z. and Patel, M.A. (1991), "Classification of rotated and scaled textured images using Gaussian Markov random field models", IEEE Transactions on PAMI, Vol. 13 No. 2, pp. 192-202.
Flusser, J. and Tomáš, S. (1998), "Degraded image analysis: an invariant approach", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20 No. 6, pp. 590-603.
Fritzke, B. (1997), "Some competitive learning methods", draft paper, Systems Biophysics, Institute for Neural Computation, Ruhr-Universität Bochum.
García, J.M. and Ibarra, F. (1995), "Segmentation of defects in textile fabric using semi-cover vector and self-organization", Proceedings of the International Conference on Quality Control by Artificial Vision, France, pp. 58-65.
González, R.C. and Woods, R.E. (1992), Digital Image Processing, Addison-Wesley, Reading, MA.
Hammerstrom, D. and Nguyen, N. (1991), "An implementation of Kohonen's self-organizing map on the adaptive solutions neurocomputer", in Kohonen, T. et al. (Eds), Artificial Neural Networks, Elsevier Science Publishers, Amsterdam, pp. 715-9.
Haralick, R.M. (1979), "Statistical and structural approaches to texture", Proceedings of the IEEE, Vol. 67, pp. 786-804.
Kohonen, T. (1995), Self-Organizing Maps, Springer-Verlag, Berlin.
Leow, W.K. and Lai, S.Y. (2000), "Scale and orientation-invariant texture matching for image retrieval", in Texture Analysis in Machine Vision, World Scientific, Singapore.
Moik, J.G. (1980), Digital Processing of Remotely Sensed Images, NASA SP-431, Washington, DC.
Pujol, F., García, J.M., Fuster, A., Pujol, M. and Rizo, R. (2001), "Use of mathematical morphology in real-time path planning", Kybernetes, Vol. 31 No. 1, pp. 115-24.
Rosenfeld, A. and Kak, A.C. (1982), Digital Picture Processing, 2nd ed., Academic Press, New York, NY.
Shanahan, J.G., Baldwin, J.F., Thomas, B.T., Martin, T.P., Campbell, N.W. and Mirmehdi, M. (1999), "Transitioning from recognition to understanding in vision using cartesian granule feature models", Additive Proceedings of the International Conference of the North American Fuzzy Information Processing Society, NAFIPS, New York, NY, pp. 710-4.
Sim, D.G., Kim, H.K. and Oh, D.I. (2000), "Translation, scale, and rotation invariant texture descriptor for texture-based image retrieval", Proceedings ICIP 2000, WP0707.
Sonka, M., Hlavac, V. and Boyle, R. (1998), Image Processing, Analysis, and Machine Vision, 2nd ed., Brooks/Cole Publishing Company, Monterey, CA.

Tamura, H., Mori, S. and Yamawaki, T. (1978), "Textural features corresponding to visual perception", IEEE Transactions on SMC, Vol. 8 No. 6, pp. 460-73.
Teuner, A., Pichler, O., Santos, J.O. and Hosticka, B.J. (1997), "Orientation- and scale-invariant recognition of textures in multi-object scenes", Proceedings ICIP, pp. 174-7.
Won, C.S. (2000), "Block-based unsupervised natural image segmentation", Optical Engineering, Vol. 39 No. 12, pp. 3146-53.


Juan Manuel García Chamizo received his BSc in Physics at the University of Granada (Spain) in 1980, and his PhD in Computer Science at the University of Alicante (Spain) in 1994. He is currently professor and head of the Department of IT and Computation at the University of Alicante. His current research interests are computer vision, reconfigurable hardware, biomedical applications, computer networks and architectures, and artificial neural networks. Dr García Chamizo has directed several research projects related to the aforementioned areas of interest. He is a member of a Spanish Consulting Commission on Electronics, Computer Science and Communications. He is also a member and editor of several conference programme committees.
Andrés Fuster Guilló received his BS degree in Computer Science Engineering from the University of Valencia (Spain) in 1995. He has been a member of the Department of IT and Computation at the University of Alicante since 1997, where he is currently assistant professor. His research interests are in the area of computer vision and artificial neural networks.
Jorge Azorín López received his BS degree in Computer Science Engineering from the University of Alicante (Spain) in 2001. He is a PhD student at the Department of IT and Computation at the University of Alicante. His research interests are in the area of computer vision and artificial neural networks.


Aspects of a theory of systemic construction


Nicolae Bulz
Faculty of Managerial Engineering, Ecological University Bucharest, Bucharest, Romania

Abstract
Purpose – To consider aspects of a theory of systemic construction by discussing two concepts which will assist in our understanding of the surrounding world, which is considered to be made of both systemic and non-systemic entities.
Design/methodology/approach – Considers how these entities (metasystem, network, transitron, etc.) can be conceived and defined. Systemic frame notions are presented and examples of systems given. Discusses the historic use of the word "system" and systemic thinking and its varieties.
Findings – Discovered that, on the basis of these concepts, an understanding of the surrounding world can be achieved which is not homogeneous but made of both systemic and non-systemic entities. These can change into one another when certain systemic properties, as well as specific degrees of their limitations and paradoxes, are reached.
Originality/value – Introduces an original approach to the life support system by proposing concepts that are discussed and defined and that will provide cyberneticians and systemists with a revised view of systemic thinking.
Keywords Cybernetics, Systems theory
Paper type Conceptual paper

1. Introduction
One of today's possible approaches to the life support system includes two statements:
(1) An entity perceived as a system has only the following properties: synergy of its parts, non-entropy within its confines, and ephemerality as the result of higher performance with fewer resources.
(2) The systemic world comprises both decisional paradoxes and informational-actional limitations.
The combinatorics and the intensity of the properties, paradoxes and limitations that characterize an entity are equally descriptive of the variety of thinking (with its effects

Kybernetes Vol. 34 No. 9/10, 2005, pp. 1598-1632 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920510614849

The author thanks all his real teachers in the understanding and explanation of Reality. The main text and the bibliography may be seen in this way. But, if the completeness of the "acknowledgments" requires one, two, . . . great names at least, then there are to be considered [as titans of non-systemic profoundness] Nicolaus Cusanus (1401-1464) and Mihai Eminescu (1850-1889; the greatest Romanian poet), and, as a double connection between them: Herbert Simon (The Sciences of the Artificial; Administrative Behavior; . . . , the satisficing approach) and Alexandru de Mocsonyi (1841-1909; a Romanian predecessor of the contemporary transdisciplinary insight). Special "future thanks" go to the possible colleagues and real teachers who might enrich the comparative proofs for systemic and non-systemic entities; the author hopes for appropriate applications connected with some hard domains: poverty, welfare, social policies, sociology of religion, social psychology, social movements, sociocybernetic insight, comparative studies, lexical and (last but not least) ecological questions/problems.

upon scientific and daily trends of modern thought). Consequently, four varieties of systemic thinking as conceived by four great philosophers are presented: analytic/Descartes, holistic/Plato, experimental/Bacon, experiential/Bergson (experiential refers to natural, spontaneous facts). In this context, two concepts are defined regarding the (ideal) life support system: equilibrium and metaequilibrium. Both of them are brought into connection with those four varieties of systemic thinking as a result of the specific contributions of Spinoza/Russell, Goethe, Leibnitz and, respectively, Cusanus. On the basis of these concepts, an understanding of the surrounding world can be achieved, not as homogeneous, but as made of both systemic and non-systemic entities (metasystem, network, transitron, . . .). All these entities may turn into one another if they reach certain parameters of the above-mentioned systemic properties, as well as specific degrees of their limitations and paradoxes. A glossary of the terms used is included as an appendix to this paper.
2. Proper ordering of some systemic frame notions
If it is accepted that the existence of our world is represented by a set of real entities and by a set of concept entities, then a rational subject delimits the observable from the non-observable real, and the theoretic concepts from the non-theoretic concepts. Only connections that are not instantly clear are possible between the four delimited parts enumerated above. The fuzziness of this possibility is proper to each action of humankind toward the micro- and the macrocosmos domain (and proper to nearly each rational subject). However, the following potential world-mind reservoir (shortly, {World-Mind}) relation (as a type of representation) is to be accepted as a background to the intended ordering of some proper systemic frame notions:

The mental construct is the "word peak" of at least one mental concept; a mental concept is an innate and/or actively obtained structural-phenomenological mind entity; the notion is at least one "word peak" of the non-theoretic and theoretic concepts.
2.1 System-information relation
Inside the {world-mind} context, there is an aggregation of resources to be understood/explained as a system; the system can coordinate its resources. If the resources are both human and technical, then their team-aggregation hierarchically erects a mixed system. Within a mixed system, the resources and the teams' local information are aggregated according to the supreme goal, and the decisions are expanded according to all the local goals; the overall goal of a mixed system is attained through long-term cycles of information-decision-action which are functionally, structurally and conceptually adapted.


Cases of systemic aggregates of resources that would not be mixed systems include:
. an interplanetary autonomous station whose transmissions with its base have definitively failed;
. a locked nuclear power station;
. an ancient mechanism whose functionality has been forgotten; and
. a wasteland/deserted city.
So, these are exceptions. A mixed system is a quasi-generalized, long-termed human-machine reality, and a significant part of the life support systems. Inside the mixed system context, a three-sided ego unity reveals itself: the (hypothetical) real ego, this one's own model, and the ideal ego (connected to its implemented norms). This triad, (hypothetical) real-model-ideal, would be adequate for any system (not only for mixed systems), if there exist systems of ideas, words, models, ideals, (religious) beliefs. Are all these systems so separate from the human-machine systemic background (an ephemeral background)? Maybe. . . A mental travel through {world-mind} can be a beneficial one as concerns this question. What about nature and information? Is this travel possible without a human nature?
2.1.1 (Hypothetical) real system. According to an extreme condensation of notions proper to contemporary cognition, matter may be the open triad: substance, energy, information. An essential opening to the matter triad is concerned with the variety of the world (as any observer perceives it). If the reality of some material concentrations and the relative contours of these concentrations follow from the variety of the world (and not from its general homogeneity), then the existence of some objective elements is a natural one. Their plurality (as number of realizations, variances, similitudes, distinctions) would group these objective elements on level(s). Both the overall space and time might be dual to the objective element and the level. An objective element may have sub-elements on sub-levels, or may aggregate itself together with other objective elements, the result being an objective element on a super-level. An objective element is implicitly generated on a (hypothetically initially generated) level, may become there, and may generate another objective element even on the same level, or on other levels: "beside", "inside", "above", "under", . . . The same set of objective elements absorbs at least one "under" objective element, and sinks "above" no more than one objective element. Fundamentally, this phrase is an "emergent pattern" of the objective element world intelligence (or of something "similar" to intelligence) and/or of a supreme objective element (God). But in fact this phrase is only one of the phrases possible. The rational subject is not implied ontologically here, as it must not be across all of this sub-sub-section (2.1.1). Although this arborescent vision from micro toward macrocosmos may be too elementary, there may exist supplementary levels, forbidden levels, profound zones. Both the biological level and the social one (both very close to the rational subject) contain complexities. Cases of profound zones correspond to: rationality, decidability, markets, social, ethnic, religious and ecological tensions, . . . , the Cosmos inside the universe. Cases of supplementary levels correspond to: Cosmos and earth life strata, economically and politically revolutionized human societies, universal cognition in spite of discontinuities. Fuzzily denominated cases of supplementary levels would occur as forbidden levels.

Objective elements, after having been generated, restructure themselves, or are restructured, more freely than levels. The stability/instability of an objective element and level is to be comprehended as a successive characteristic derived from the generability of the objective element and level.
2.1.2 Model of the system. Our world of objective elements contains objective elements/levels and the relations between these objective elements/levels. The reflection is a connection between two objective elements, of which the clear and absolutely passive one is the reflected and the other one is the reflector (passive or active). The reflector generates information that is an element of reflection, shortly an information-reflection element. This information-reflection element is not substance or energy lost by the reflected part. Through a chaotic stream in the information-reflection element and other objective elements (non-parts of the current reflection), there are some active steps upon information-reflection element (re)generation, memorization, representation, representation of representation. All these, taken globally or piece by piece, instantly or over a certain period (a chronological period, from "Kronos", the Greek word/notion equivalent to the Latin "Tempus"; or another suitable period: kairotic, from "Kairos", the Greek (apart) word/notion for the suitable moment), are steps of the reflector objective element, or of other associated objective elements, upon the reflected objective element. So, a model of the latter is erected as an artifact on a real support. Inside a mixed system, an information erected as a model, shortly an information-model, has its own way, and it is also an artifact, dual to the first artifact (the support) and dual to the reflected objective element too. The position of objective information (an objective model recognized inside the mixed system) may be reached or not. If all cycles of becoming are efficiently closed, after a period of time the mixed system assumes that the current model is an objective information (not a false, dangerous, useless chimera). An objective information is an objective element. The stability of an objective information, the ascending/descending position of the information-model, of the objective information, of the real objective element reflected, and of the real objective element involved in modeling are correlated. The travel of an information-model through the {world-mind} relation is equivalent to the transit of an information-reflection element and its becoming as objective information.
2.1.3 Ideal system. Any mixed system contains teams (at least one). These teams are specialized to act at the forecasted, planned, organized, decided, coordinated and/or controlled locus (distributed in space-time separately or in parallel). Mixed systems are built out of an initial action, but problems may occur and are constituted. To represent and solve the problem (while dealing in parallel with the initial action) is the main craft of a rational subject inside a mixed system. The rational subject is a member of an ad hoc, preventive/operative team, a human resource coupled with technical resources. There is no chaotic long-termed chance within a sustainable mixed system. Life support systems use chaotic events intelligently; otherwise survival is not possible. For mixed systems, as for particular life support systems, sustainability exists. Each sustainable mixed system draws out a set of norms in order to distinguish the correct from the incorrect action, be it an initial action or a problem-solving case. The "quantity" of norms (connected to the mixed system corpus), their becoming as values, ideals, their stability, their adaptation or not, are long-term results that constitute the background for totalitarianism, oscillations, relative evolution (decline, stationariness, development).


Figure 1. System-information relationship

The human resource of a mixed system may be a quasi-technical one or a supreme one. The ideal system possesses the main part of the answer, but not the whole answer. So, if there are real sequences of information-reflection element, information-model and objective information, and if the three-sided ego of a system exists ((hypothetical) real, model, ideal) and survives, then the system-information relation is constructed by the connection between the model-ego and the information-model. This connection is a travel pattern through {world-mind}. Figure 1 tries to synthesize all this. All this exists for a particular mixed system, is expected for an entire set of sustainable mixed systems, and is projected, as mapped image-notions, onto other systems. For example, an idea system, a particular one, may be that of the idea upon the mixed system.
2.2 Objective element, rational subject, entity, system
There exists an inner part of an epistemic strategy: to define the objective element, to erect a triadic view upon both the system and information, to erect an analytical view upon the rational subject (a minimal one), then to construct the "entity" notion, and only after all these to elicit the "system" notion. A rational subject is an objective element, and it is a reflector (at least); but the rational subject is able to reflect its own reflection (a superior step in comparison with the representation of the representation). Figure 2 presents the possible cases of rational subject and objective element coupling. The entity notion refers to this couple. An entity may contain one or more objective elements, in their sameness. The rational subject, itself, may be a long-term assumption, as well as a medium-capable rational entity (as seen by another rational subject, compared with the medium one; that is the statistical view, but not only). So, the multiplicity view inside an entity is a rational subject's craft upon the reality of the world of objective elements, outlined only through its rational capacities. Unlike the entity, the system is independent of the rational subject branch. The concrete system is unique. Any kind of concrete system can be melted into the "system" notion frame. Where is the rational subject placed? Evidently it is very close to some systems (to the rational subject it is in one sequence identical, but no more); to other


Figure 2. Possible cases of rational subject and objective element coupling


systems it is very far apart (e.g. the cosmos, the micro/micro universe). But the "system" notion supports all these (as the "entity" notion can support "all" the events). This approach is intended to be a tool (here only a gnostic one, acting toward the epistemic stage) for revealing the order existent in an amalgamated systemic world (i.e. objective elements, rational subjects, (hypothetical) real systems, ideas, and so on).


2.3 Magellanity
All the above sub-sections contain exercises in {world-mind} traveling. Traveling through {world-mind}, a rational subject fulfils a very particular task (e.g. What about a personal qualification?) or a more sophisticated one (e.g. What about a general system?). The {world-mind} context must reveal an indicatory property ("easy" to verify, conceptually at least). If there is an equipoise, e.g. an equilibrated square or linear shape of non-observable real, observable real, mental concepts and mental constructs, and if the rational subject is from the first time convinced of this (relative to a clearly questioned domain), then the rational subject reaches the magellanity proper to its {world-mind} travel. This property's denomination is connected to [Fernão de Magalhães] Magellan's 1521 travel (finished by a part of his crew, and not by himself, as he died before), the first time when the definitive conviction of the earth's spherical shape was reached. So, the magellanity property (shortly, (M*)) consists of:
There exist connections between the non-observable real and notions inside {world-mind} (M*).

3. Historic survey across the use of the word "system" and the becoming of the "system" notion
3.1 Before and after Aristotle's "sustema"
The word "system" is used four times in Aristotle's Metaphysika. There are negative references to "Plato's system of Ideas" (twice explicitly). Aristotle analyses the entire Greek philosophical background, and especially rejects Plato's axiom: the Ideas' existence as numbers, with their substantiality. In his turn, he constructs the double axiom: the existence of an eternal, unmovable substance, apart from any sensitiveness; the thinking (of this substance) thinks upon itself. The divine element (which seems comprised by the intellect) belongs to the first unmovable movement. Aristotle constructs an entire coherent metaphysical system, analysing, criticising and eliciting new ideas. The entire philosophical thinking directly or indirectly known by Aristotle is the challenge from which his own system emerged. So, Plato's system as a "stimulus" consists of: Aristotle's direct understanding as Plato's disciple at the Academy and his later recollections; Plato's written dialogues, at that stage; and Speusippus' discourse upon ideas (after Plato's death, Speusippus was the leader/scholar of the Academy, 347-339 BC); Aristotle returned to Athens only as late as 335 BC. Thus, there are a lot of references to ideas: four as "system" (as mentioned above), five as "theory", and 42 simply as idea(s). This heterogeneous denomination may prove a relative subjectivity and/or a relatively localised, deliberate nuance in the philosophical use of the word "system" (maybe even its first appearance, but this must be proved). Thus, the first time the word "system" appears seems not to be an isolated cognitive event (inside a philosophic system); this is so because:

(1) A set of philosophic ideas becomes (is recognized as) a system according to its coherence; that is: wide spreading, resistance to sophistic questioning, inner consistency, outstanding synthesis of the entire set (text). So, Aristotle rejects the meaning of Plato's texts, but does not deny their "system" status [then, the same holds for Aristotle's Metaphysika]. This systemic perception of a text (simultaneous with the use of the word "system") was indebted to long-term, critically overlapping sequences of lives, genial works, and teaching disciples. The following string sets out only nine philosophers and four generals/political leaders from a great ancient Greek pleiad:
. Pythagoras (c.570-c.480 BC),
. Heraclitus (c.550-c.480 BC),
. Themistocles (525-460 BC),
. Parmenides (c.515-c.440 BC),
. Anaxagoras (c.500-428 BC),
. Pericles (c.495-429 BC),
. Empedocles (c.490-430 BC),
. Protagoras (c.486-410 BC),
. Socrates (c.470-399 BC),
. Alcibiades (c.450-404 BC),
. Plato (c.427-347 BC),
. Aristotle (384-322 BC),
. Alexander the Great (356-323 BC).
This string would demonstrate that two centuries of living, linked thinking and acting elicited a systemic perception; high mind and status were the background for it.
. It is questionable whether Aristotle is the first to use the word "sustema". The etymology of this word is: "sun" = "with" and "istemi" = "to put", resulting in the meaning "to be put with the other parts".
. It is clear that both Plato's and Aristotle's texts (styles and contents) influenced the becoming of systemic perceptions during scholastics and the renaissance/q.v. a frequent use of the word "system", in the Germanic space, during 1604-1613; here are four titles of now anonymous authors: "Logicae systema methodicum", "Systema problematum theologicorum", "Logicae systema harmonium", "Systema systematum". The Latin "Systema" from the original Greek "sustema" is observable; the vowel "u" being ancient, Greek admitted it as Y/u (capital letter/usual letter) [it is said that Pythagoras used Y as a symbol of the divergent paths of vice and virtue]. But the systemic assumption was only scarcely dealt with during the migration millennium and the religious and social middle-age quarrels. More perceptions of the ever-implicit, ever-stronger systemic thinking of humankind are something like the "Phoenix bird myth". Our millennium is characterized by a continuous rise of the systemic perception and thinking, starting from the nominalist philosophers/q.v. William Ockham (c.1285-1349), "Summa logicae"/up to now.






It is critical and repetitive of Plato's or Aristotle's frame; that is, to cast newly elicited knowledge into an implicit systemic frame. Did the same phenomenon occur before Plato and Aristotle? This must be searched for and proved. If it is so, then the systemic perception may be an innate cognitive process (based on some natural mental concept as well). But it may also not be so, if long-term and gradually obtained systemic ideas and mental constructs are involved/q.v. the general type {world-mind} relation, the string pleiad from (a)/; the magellanity property is to be reached by a {world-mind}-type traveling rational subject, which requires "systemic thinking". And yet it may be both innate concepts and obtained constructs. The magellanity property (M*) is apparently easier to reach if both innate concepts and obtained constructs are involved. That is why systemic perception is so important to be studied according to all [these three] possible directions [or to some others].
. Another very important and strange fact is Aristotle's treatment of the "part" and the "whole" (Metaphysika, V, 25; VII, 10; and V, 26). At that time this had nothing [?] to do with the use of the word "sustema". It seems "sustema" applies only to the philosophical "whole" (but this is only a hypothesis). The immediately following treatment of the "curtailment" (V, 27) is an important argument for this article (which deals with non-systemic entities also).
. A long pre-Aristotelian tradition exists regarding the delimitation and denomination of the "true" and "false" domains. Parmenides, Empedocles, Protagoras and Plato built a complex "pre-Organon" as a tetravalent logic acting with: science (episteme), true opinion (alethes doxa), no-science (agnoia), and false opinion (pseudes doxa). A contemporary subject of research deals with the Aristotelian "excluded third (tertium)" principle versus pre-Aristotelian points of view. This subject of research is engaged versus Lukasiewicz's polyvalent logics. This aim is quasi-equivalent and infra-correlated to (3). The last proposition of (3) may be reiterated. [The title "Metaphysika" is non-Aristotelian. Andronicus of Rhodes (1st century BC) placed this text after those from Physika ("meta ta physika"); what about the impact of "sustema" and its reuse?]

3.2 The systemic thinking and the "General System Theory" idea
Humankind promoted systemic thinking implicitly from Aristotle's "sustema" (at least) until the twentieth century. If the "system" construct was born for the realization of Plato's philosophical entity, and if some other philosophical entities were assumed as systems, then the challenge of assuming a living being as a system (regarding all earthly, life-supported individuals) was a necessary threshold. But the contradictory dichotomic viewpoints – "mechanistic/vitalist-organismic" – delayed a modern climax regarding systemic thinking as well as the "system" construct. This dichotomy was stated by Julien Offray de La Mettrie – "Man a Machine" ("L'Homme machine"), 1747 – and Johann Michael Schmidt – "Treaty about music and soul", 1754. No prominent complementary viewpoint appeared to try to overpass this contradiction. The turning point was when Ludwig von Bertalanffy studied, researched, formulated, and published "General System Theory" (1928-1973) – focused on "An Outline of General System Theory" –

British Journal for the Philosophy of Science, 1, 134-164, 1950; then the (double) edition of "General System Theory: Foundations, Development, Applications" – at Penguin Books and George Braziller, 1968. His life nearly spans the twentieth century; of Austrian origin, a naturalist who emigrated to Canada, professor at Vienna, Ottawa, Los Angeles, Alberta. He was deeply involved both in Charles Morris' philosophical seminar at Chicago University, 1937 (his general systemic aim not being accepted), and in biophysical research at Mount Sinai Hospital, USA, 1955-1958. His research upon metabolism and growth led him to a theory of open systems comprising notions such as: steady states, equifinality, goal-seeking; equifinality is a principle complementary to the classical cause-effect relation, according to which final outcomes can be achieved by starting from different original conditions and along different paths. An organism is an open system, exchanging substance, energy and information (recorded on a substantial or energetic support) with its environment; its elements and processes are ordered so that the entire whole reaches the essential goal: integrity preservation. An organism assimilates according to its surface (frontier), and dissimilates according to its mass (weight). Bertalanffy succeeded in a transfer of notions, ideas and mental constructs from biology toward an abstract description of Reality, of the concrete, in this way obtaining a better understanding of it. But then followed a slow, not instant, defeat of biological reductionism and of sociological reductionism (biological processes seen as mechanistic physical-chemical ones, as well as social processes seen as biological ones). Meanwhile, Bertalanffy internalized a set of scientific and humanistic works as a promotion of the "system" construct's history, in spite of its not being emphasized. Thus, he starts his treatise with a list of thrilling Latinized names: Nicolaus Cusanus (1401-1464), Gottfried Leibniz (1646-1716), Wolfgang Goethe (1749-1832), Aldous Huxley (1894-1963), Bertrand Russell (1872-1970); at the end of this quasi-chronological list "S.J. antecesori cosmographi" is inscribed as a (possible) homage to those ancient and medieval comprehensors of the Cosmos, and to the modern descriptors of the astronomic objects – implicitly concerned with a globalist, systemic view (the word/idea "Cosmos" was introduced by Pythagoras as the supreme order and harmony). But the full text contains – in unchronological ordering – references only to Leibniz (natural philosophy), Cusanus (coincidence of the opposites), and to Paracelsus (mystic medicine), Vico and Ibn Khaldun (history seen as cyclic sequences of cultural entities or "systems"), Marx and Hegel (dialectics), Hermann Hesse (the reflected world trajectory as an intelligently projected abstract game), Köhler (1924, 1927 – physical gestalten) and Lotka (1925) – the last two considered for their preliminary works to general system theory. All these prove this late nuance. The general system idea had an influence during its decades of promotion or/and was brought in as a base for further efforts, usually with general systems theory as the reference. There are (were) scientists, as chiefs of research and/or university teams – real contemporary stages corresponding to Plato's Academy and Aristotle's Lyceum:
(1) R.E. Kalman's dynamical system theory (mathematical topics), conceptualizing input/output behavior with the "state space".
A state of a dynamic system comprises the minimal information required to drive the entire system from a possible admissible input to a desired finite output. It is the actual reverse of Leibniz's and Newton's analytic methodology (q.v. V_L of systemic thinking). An inductive construction of a dynamic system versus an experimental set is "the realization problem" (evaluation of a function by an automaton, pattern recognition, simulation of a tolerance automaton). It is a prominent epistemic bridge between Descartes' rationalism and Bacon's experimentalism. But it is also a duality, which supports the reachability, observability, controllability and constructibility of complex technical objects (as a time-optimal control closed loop is involved).
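A minimal sketch may help fix the state-space idea referred to here; the matrices and the input sequence below are invented illustrative values (not taken from Kalman's work or from this paper), used only to show the recursion x[k+1] = A x[k] + B u[k], y[k] = C x[k] + D u[k].

```python
import numpy as np

# Hypothetical discrete-time linear state-space model (illustrative values only):
#   x[k+1] = A x[k] + B u[k]      state update
#   y[k]   = C x[k] + D u[k]      output (observation)
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])        # state-transition matrix
B = np.array([[0.0],
              [0.1]])             # input matrix
C = np.array([[1.0, 0.0]])        # output matrix
D = np.array([[0.0]])             # feedthrough matrix

def simulate(x0, inputs):
    """Propagate the state and collect outputs for a given input sequence."""
    x, outputs = x0, []
    for u in inputs:
        outputs.append(C @ x + D @ u)   # observe before updating
        x = A @ x + B @ u               # the minimal information carried forward is x
    return x, outputs

x_final, ys = simulate(np.array([[0.0], [1.0]]), [np.array([[1.0]])] * 10)
print(x_final.ravel(), len(ys))
```

Here the state x is exactly the "minimal information" mentioned above: two input histories that lead to the same x are indistinguishable with respect to all future outputs.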




(2) M.D. Mesarovic's hierarchical systems theory (coupled with a mathematical theory of coordination) presents the conceptualization, formalization and application domains for multilevel structures (strata, layers, echelons), from which the coordination problem of the subsystems and of decision-making emerges. This universe of man-made stratified systems was largely comprehended after a theoretical accumulation indebted to N. Wiener, L. von Bertalanffy and H. Simon and to their disciples. Also, truly large organizations (industrial enterprises and bureaucracy) were a challenge to researchers of the 1960s to find the path of a scientific change of operation and administration. The system – analogically defined – is a relation on the Cartesian product of its (mathematical) objects. New coordination methods are assessed according to the so-called "balance" and "estimation" interaction principles, the "interaction decoupling" methodology being enrolled.
(3) G.J. Klir's epistemological hierarchy of systems categories builds the system construct as a mathematical object – a relation between abstract entities. It is qualified as a model of some feature of Reality (natural, social, human-made parts) if and only if there exists a "homomorphism" relating the entities from Reality to the respective entities from the system construct. So, a "thinghood" and a "systemhood" may be homomorphically interrelated if a rational subject gradually rises in Klir's epistemological hierarchy (and simultaneously climbs the corresponding deepness of Reality!). Let there be a medium-support (time, space or a population). There exists a bottom experimental frame or "source system"; when actual data are available (as a description language assures), a "data system" is reached; when the relation between the variables (assigned to the data) is invariant in relation to the initial medium-support, the "behavior system" is reached. Let there be two principles operating as a rise inside the epistemological hierarchy, resulting in the integration of systems into larger systems:
. If two or more behavior systems (or respective data or experimental systems) have common variables or interact, then they are integrated as an overall "structure system".
. If there is an invariant procedure of replacement from one system to another (both on the same horizontal epistemological layer), and if this invariance is in accordance with the initial medium-support, then those "replaced" systems are integrated as an overall "metasystem".
Both structure systems and metasystems may produce a second, third, a.s.o. order of integration for themselves, as a similar superior-order inter-selves. So, an epistemological hierarchy is revealed. In these terms, systems science is not another science but a metamethodology, a new scientific dimension for structuring abstract knowledge. The abstract knowledge may be simulated, the artificial life technique being evolved. So, if a rational subject is rising inside the epistemological hierarchy, then it can realize a string of progressive insights (as systems) from the bottom toward its own layer (and never higher).
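A minimal, purely illustrative sketch of the first integration principle above; the class names, the "shared variable" test and the example systems are assumptions made for this sketch, not Klir's formal definitions.

```python
from dataclasses import dataclass, field

@dataclass
class BehaviorSystem:
    # an invariant relation among variables, observed over some medium-support
    name: str
    variables: frozenset          # e.g. frozenset({"temperature", "flow"})

@dataclass
class StructureSystem:
    # integration of behavior systems that interact (first principle)
    name: str
    parts: list = field(default_factory=list)

def integrate_as_structure(name, systems):
    """Integrate behavior systems into a structure system if at least one
    pair of them shares a variable (i.e. they interact)."""
    shared = any(a.variables & b.variables
                 for i, a in enumerate(systems) for b in systems[i + 1:])
    if not shared:
        raise ValueError("no common variables: the systems do not interact")
    return StructureSystem(name, list(systems))

b1 = BehaviorSystem("heater", frozenset({"temperature", "power"}))
b2 = BehaviorSystem("room", frozenset({"temperature", "occupancy"}))
print(integrate_as_structure("thermal structure", [b1, b2]).name)
```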

Kalman's, Mesarovic's, and Klir's personal ideas and their teams' publications were an intense indirect support for a general systems theory (systems – not system!) during the 1960s. It was a mathematics-major approach (characterized by a diversity of trends), but simultaneously very explicit as to foundations and applicability. It was widespread, and its "propagation wave" has passed the 1990s as a necessary stage everywhere. There were consequent researches in:
(1) Domains such as robust multivariable control, parallel processing (analogue, digital and hybrid – microelectronics-superimposed digital processing), non-linear process control, interactive learning and adaptive control for large-scale systems. H∞ controller synthesis and system identification will promote the general system as an automaton.
(2) Domains such as an interactive decision stratum for the multilevel and hierarchical world model, with the world seen as ten regions with countries grouped according to their economic, social, political and psychological similarities. This "Strategy for Survival" (intended as a computer-based planning and decision-making tool) is one of the famous predictive world models of the end of the twentieth century.
(3) Domains corroborated with the infra-specialization inside logics; to mention only two researches (very close to the general system idea): R. Mattessich's "Theory and its Correspondence to Reality" – the {world-mind} relation is very indebted to his work from 1990; and R. Vallée's "Epistemo-Praxiology" – a "well-tempered constructivism" – 1995, which links objectivity and subjectivity via multidimensional perception, decision and action; as a balanced mathematical and philosophical approach it can assure one of the incompatibility between system science and any mystery search.
A. Newell, J.C. Shaw and H. Simon concretized a long-term dream in a program that simulates human thought: the General Problem Solver (1958-1963). This creative effort is parallel to general system(s) theory – but it is possible that some common mental concepts emerged through both of these directions. The General Problem Solver is the real initial implosion-point for a long string of efforts; these are denominated as Artificial Intelligence. One of the earlier proofs that both directions overlapped is that Mesarovic and Klir dealt with human reasoning. Artificial Intelligence is the actual base of expert systems, which are a real challenge for the automation domain (1) (so, another particular argument for the overlapping of the two directions). The necessity of specific industrial control and the (attractive field and specific versatility of human reasoning research in accordance with) pattern recognition are the premises of the neural network domain. Surpassing a kind of unpractical initial enthusiasm, and aggregating some major scientific advances, the neural network is also a long-term human research: F. Rosenblatt's perceptron (1958) and J. Hopfield's neural computation of decisions (1985) are classical topics today. The PINK generation of artifacts (psychology, intelligence, neural, knowledge) is a present and future world. Expert systems and neural networks are sometimes seen as successive and as a continuum too.
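For readers unfamiliar with the 1958 reference, a minimal sketch of the classic perceptron learning rule follows; the tiny dataset and the parameters are invented for illustration only and are not drawn from the works cited above.

```python
import numpy as np

# Tiny synthetic, linearly separable dataset: inputs and +/-1 labels (logical AND).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])

w, b, lr = np.zeros(2), 0.0, 0.1        # weights, bias, learning rate

for _ in range(20):                     # a few passes over the data
    for xi, target in zip(X, y):
        pred = 1 if w @ xi + b > 0 else -1
        if pred != target:              # update only on mistakes (Rosenblatt's rule)
            w += lr * target * xi
            b += lr * target

print(w, b)                             # a separating hyperplane for the AND data
```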




However, these two directions have an independent future versus their systemic implementation. It is now a common fact that systems theory and cybernetics, computer science, artificial intelligence, neural networks, operational research, microelectronics and communication have, and will have, a "fascicular" development. Their long-term isolation "inside this fasciculum" reduces the respective possible interdisciplinary efforts. In spite of this, management science, cognitive science, philosophy of mind, bioengineering, bioeconomics, cyberspace development and ecology are very indebted to the entire fasciculus mentioned above and also to general system theory ideas. Will they continue to be so indebted? On the other side, there are institutional long-term attempts at a deep systemic interdisciplinary insight (the list is ordered by the alphabetic appearance of the host country):
. International Institute of Applied System Analysis, Laxenburg, Austria.
. Center for Hyperincursion and Anticipation in Ordered Systems, Institute of Mathematics, University of Liège, Belgium/Daniel D. Dubois.
. International Institute for Advanced Studies in Systems, Windsor, Canada/E.G. Lasker (synergism and sociopolitical development).
. Institute of Interdisciplinary Studies (teaching and research: language and cognition; representation and learning)/Carleton University, Canada.
. Trent University, Peterborough (applications of modeling in the natural and social sciences: quantitative modeling; cross-disciplinary communication within one-discipline training), Canada.
. International Research and Transdisciplinary Studies Centre, Paris, France/B. Nicolescu – transhumanism.
. Center for Synergetics, Institute for Theoretical Physics I, University of Stuttgart, Germany/Hermann P.J. Haken.
. Decision Support Network for Strategy Development and Problem-Solving, Germany.
. International Institute for Advanced Studies, Kansai Science City, Japan.
. Center for the Study of Social Stratification and Inequality, Graduate School of Arts and Letters, Tohoku University, Sendai, Japan.
. Institute of the World Organisation of Systems and Cybernetics, and University of Wales, Bangor, UK/Brian H. Rudall; the World Organisation of Systems and Cybernetics is foundationally indebted – also – to Robert Vallée, France.
. Institute of Cybernetics, Brunel University, Middlesex, UK and University of Central Lancashire, UK/James N. Rose.
. Santa Fe Complex Adaptive Systems Institute, USA.
. Global Development Network, Washington, USA.
. Institute for the Study of Complex Systems, Palo Alto, USA/P. Corning (synergism/self-organization dichotomy).
. Academy of Transdisciplinary Education and Research, Mechanical Engineering Department, Texas Tech University, USA.

. Cognitive Science Faculty (research areas: language; representation, reasoning and learning; vision and action)/University of Rochester, USA.
. Center for Language and Speech Processing (interfaces with biomedical engineering, cognitive science, computer science, electrical and computer engineering, mathematical sciences, psychology)/Johns Hopkins University, Baltimore, USA.

Another side consists of:
. The World Organisation of Systems and Cybernetics – a network of nearly 1,000 scientists; more than 30 systemic intra-entities; world congresses every three years (the last: Pittsburgh, March 2002).
. The International Sociological Association – a network of nearly 4,000 scientists; more than 70 systemic intra-entities; world congresses every four to five years (the last: Brisbane, July 2002).
. The International Union of Anthropological and Ethnological Sciences – a network of nearly 5,000 scientists; more than 50 systemic intra-entities; world congresses every five years (the last: Florence, July 2003).
. The International Simulation and Gaming Association – annual conferences since 1970; related methodologies: computerized simulation, policy exercises, role-play, experiential exercises, play, structured experiences, game theory, operational gaming.
On the other hand, sociocybernetics is a domain firmly and explicitly indebted to twentieth-century fundamentals, and a strong interdisciplinary contemporary approach too. It is possible for the sociocybernetic background to be a contemporary domain efficient enough to raise a reconstruction of the general system theory idea. That would be possible through an interdisciplinary, long-term open debate among research teams, virtually represented by (the list is ordered by the alphabetic appearance of the scientists' names):
. H.W. Ahlemeyer, O. van Nieuwenhuijze, T. Perez de Guzman – complexity;
. K.D. Bailey – social system entropy;
. E. Barbieri Massini, F. Geyer – future and social science;
. B. Buchanan – assessing human values;
. T. Burns – socio-cultural systems;
. T. Devezas – technospheres;
. B. Hornung – integration ↔ society;
. L. Langman, F. Geyer – alienation;
. J. Mingers, P. Barbesino – autopoiesis;
. F. Parra-Luna – society ↔ axiology;
. N. Romm – responsibility;
. B. Scott, G.M.C.I. Boyd, A.V. Jdanko – education;
. P. Stokes, M.G. Terpstra, P. Nicolopoulos, H.J.L. Voets – organization; and





. J. van der Zouwen, C. van Dijkum, D.J. DeTombe, I. Kratli, R.L. Henshel – sociocybernetic methodology.

The last (but not least, subjective) systemic and interdisciplinary insight ensued here is represented by past and renewed alternative approaches:
. V. Turchin, C. Joslyn, F. Heylighen – the Principia Cybernetica Project (computer-supported cooperative development of an evolutionary-systemic philosophy);
. F. Heylighen and M. Vaneechoutte – memetics (concepts, evolutionary mechanisms, computers and networks, social sciences);
. N.C. Callaos – a CD-ROM extended encyclopedia of systemics, informatics and cybernetics;
. A. Behrooz – Knowledge Transfer annual international conferences, at the University of London from 1996/from 1999 in Romania;
. a large and foundational insight was supported by Professor Stafford Beer and Dr Heinz von Foerster;
. a variety of advanced studies is supported by G. Andonian, E. Andreewsky, A.M. Andrew, R. Bartley, M. Belis, M.L. Best, J.M. Bishop, W. Buchley, A. Carron, J.L.R. Chandler, H-F. Chen, Y. Cherrault, T.N. Clark, P. Constantinescu, A.B. Engel, M. Cruz, V. Dimitrov, D. Dubois, J. Evers, E. Engdall, V. Fomichov, A. Garcia-Olivarez, I. Gilbert, A. Gosal, C. Greiner, R.W. Grubbström, L. Guoyang, X. Guangcheng, J. Hiller, A. Irwin, G. Jasso, H. Katz, A. Kjellman, B. Kochel, T. Koizumi, I. Krattli, V.I. Kvitash, Y.T. Leong, E.T. Lee, E. Lleras, M. Malitza, P. Marsden, M.E. Martinez, P. Masani, G. Marshall, M. Manescu, L. Medek, H. Miki, S. Milcu, D. Moerenhout, D. Murphy, C.V. Negoita, Ed. Nicolau, G.M. Nielsen, S. Odobleja, M. Oussalah, W. van Oorschot, Th. Quinn, P.J. Querinjean, J. Radice, B.N. Rossiter, D.O. Rudin, H. Sabelli, V. Sahleanu, S. Santoli, A.R. Scuschny, E. Schwarz, M.C.B. Smith, D. Steeg, D.J. Stewart, A. Sugerman, E. Rynen, L. Tao, R.J. Taormina, C.C. Valentino, R. Vulcanescu, J. Wood, B. Warburton, B. Zeeberg – all of them systemists inside their own domain, but with the interdisciplinary aim ever present.
All these "other sides" are not the only direct effects of Bertalanffy's long-term idea; it is a "propagation wave" carrying its cognitive effect inward. There is a heterogeneous generation devoted to systemic thinking. This includes the Stockholm International Peace Research Institute (after G. Myrdal) and the Club of Rome (Prince Hassan Bin Talal, A. Peccei, A. King . . .). These seem to be a variety of paradigms resulting from the complexity of the systemic world. Their aggregation is not a stimulus for an attempt toward a reconstruction of a general system theory. Returning to Bertalanffy's climax, the following question appears today: did he reach a magellanity property according to his travel through {world-mind}? No. This is indeed to be expected, but of another stage, that of epistemic thinking confidence. Did (a), (b), (c), and their successors and their parallel challengers reach a magellanity property in {world-mind} traveling? Nearly yes, but strictly for each domain (seen not as a finite aggregation of subdomains), and even a contemporary refinement is still due. That is only "nearly yes" above.

But it seems that there are: a {world-mind} travel for natural, a {world-mind} travel for social, and a {world-mind} travel for human-made systems [so, three different varieties]. And for each of them the infra-separability is a question in itself. That is why, at a turning point, "General System Theory" was overwhelmed by "General Systems Theory" during the 1970s. Is there confidence in a variety of systemic thinking? That is the problem of the next paragraph. [But "General System Theories" for M. Bunge – a promoter of a "Yes" (1977) answer to the last previous question.] Figure 3 shows an iconic solution to all these according to a human (supposed) trans-reflexivity. Inside this figure this property is gradually, iconically and implicitly defined; the symbol U is inspired by a text of Unamuno (during a conversation between two persons, there are six persons: the original two, two reflexive images of their selves, and another two by which the two persons represent each other). The gradual task for the Unamuno property is depicted as (U) – for each separated domain/nature, artefact, and social/, (U *) for the ideas domain, and (U * *) for the general system idea.


4. Systemic thinking and its varieties
The human psychic system is a concomitant objective element on the physical, biological, social and cultural levels. The human psychic system is a real "different universe", dedicated to the reflection of the primary universe and to the reflection of the reflection.

Figure 3. Multi/inter/transreflexivity (U/U */U * *) Unamuno property



The possibility of both reflections is provided by: the hyper-complexity of 24 billion neurons (each maintaining 10 million branched connections with the others), the hierarchical structure of this fabulous amount of neurons – an enchanting carpet, the self-organizing mechanisms, and the interaction of the sensorial-cognitive self-regulated processes. The consciousness phenomenon controls the set of psychic functions (the creative function also); the emergence of consciousness bears upon the self's existence. A strange bipolarity of the human psychic system is possibly indebted to the interference between the conscious stratum (the aware ego and its consciousness) and the (pre-)underconscious and, respectively, the (trans-)unconscious stratum. Thinking is the central cognitive process of the human psychic system and the implicit denomination of the rational subject. The contents of the (mental) concept, the statement, and the reasoning emerge through superior analysis and synthesis, coordinated abstraction and generalization. The interference of heuristics with algorithms, and of cognitive learning with semantic decoding, constructs the representation and the problem solving. Both these constructs appear as a necessary peak of thinking, and also as an ever "visible face" of intelligence, and in fact of the human psychic system. To think in a systemic way means to:
. necessarily operate with the world of systems exterior to the self,
. maintain the systems' world by (self-)solving the problems sufficiently (these are adjacently disposed to self-constituted problems),
. sufficiently and necessarily master the surrounding systemic complexity through an active human-technical presence (i.e. the tendency toward magellanity).
All these reveal certain steps in the comprehension, through the human psychic system, of another system. It is not very easy to use a hypersystemic tool upon a systemic context, or to force paradoxes and surpass limitations, "or not to be".

4.1 Minding upon the systemic thinking
The synthetic representation of the systemic thinking universe is a certain performance. Here are to be remembered:
. J.P. Guilford's cube (1957): human psychic system capacities as the 120 elements corresponding to the Cartesian product operations × contents × products (systems inclusively), as 5 × 4 × 6 divisions;
. R. Thom's typology of explanatory modes upon Reality (1987): global/local entities (e.g. local entities are treated globally by general systems theory, and locally by catastrophe theory/q.v. Table I).

Table I. Explanatory modes upon Reality (after R. Thom)

                          Treatment
Entities     Global                               Local
Global       Theology; metaphysics                Universe of objects and of analytic enlongment
Local        General systems theory; dynamics     Language analysis; theory of catastrophes

. J.S. Albus' outline for a theory of intelligence (1991): a state trajectory from four modules (sensory processing, world modeling, behavior generation, value judgement) on seven hierarchical layers.

Here is Solomon Marcus' conception of the modes of understanding Reality (1990). There are four modes, to be located in relation to a representational space generated as 2D (this being the original starting point of this conception – as a systematization):
(1) the reflexive/empiric dichotomy; and
(2) the discursive/intuitive dichotomy.


These two dichotomic dimensions are very close to the complementary significance revealed through: generative theory (N. Chomsky), genetic epistemology (J. Piaget) and, respectively, right/left brain hemisphere specialization. Table II presents four denominations of the modes of understanding the real and an outstanding philosopher for each of the four (analytical, holistic, experimental – directly denominated within a problematic context; experiential – referring to the natural, spontaneous facts, expected and received for some time, then interpreted). The reflexive/empiric dichotomy comprises: infinite/finite interaction, and the competence/performance duality. The discursive/intuitive dichotomy comprises: logic/infra-"logic" intellectual strategies, sequential/non-sequential. These dichotomies are not the final ones, e.g. conscious/unconscious. It is evident that the dichotomic analysis realized through modes of explanation and understanding of Reality is a heuristic – but used only (as a tool) to elicit what is essential from an extraordinary variety of our human thinking and artificial intelligence reasoning (i.e. cognitive modes). This tool is necessary to identify a basic variety of systemic thinking. As follows, more than one mental construct is necessary – for one to overpass the "systemic" boundary responsibility.

4.2 Responsibility and its (meta)indicator understanding modes
The above-described systematization upon cognitive modes is a cognitive holistic tool for humankind research and a pattern of reflection upon reflection. It is a four-direction possibility of understanding the mixed system set of indicators. This set is presented in Figure 4. The actional, informational, and decisional sub-systems of a mixed system have their own equilibrium indicators. The equilibrium of the entire mixed system aggregates all three indicators of these three sub-systems. The result is an overall indicator, Eq_mixed system. But as the hypercomplexity of the human psychic system exists, a complexity of the mixed system also exists. At least, human psychic system hypercomplexity induces a complexity upon any human-technical aggregation of resources. This complexity always retains a rest of facts inside the mixed system.

Table II. Modes of understanding Reality (after Solomon Marcus)

Cognitive modes     Discursive               Intuitive
Reflexive           Analytic, Descartes      Holistic, Plato
Empirical           Experimental, Bacon      Experiential, Bergson



Figure 4. Mixed system set of indicators

So, this rest of facts is not comprised by Eq_mixed system (the magellanity regarding comprised and not-comprised facts is to be expected). Thus, this rest is to be understood as the mixed system evolution/security interference – denominated as responsibility and located through a responsibility zone in Figure 4. The responsibility zone is a long-term projection of an overall goal concerning the mixed system's connectedness to human and technical resources both inside and outside the mixed system. An autocratic (only innerly caused) evolution of a mixed system occurs very rarely, and an exclusive inner solving of mixed system security would often be desirable but is less probable (sometimes but not always, inner solving being a totalitarian feature). So a meta-indicator would be proper to denominate this "inter- and trans-bordering" feature of the responsibility for the mixed system: m_Eq_mixed system is its denomination.

The couple (m_Eq_mixed system, Eq_mixed system) is not seen similarly according to the four varieties of the modes of thinking, and is interpreted as the varieties of systemic thinking. Table III presents all these. Each variety (V) of systemic thinking – inside the mixed system – is to be associated with a systemic vision [or definition, or assumption, or becoming; but the denomination "vision" follows]:
Spinoza; Russell. The metaequilibrium of our system is an external matter for us. God's features exist according to all possibilities. We can reveal it logically. (V_S;R);


Goethe. Our nature (the major system) is simple: "The world could not last if it were not so simple". Its stability is morphogenetically assured. (V_G);
Leibniz. The incompossibility of our world is very severe; not all possibilities exist. There is confidence in the local equilibrium, acquired by the construction induced from the starting locus to an appropriate outside, a.s.o. (V_L);
Cusanus. Any coincidence of previous oppositions may occur; any couple is to be preserved through ignorant consciousness. The world is just a ludic act (game). (V_C).
And also each variety of systemic thinking (associated with a cognitive mode and a kind of determinism) is to be provided by an understanding of Reality (Table III).

4.3 Systemic entities and other entities
Focusing on the above-described four varieties of systemic thinking, an aggregation of resources is to be observed on more and more significant levels. A structure and a functionality of this aggregate – an objective element. A rational subject, which couples itself with this aggregate. So, an entity is realized (according to Section 2) as an aggregation of the information proper to the resources, from the inferior base of the structure, step by step, up to the superior base, bottom up. The functionality of this entity also comprises this aggregation of information (and is a part of this functionality as well).

Table III. Varieties (V_) of systemic thinking, by cognitive mode

Cognitive modes    Discursive                             Intuitive
Reflexive          Analytic: V_Spinoza;Russell;           Holistic: V_Goethe;
                   m_Eq_SM as EXT;                        m_Eq_SM = Eq_SM;
                   probabilistic determinism              structural selective determinism
Empirical          Experimental: V_Leibniz;               Experiential: V_Cusanus;
                   non ∃ m_Eq_SM;                         (Eq_SM; m_Eq_SM);
                   dynamic determinism                    heuristic/mixed determinism



An expansion of the upper decisions from the superior stratum of the same structure as above, similarly step by step, but top down. A rational subject, acting accordingly to V_G, states a scale of values coherent with all the above – dedicated to the expected Eq and m_Eq (estimated/measured). Each realization of the entity (objective element, rational subject) may have its two corresponding scaled positions according to its Eq and m_Eq. All these bottom-up and top-down informational processes have a not completely objective information status; i.e. it is a difficult task for the rational subject/V_G to act toward this scale of values. This difficulty is a paradox according to the V_G vision. In this context, a relevant question is posed: is there a methodology to assure the emergent proving of the mixed system's existence as a system (according to Sub-section 2.1) through the realizations of the (objective element, rational subject) entity? Let the following steps be an answer:
(1) Let a rational subject act according to the V_S;R variety of systemic thinking. If it succeeds in estimating/measuring an Eq value, then a m_Eq value – in spite of its vision (another paradox, but according to that vision).
(2) Let a rational subject act according to the V_L variety of systemic thinking. If it succeeds, its results are a m_Eq value, then an Eq value.
(3) Let a rational subject act according to V_C. It subtracts the (m_Eq; Eq) couple of values of (2) – rational subject/V_L – from the relative couple of (1) – rational subject/V_S;R – using the rational subject/V_G scale. The result of this subtraction is noted E – the module, always positive, related to the "Entity" mental construct.
(4) If |E| is not significant (according to a V_G observation for significance assessment), then the mixed-system-denominated entity is a systemic one.
(5) If |E| is extremely significant, then there are unbounded/not even aggregated resources.
(6) If |E| is between the (4) and, respectively, (5) cases, then, according to the V_G scale and observation of non-significance, and to V_S;R, V_L, V_C acting, a world of systemhood and individualhood is revealed.
Systemic entities are located only at one extremity of the world of systemhood and individualhood; at the other extremity there exist only resources, which are isolated objective elements. The core of this world of systemhood and individualhood is a stratum of non-systemic entities. This stratum is not "directly visible". Only through the cooperation of the varieties of systemic thinking can this non-systemic stratum be revealed. So, each "official" system may be a non-systemic entity or a systemic entity. Are all these a virtual effect of the above methodology? The following very short sub-section deals with this question.
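A minimal, purely illustrative sketch of steps (1)-(6): the numeric values, the common V_G scale, the choice of module and the significance thresholds below are invented assumptions, used only to show how the |E| comparison could classify an entity.

```python
# Hypothetical (Eq, m_Eq) couples estimated by two differently-acting rational subjects,
# already expressed on a shared V_G scale of values (assumed here to be [0, 1]).
couple_1 = (0.72, 0.65)     # step (1): Eq, then m_Eq
couple_2 = (0.70, 0.65)     # step (2): m_Eq, then Eq (reordered here as (Eq, m_Eq))

# Assumed significance thresholds on the V_G scale (not given in the paper).
NOT_SIGNIFICANT = 0.05
EXTREMELY_SIGNIFICANT = 0.5

def classify(c1, c2):
    """Step (3): subtract the couples and take the always-positive module |E|;
    steps (4)-(6): classify the entity by the magnitude of |E|."""
    e = sum(abs(a - b) for a, b in zip(c1, c2))          # |E|, one possible module
    if e <= NOT_SIGNIFICANT:
        return e, "systemic entity"
    if e >= EXTREMELY_SIGNIFICANT:
        return e, "unbounded / not even aggregated resources"
    return e, "world of systemhood and individualhood (non-systemic stratum)"

print(classify(couple_1, couple_2))   # -> approximately (0.02, 'systemic entity')
```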

4.4 Non-systemic concepts sources
Let there be an exclusively triadic-itemization bibliographic base. It is an evident constraint, at least relative to Bertalanffy's idea, which was influential and was brought in as a base for that long (but subjective) enumeration from the last part of Sub-section 3.2. This construction is an operative cognitive one, regarding a specific re-focus upon the world of systemhood and individualhood. It is the context for acquaintance with non-systemic concepts. The triadic itemization consists of:
. Buckminster R. Fuller – Utopia or Oblivion: the Prospects for Humanity; ch. 11, Design strategy;
. K.D. Bailey – System and conflict: toward a symbiotic reconciliation;
. F. Parra-Luna – The notion of system as a conceptual bridge between the sociology of organizations and organizational efficiency.
The adequate interference between these three works and Romanian systemic and philosophic researches has elicited the following axioms and propositions:
Axiom 1. There is "our" cosmos inside the universe.
Axiom 2. The human (rational subject) has a teleological existence inside the cosmos. [B.R. Fuller, 1927 – ephemeralization] The cosmos has a teleological existence containing the [(H. Wald, A. Dumitriu, St. Milcu; 1976, 1990, 1995) significant, universalis, bioethical] human (rational subject).
Axiom 3′. Cosmos ↔ teleological existence ↔ human (rational subject); teleological existence ↔ (local) life support systems; cosmos ↔ (local) life support systems ↔ (self-)organizing existence.
Axiom 3. (Figure 1) . . . {world-mind}, (M *), (U), (U *) . . . There exists a divisibility inside the matter-reflection relation. There is a triadic matter-ego, and for each side a multipolar outlook exists. So, there is a triadic matter (substance, energy, information) [V. Săhleanu, 1973 – Informational biology] and a multipolarity of its properties (generability, becoming, variability, universal connection). There is a triadic reflection (information-reflection element, information-model, objective information). There is a plurality of determinism (dynamic; probabilistic; heuristic). It is a continual becoming of the matter-reflection duplex relation.
Consequence 1. The ontic-gnosic/epistemic ring is an outlook of this duplex essence.
Consequence 2. Life is another outlook of the same duplex essence.
Proposition 1. There is a variety of life support systems.
Proposition 2. Endoreproducibility, cognition and organizing are the main features delimiting natural, artificial and social life. These features are the turning points of an open triadic matter.
Axiom 3′. Q.v. between Axioms 2 and 3.
Axiom 3″. (Figure 2) Systemic space ↔ profound zone; systemic space (environmental and inner spaces | versus the entity's "point of view"); entity (rational subject ↔ objective element | versus systemic tension); systemic tension ↔ world of systemhood and individualhood.
Magellanity = the rational subject's attempt toward the world of systemhood and individualhood [{world-mind} expressed, 1519-1522/Magellan].
Trans-reflexivity = the entity's attempt toward the world of systemhood and individualhood [(U) expressed, 1936/Miguel de Unamuno].




Organizational efficiency = the system's (spirituality (rational subject ↔ Entity ↔ System)) attempt toward the world of systemhood and individualhood [(P-L) expressed, 1975/Francisco Parra-Luna].
Attaining the general system attribute (idea) = the matter-reflection relation as systemic tension (system idea).
Axiom 3‴. World of systemhood and individualhood ↔ our world ↔ ["our"] cosmos [↔ universe] ↔ world of systemhood and individualhood.
Consequence 3. Understanding and explanation of our world are the correct common expressions for magellanity, trans-reflexivity, organizational efficiency, and the general system attribute on the world of systemhood and individualhood.
Axiom 4. Only the rational subject's explanation/understanding of the system exists. [S. Guiasu, M. Belis, 1968 – quantitative-qualitative measure of information]
Axiom 4′. (M *) circularity upon {world-mind} assures (U *) virtual identity upon (U) – multi-reflexivity between the primal existence, the dual model, and the model of the model. [Axiom 4′ is an equivalent inner definition of the (U) property.]
Axiom 4″. (M *) and (U *) assure spirituality – q.v. Axiom 3″.
Axiom 4‴. (M *) and (U *) assure (P-L) – q.v. Axiom 3″, q.v. Axiom 2. [St. Odobleja, 1938-1978 – resonance logic/as the sole natural one; R. Vulcănescu, 1994 – artefact logic (consequent with myth logic)/at least one possible realization; L. Culda – organizational approach to knowledge/non-"genetic harmony" genesis and becoming; G. Anca, D. Petrescu, 1992/3 – non-evolutive systemic paradox].
Consequence 4. There is not a linearly structured, but a net-structured set of Axioms 1-4; this fact reflects the net-type matter-reflection duplex relation.
Axiom 5. (M *), (U *), and (P-L) assure (U * *) – q.v. Axiom 3. [C. Portelli, 1992 – informational dialectics of nature/the source of "our" matter-reflection duplex net-type relation being the transcendental information and its cycle]
Consequence 5. During the (U * *) transcendental cycle, based upon transcendental information, a wide range of systemic thinking, systemic varieties, and systemic and non-systemic entities are supported.
Consequence 6. It is possible to "understand" (only one) transcendental cycle; there is a stationarity of the spirituality of (rational subject ↔ Entity ↔ "(non-)"System); all of these being in the singular form, as (our?) linguistics has termed them – but trans-supported; the realizations of each entity and of our chronological observation of the entities being in the plural, as (our!) linguistics has termed them – and this received our consent (as life support systems).
Axiom 5′. The Universe is made of the universal background (or ether), associated with energy quanta, information quanta, and their "strings" and "cords". These basic entities generate space, time, substance and field. These are built according to a Fibonacci-like rule and to the equivalence for "filling" the Euclidean space only with the help of the antiquity theorem of the 4-, 6-, 8-, 12-hedron faces, with the help of 32 hyperfaces in the Minkowski 4D space (inside the fourth dimension, time is generated), and with "open" model faces for further generation with more than four dimensions. [P. Constantinescu, 1982 – Philosophy of systems]



Axiom 6. [Sub-section 4.3] The world of systemhood and individualhood (q.v. Axiom 3″) has a discrete-type structure.
Consequence 7. There are other types (along with the systemic type) inside the world of systemhood and individualhood: metasystem, network, transitron, . . . , up to individicity.
Consequence 8. The remainder is the turning point of all these types, according to the long-term functionality of each above-denominated type. The remainder presents or suppresses the functional emergence of X (verb, systemic property, world of systemhood and individualhood property):

potentially suitable for each of the above-denominated types through their concrete realizations inside our world.
Definition 1. A realization of a world of systemhood and individualhood entity has A-functional emergence if it intelligently preserves the variability of the corresponding matter and/or a triadic proper reflection (Figure 1 and Axiom 3).
Definition 2. A realization of a world of systemhood and individualhood entity has B-functional emergence if the integration of its elements exists.
Definition 3. A realization of a world of systemhood and individualhood entity has C-functional emergence if its elements are doing progressively more with less per each, versus a divisibility of the matter-reflection relation.
Proposition 3. The A-, B- and C-functional emergence of a realization of a world of systemhood and individualhood entity expresses anti-lethargy and anti-obsolescence, and spurs a necessary change and a growth/(quasi-)stationarity/avoided decline.
Proposition 4. A-intelligent preservation of heterogeneity implements both monotony/harmony and harmony/dissonance.



Proposition 5. B-emergence may vary between true integration and conflict (even dissolution).
Proposition 6. An entity may C-do something from consensus to conflict (Figure 2 and Axiom 3″).
Proposition 7. Definitions 1-3 and Propositions 4-6 must be explained inside the Proposition 1 context, and understood inside the Proposition 2 context.
Axiom 7. [Figure 3] The types of approaches to Reality are in accordance with the multi/inter/trans-reflexivity (U/U */U * *); U being the denomination for the Unamuno property (q.v. Axiom 4′ – inner definition).
Axiom 8. The systemic-individual feature (as systemic tension – q.v. Axiom 3″) is another general property of the matter, but only in conjunction with reflection (q.v. Figure 1, Consequence 1 – explanation and understanding).

Proposition 8. The variety of life support systems must be explained inside the Axiom 8′ context, and understood inside the following context:

Proposition 9. Our world of systems and things emerges:
. from welfare toward poverty;
. from happiness toward alienation;
. from micro/macro-cosmic scientific advance to heterogeneous types of: contemporaneous countries, metropolises of the 3rd millennium, the realizations of the human being and its personality.
Remark 1. A chronological string of reconstructed dichotomies:
. human game (1946)/cybernetics; personality (1972) | FutureScape (1998) — Karl Gross/C. Bălăceanu, Ed. Nicolau | Irene Sanders;
. knowledge; self; brain (1933-1977)/systems genesis (1985) | cosmic sentiment of existence (1990) — Karl Popper/Paul Constantinescu | Mihai Drăgănescu;
. soul/machine | general system — Schmidt/La Mettrie | von Bertalanffy;
. idea/matter | percipient mind — Plato/Aristotle | Berkeley (1685-1753);
. number/flame | atom — Pythagoras/Heraclitus, the "weeping" philosopher (c.550-c.480 BC) | Democritus, the "laughing" philosopher (c.460-370 BC), and Leucippus (c.460-370 BC).
The chrono-direction of reconstruction runs through this string. GSI: general system idea reconstruction as representation and re-solving of dichotomies; "self-recursive" systemic tension (GSI).
Remark 2. Human-made artifacts and social domains refer to the rational subject, but inside the natural domain the deepness of ["our"] cosmos inside the universe (q.v. Axiom 1) may contain an "other rational subject", as matter endowed with its own intelligence [D. Constantin Dulcan, 1987 – The Intelligence of Matter]. The world of systemhood and individualhood would have another expansion. [q.v. Raymond Ruyer – The Gnosis of Princeton]
Remark 3. No rational subject can surpass:
(1) the omnidirectional falling into infinities and/or paradoxes of the parametrical set of desirable sensitive observations;
(2) the drama of operative thresholds and limitations confronted with an ideal (re)conceptualization;
(3) the falling into infinity of any generated-generator reservoirs closed loop;
(4) the unperceived (re)becoming of life from an infinite absorbent heterogeneity.
Confronted with all these limitations, any rational subject may develop a systemic and non-systemic dual frame for self-guidance inside deep Reality. This means:
. (M *) inside a bordering finite domain of desirable sensitive observations;
. (U *) according to continuous cyclic reconceptualization inside an operative epistemic context;
. (P-L) according to a set of: artifacts, socially limited zones, research focused upon a small or medium scaled area; and
. an "expanding" concept of spirituality from the world of systemhood and individualhood base of entities, which are referred to in the singular.
A duality exists inside Remark 3.
Proposition 10. If the realizations of the world of systemhood and individualhood entities are [(M *) accordingly] convergent to the whole Reality, then the duality existing inside Remark 3, and (U * *), yield:
. systemic non-consistency (contradiction) but completeness for each domain as: finite/operative/specific set/particular realization of the "expanding" concept of spirituality;



. non-systemic consistency and completeness for each of the above-mentioned domains (not emerging from an equivalent Gödel theorem frame);
. existent systemic fuzzy consistency and completeness for each domain (emerging from a non-equivalent Gödel theorem frame and an Arrow theorem frame); and
. non-systemic consistency and completeness for each domain (also emerging from a non-equivalent Gödel theorem frame, an Arrow theorem frame, and an impossibility-of-indicator-aggregation frame – G. Păun, 1982).

Proposition 10 must firmly demonstrate (and then must help us with a common, evident magellanity sense) that systemic and non-systemic entities are strongly related to our observation, reflection (of reflection) and efficiency ideal, and must explain, so as to make us understand, that systemic tension is related to the general system idea as a reconstruction. Also, there are further arguments to sustain that both our systemic tension and ourselves (the finite [?] set of rational subjects; life support systems) are a general property of the matter.

4.5 A note upon "holos" and "system"; "integron"
An ideal hope is to find the proofs for (all types of) systemic ancient thinking patterns which have become contemporaneous customs for science and life. It is considered upon the "individual-transitron-network-metasystem-system = holos | versus remainder" mental reflection. Outside the domain of this section, it would be possible to mind, as a rational subject, upon an objective element without any human resources or "rationality" (an atom, a galaxy). So, it is considered necessary to compare (at least): Socrates, Hippocrates of Kos, Anaxagoras/Empedocles, Aristippus of Cyrene, Democritus/Leucippus, Plato, Aristotle (as some epistemic polarities may be proved "systemic"), Epicurus and Lucretius (another, successive pole); the critical "moment" of Hypatia; and all their trajectory up to W. Ockham, and then to M. Montaigne, J. Huarte, L. Valla, . . . , P. Gassendi; Th. More's and A. Tennyson's reflections; G.G. Byron's, P.B. Shelley's/M. (Godwin) Shelley's thoughts; Kant's deepness; A. Smith's duality; the "late" J.Ch. Smuts' Holism versus L. von Bertalanffy's General System Theory. A terminological "unification" for system = holos may be "integron". But some long-term projective linguistic experiments are necessary (so as not to elicit another "system-holos" divergence).

5. The transition of systemic and non-systemic realizations of entities from the world of systemhood and individualhood
Let the A-, B-, C-functional emergences of a realization of an entity be marked with A, B, C, . . . according to Consequence 8, and, respectively, let the Consequence 7 types be marked as: MS (mixed system/human-technical); m_MS (metasystem); n_MS (network); t_MS (transitron); ind (individicity). Also to be marked: the remainder as r, involved in –r–> (the transition from one type of world of systemhood and individualhood entity to another); the uncoordinated mixed system as uc_MS; resources with external (quasi-)aggregation as re(q)a. Table IV shows a complete set of transitions with the world of systemhood and individualhood types.

Table IV. A complete set of transitions between world of systemhood and individualhood types (the four columns are to be read in parallel)

Type of the source entity: MS; MS; MS; MS; m_MS; m_MS; m_MS; m_MS; n_MS; n_MS; n_MS; n_MS; t_MS; t_MS; t_MS; ind
Deficient emergence: B; A; C; A, B, C; B; B; B, C; A, B, C; A, B, C; A; A; A; C; C, A; C, A, B
Emergence (quasi-)persistent: C; C; A; [non-re(q)a]; A, C; C [A]; [non-re(q)a]; [non-re(q)a]; C [,B]; C, B; [C]; A, B; [non-re(q)a]; A, B, C
–r–> Transition to: m_MS; n_MS; t_MS; Ind; uc_MS; m_MS; n_MS; T; Ind; Ind; m_MS; r/d_MS, n_MS; MS, t_MS; Ind; re(q)a; MS

Note: the A, B, C denominations are to be seen inside Sub-section 4.4, Consequence 8.

6. A measurement technique to distinguish the systemic from the non-systemic entities
The final utility of such a technique is not to offer magellanity-grade advice to a manager, but to reduce the risk of applying a traditional systemic method to a non-systemic entity. Obviously, the hard core will be to enable new methods specific to each non-systemic variety identified in our living-support-systems world. So, this section only presents an identification technique. This technique is related to mathematical linguistics – two problems of identification from two linguistic contexts were related to two supposedly systemic and, respectively, non-systemic entities. The first problem (P1): which of "systemic" and "non-systemic" is the proper qualifying term for Mihai Eminescu's antume poetry (poems published during his lifetime; 1850-1889)? The graphical solution is presented in Figure 5. The second problem (P2): similar to P1, but for a scientific work (analysis and synthesis of decisional operative sub-systems, 1995). Figure 6 presents the solution. The identification technique (verified upon P1 and P2) consists of a representation of the (linguistic) problem and a graphical solution of it; then the interpretation of the solution elicits the answer: systemic or not, related to the context (domain) of the problem.

6.1 Representation
(1) The usage of a word-frequency pattern of the natural language proper to the text of the problem (i.e. words frequency = function (words ordinal number), word frequency from the "Frequency Dictionary of Rumanian Words", Mouton & Co, 1965; NJ – the number that reflects the natural ascending order versus the decrease in word frequency).





(2) The usage of a notion/word-frequency pattern for the (con)text of the problem – for P1, related to "About Self and Sense inside Eminescu's Utterance" (in Romanian), Editura Mondero, 1993; for P2, related to a personal measurement of a personal scientific text.
(3) The unification, as words ordinal number, of the independent variables of both P1 and P2, relative to the most frequent word: "eye" for P1, "system" for P2.
(4) The representation words frequency = function (words ordinal number) for P1 and P2 (after the "double whole" representation, the polygonal contours corresponding to the observed concentration of the points are marked).
(5) The verification of all local (contextual) heuristic algorithmic methods.
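A minimal, purely illustrative sketch of this representation and of the discontinuity search used in Sub-section 6.2 follows; the stand-in text, the tokenization and the discontinuity threshold are invented assumptions, not the dictionary-based measurements actually used for P1 and P2.

```python
from collections import Counter
import re

text = "the system of the system and the model of the mind"   # stand-in for a real corpus

# Step (1)/(4): word frequency as a function of the word's ordinal (rank) number,
# ranks assigned in descending order of frequency.
counts = Counter(re.findall(r"[a-z]+", text.lower()))
ranked = counts.most_common()                       # [(word, frequency), ...]
freq_of_rank = {rank: freq for rank, (word, freq) in enumerate(ranked, start=1)}

# Step (3): express frequencies relative to the most frequent word.
most_frequent_word, top_freq = ranked[0]
relative = {word: freq / top_freq for word, freq in ranked}

# Sub-section 6.2, step (1): flag discontinuities in frequency = f(rank), here simply
# the ranks where the frequency drops by more than a fixed, assumed threshold.
THRESHOLD = 1
discontinuities = [rank for rank in range(2, len(ranked) + 1)
                   if freq_of_rank[rank - 1] - freq_of_rank[rank] > THRESHOLD]

print(most_frequent_word, relative, discontinuities)
```

On such a frequency-versus-rank profile, the interpretation would then look for grouped high-frequency notions and for graphic harmony versus graphic monotony of the resulting contours.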

6.2 Solution
(1) Identification of the discontinuities proper to words frequency = function (words ordinal number).
(2) Identification of the groups of words (notions) with high frequencies.
(3) Identification of the correlation between the first and the second sets resulting from the above two solution steps, resulting in a specific string.
(4) Comparison of the elements of the specific string identified above with other functions (related to the same context) of the same words ordinal number; for P1 there were such "other functions".
(5) Interpretation of the set of functions for each problem. The result is: graphic harmony (M. Mesarovic's notion) for P1; graphic monotony for P2.

7. Conclusions (resulting from P1 and P2 comparative solutions)
Proposition 11. There is graphical harmony for non-systemic entities and, respectively, graphical monotony for systemic entities, according to two types from the world of systemhood and individualhood: metasystem (m_MS)/mixed system (MS). [There is the evidence from the P1/P2 solutions.]
It is necessary to underline that the above-presented measurement techniques, related to poetry and scientific texts, show their metasystemic and, respectively, systemic lexical backgrounds. As there is a profound relation between thinking and language, it is possible to hope for a future identification of systemic and metasystemic patterns of poetic and scientific thinking. It is a real base for a profound understanding of a part of the living support system. Figures 5 and 6 present the graphical solutions for P1 and P2. Figure 7 presents an iconic synthesis of this procedural hope.
Proposition 12. If a lexical background and a related identification measurement technique were a generative linguistic tool proper to be generalized inside a part (at least) of the living support system, then the systemic/non-systemic differentiation conceptualization might be a hopeful "unrevealed yet profound zone" (a possible source for useful inventions and surprising discoveries).


Figure 5. Identification of metasystemic context inside Mihai Eminescu’s antume poems

Figure 6. Identification of systemic context inside a scientific text


Figure 7. Iconic synthesis

Notes
1. This research finds the methodological background to manage both the consensus and the conflict concepts; a proper systemic representation is assured in order to prove that a real system needs both sides, which are not incompatible or even contradictory as they are seen in sociology (Bailey, 1997).
2. The authors, together with 25 other researchers, dedicated themselves to the way the natural and social sciences can work together to analyze and handle the complexity shown up in societal problems (DeTombe and Dijkum, 1996).
3. This is a multidisciplinary volume upon the major problems of civilization, science, semantics, information and culture; the focusing idea is that of the philosophical tension of humankind; academician Draganescu is the prominent leader of a long-term, very active group of interdisciplinary studies; recent structural-phenomenological studies upon mind, consciousness, information and society are enriching his long-term achievements as a forerunner; his prodigious and original corpus of works is widely reflected upon by scientists and readers – non-reaction is nearly absent (Draganescu, 1984).
4. This work is an original elaboration; it deals with design strategy, seen as the holistic answer to 40 questions (from the meaning of the Universe to the truth), operating with properly ordinate concepts, and attaining General Systems Theory as comprehensive and anticipatory problem solving; non-entropy, synergy and ephemeralization are analyzed (Fuller, 1969).
5. These are the bridgeheads of long-term research related to the historical cradle and a peak of sociocybernetics; professor Geyer is not only an interdisciplinary scientist but also a prominent leader of more than 200 researchers dedicated to contemporary social complexity (Geyer, 1977).
6. Advanced Synergetics. Instability Hierarchies of Self-Organizing Systems and Information and Self-organization. A Macroscopic Approach to Complex Systems, Springer (1988); Synergetic Computers and Cognition, Springer (1991); Molecular Physics and Elements of Quantum Chemistry, with H.C. Wolf, Springer (1995); Principles of Brain Functioning. A Synergetic Approach to Brain Activity, Behavior, and Cognition, Springer (1996); Brain Dynamics. Synchronization and Activity Patterns in Pulse-Coupled Neural Nets with Delays and Noise, Springer (2002) (Haken, 1977).
7. This is a work dedicated to the "focused dialogue process" through a new methodological and informational product, FutureScapeTM, used to facilitate strategic thinking inside various areas: planning (strategic community, conference, project, curriculum), new product and career development, and training (Sanders, 1998).

8. The author, together with 14 other researchers, dedicated themselves to a deeper insight into ominous developments, and to better understanding how changes and developments are generated, in order to regain control over them. Taormina, R.J. presents human integration; Hiwaki, K. presents the dual: development of the theory of interest/interest theory of development; Andonian, G. presents humanity in architecture; Hiller, J. presents the problems of telemedicine; Corning, P.A. presents the group selection controversy in evolutionary theory; Boyd, G. McI. presents liberative education; Murphy, D. presents steps towards a preservation of culture; their ways, and their colleagues' ways, into the natural and social sciences aggregate a synergistic impact on the sociopolitical developments of the third millennium (Lasker, 1998).
9. This presents the hypothesis that any type of human and social competence is based on our linguistic generative competence – this being one of the most significant consequences of Noam Chomsky's work published in 1964; inside some contexts this hypothesis was verified. Professor Marcus is the founder and leader of a mathematical linguistics school; he is the author of about 300 research papers and 30 books, quoted by about 1,000 authors; he was honoured – on his 75th birthday – by a dynamic group of researchers, assistants and enthusiastic readers (Marcus, 1974).
10. This is a comprehensive piece of research presenting an arithmomorphic attempt upon systemic globalization; the emergent values dedicated to progress are: freedom, order, justice, health, wealth, knowledge, prestige, conservation of nature, and qualities of activities (Parra-Luna, 1998).
11. This work provides extensive concepts upon knowledge, action, perception, memorization and decision; the turning point is the "epistemo-praxiological loop" associated with mathematically described operators; there is an intimate objectivity-subjectivity link, developed from a "radical constructivism" toward a "well tempered constructivism" (Vallée, 1995).

References
Bailey, D.K. (1997), "System and conflict: towards a symbiotic reconciliation", Quality & Quantity, Vol. 31, pp. 425-42.
DeTombe, D.J. and van Dijkum, C. (1996), Analyzing Complex Societal Problems. A Methodological Approach, Rainer Hampp Verlag, München und Mering, p. 300.
Draganescu, M. (1984), Science and Civilization, Editura Stiintifica si Enciclopedica, Bucharest, p. 288 (in Romanian).
Fuller, B.R. (1969), Utopia or Oblivion: The Prospect for Humanity, Bantam Books, Toronto, New York, NY, London.
Geyer, F. (1977), "General systems theory and the growth of the individual's inner complexity as a function of time", in Rose, J. and Bilciu, C. (Eds), Modern Trends in Cybernetics and Systems, Vol. 2, Springer, Berlin, pp. 59-78.
Haken, H. (1977), Synergetics, an Introduction. Nonequilibrium Phase Transitions and Self-Organization in Physics, Chemistry and Biology, Springer, Berlin.
Lasker, G.E. (1998), "Synergistic effects of local and global developments on our lives and our future", in Ramaekers, J. (Ed.), Proceedings of the 15th International Congress on Cybernetics, International Association of Cybernetics, Namur, pp. 587-664.
Marcus, S. (1974), "Linguistics as a pilot science", in Sebeok, Th.A. (Ed.), Current Trends in Linguistics, Vol. 12, Mouton, The Hague.


Parra-Luna, F. (1998), “The notion of system as conceptual bridge between the sociology of organizations and organizational efficiency”, Proceedings of the Xth International Congress of World Organization of System and Cybernetics, Vol. 2: Sociocybernetics, Bren, Bucharest, pp. 248-56. Sanders, T.I. (1998), Strategic Thinking and the New Science/Planning in the Midst of Chaos, Complexity, and Change, The Free Press, New York, NY. Valle´e, R. (1995), Cognition et Syste`me/Essai d’Episte´mo-Praxe´ologye, L’Interdisciplinaire/ Syste`me(s), Limonest, p. 136. Further reading Amoroso Richard, L. (1999), “A brief introduction to noetic field theory. The quantization of mind”, in Rakic, K., Rakovic, D. and Koruga, D. (Eds), Brain and Consciousness, ECPD, Belgrade, pp. 297-302. Arrow, K.J. (1963), Social Choice and Individual Value, Wiley, New York, NY. Balaceanu, C. and Nicolau (Eds) (1972), Personalitatea umana O interpretare cibernetica, (The Human Personality. A Cybernetic Interpretation), Editura Junimea, Iasi. Belis¸, M. (1981), Bioingineria Sistemelor Adaptive Si Instruibile (The Bio-engineering of the Adaptive and Instructive Systems), Editura Stiintifica si Enciclopedica, Bucuresti. Bonting, S.L. (2001), “Need and usefulness of a revised creation theology: Chaos theology”, Science and religion Antagonism or Complementarity? paper presented at Science and Spiritual Quest – International Symposium, 8-11 November, Bucharest. Bunge, M. (1977), “Philosophical richness of technology”, in Suppe, F. and Asquith, P.D. (Eds), Philosophy and Social Action 2. Dubois, D. (1998), “Modelling of anticipatory systems with incursion and hyperincursion”, in Ramaekers, J. (Ed.), Proceedings of the 15th International Congress on Cybernetics, pp. 306-11. Dumitriu, A. (1944), Paradoxele stiintelor (Science’s Paradoxes). Imp. Nationala. Geyer, F. (1998), “The increasing convergence of social science and cybernetics”, Proceedings of the Xth International Congress of World Organization of System and Cybernetics, Vol. 2, Sociocybernetics, Bren, Bucharest, pp. 211-6. Go¨del, K. (1931), “Uber formal unentscheidhare sa¨tze der principia mathematica und verwandter systeme”, I. Monatshefte fu¨r Math. u. Physik. Bd., Vol. 38, pp. 173-98. Goguen, J.A. (1969), “The logic of inexact concepts”, Synthese, Vol. 19, pp. 325-73. Juilland, A., Edwards, P.M.H. and Juilland, I. (1965), Frequency Dictionary of Rumanian Words, Mouton & Co, The Hague. Malitza, M. (2000), “Ten thousand cultures, one single civilisation [Toward geomodernity of the XXI century]”, International Political Science Review, Vol. 21 No. 1, Zidul si iedera (The Wall and the Ivy), Cartea Romaneasca, 1978. Mora˜rescu, J. and Bulz, N. (2000), “Pentru abordarea extins-matematica a paradoxurilor si limitarilor (“Toward extended-mathematical approach of the paradoxes and limitations”), Academica, Vol. 11 Nos 1-2, pp. 121-2, pp.44. Negoita Constantin, V. and Ralescu Dan, A. (1975), Application of Fuzzy Sets to Systems Analysis, Birkha¨user Verlag. Nicolescu, B. (1996), La Transdisciplinarite´. Manifeste, Editions du Rocher, Monaco. Pa˜un, G. (1977), “Generative grammars for some economic activities”, Foundations of Control Engineering, 2 1 pp. 15-25. Pa˜un, G. (1995), Artificial Life: Grammatical Models, Black Sea University Press, Bucharest.

Searle, J. (2000), "The three gaps. From the classical theory of rationality toward a consciousness approach", paper presented at the Analytical Philosophy Insight Conference, The New Europe College, Bucharest, 19 May.
Smith, M. (1995), "The prospects for machine consciousness", in Ramaekers, J. (Ed.), Proceedings of the 15th International Congress on Cybernetics, pp. 306-11.
Victor, S. (1996), De La Omul Necunoscut La Omul Cognoscibil (From the Unknown Human toward the Cognitive-known Human), Editura Ramida, Bucuresti.
Zadeh, L.A. (1965), "Fuzzy sets", Information and Control, Vol. 8, pp. 338-53.

Appendix. Glossary
Mixed system = actional, informational and managerial | decisional sub-systems; responsibility metasystem; profound zone.
Systemic approach = (hypothetical) real – model – ideal (norms); human psychic system; general system (theory) idea | its reconstruction.
Equilibrium approach = (meta)equilibrium of a mixed system; couple (m_Eq_MS, Eq_MS).
Information = information-reflection element, information model; objective information.
Matter open triad = substance, energy, information; matter | reflection; objective element(s), level(s); interaction(s), connection(s); rational subject, entity, system, profound zone . . . (mixed system).
Spirituality = rational subject, entity, system.
Cognitive dichotomies = reflexive/empirical; discursive/intuitive.
Cognitive modes [and varieties of human systemic thinking (and artificial intelligence reasoning)] = analytical, holistic, experimental, experiential.
Systemic thinking (its varieties) = V_S,R | Spinoza, Russell; V_G | Goethe; V_L | Leibnitz; V_C | Cusanus.
World of systemhood and individualhood = entities around a rational subject and a systemological attempt.
Properties of entities from the world of systemhood and individualhood = antientropy, synergy, ephemeralization.
Magellanity (M*) = non-observable real – notion(s) inside (M).
Magellanity (M) = (non-)observable real; mental concept/construct; (non-)theoretic concept; notion(s) – only local dual connections.
Organizational efficiency (P-L) = system's (spirituality) attempt towards the world of systemhood and individualhood.
Unamuno property (U), (U*), (U**) = types of approaches upon Reality; (U*) virtual identity for the domain of ideas; (U**) general system idea | attribute of the matter/trans-reflexivity.
Paradoxes and limitations = spirituality (rational subject, entity, system); systemic – non-systemic context for Gödel's, Arrow's, Păun's equivalent – non-equivalent theorem frames | their surpassing.
PINK = (psychology, intelligence, neural, knowledge) generation of artifacts.
{World-Mind} travel = (M*), (U), (U*), (U**), (P-L) inside (M).
i.e. = id est; that is to say.
e.g. = exempli gratia; for example.
q.v. = quod vide; which see.


To performance evaluation of distributed parallel algorithms


Juraj Hanuliak and Ivan Hanuliak Faculty of Control and Informatics, University of Zilina, Slovakia


Abstract
Purpose – To address the problems of high-performance computing by using networks of workstations (NOW) and to discuss the complex performance evaluation of centralised and distributed parallel algorithms.
Design/methodology/approach – Defines the role of performance and of performance evaluation methods using a theoretical approach. Presents concrete parallel algorithms and tabulates the results of their performance.
Findings – Sees a network of workstations based on powerful personal computers as a very cheap, flexible and promising asynchronous parallel system for the future. Argues that this trend will produce dynamic growth in parallel architectures based on networks of workstations.
Research limitations/implications – We would like to continue these experiments in order to derive more precise and general formulae for typically used parallel algorithms from linear algebra and for other application-oriented parallel algorithms.
Practical implications – Describes how the use of NOW can provide a cheaper alternative to the traditionally used massively parallel multiprocessors or supercomputers and shows the advantages of unifying the two disciplines involved.
Originality/value – Produces a new approach that exploits the parallel processing capability of NOW. Gives concrete practical examples of the method that has been developed, using experimental measuring.
Keywords Cybernetics, Programming and algorithm theory, Computer networks
Paper type Research paper

1. Introduction
There has been increasing interest in the use of networks (clusters) of workstations connected by high-speed networks for solving large, computation-intensive problems. This trend is mainly driven by the cost effectiveness of such systems compared to massive multiprocessor systems with tightly coupled processors and memories. Parallel computing on a cluster of workstations connected by high-speed networks has given rise to a range of hardware- and network-related issues on any given platform. Load balancing, inter-processor communication and transport protocols for such machines are being widely studied (Greenberg et al., 1996; Hanuliak, 1999; Hesham and Lewis, 1997; Hwang and Xu, 1998; Kumar et al., 2001; Sveda and Vrba, 2001). With the availability of cheap personal computers, workstations and networking devices, the recent trend is to connect a number of such workstations to solve computation-intensive tasks in parallel on such clusters. To exploit the parallel processing capability of a network of workstations (NOW), the application program must be parallelised. Finding an effective way to do this for a concrete application problem (the decomposition strategy) is one of the most important steps in developing an effective parallel algorithm (Hanuliak, 2001a; Marinescu and Rice, 1995; Nancy, 1996).



NOW (Hanuliak, 1999; Hesham and Lewis, 1997; Hwang and Xu, 1998; Kumar et al., 2001; Williams, 2001) has become a widely accepted form of high-performance parallel computing. As in conventional multiprocessors, parallel programs running on such a platform are often written in an SPMD (single program – multiple data) form to exploit data parallelism, or in an improved SPMD form that also takes into account the potential functional parallelism of a given application. Each workstation in a NOW is treated similarly to a processing element in a multiprocessor system. However, workstations are far more powerful and flexible than the processing elements in conventional multiprocessors. We can also use the advantages of Intel's new SIMD (single instruction multiple data) or MMX (multimedia extensions) instructions in the latest personal processors.

2. The role of performance
Quantitative evaluation and modelling of the hardware and software components of parallel systems are critical for the delivery of high performance. Performance studies apply to the initial design phases as well as to procurement, tuning and capacity planning analysis. As performance cannot be expressed by quantities independent of the system workload, the quantitative characterisation of the resource demands of applications and of their behaviour is an important part of any performance evaluation study. Among the goals of parallel systems performance analysis are: to assess the performance of a system, a system component or an application; to investigate the match between requirements and system architecture characteristics; to identify the features that have a significant impact on the application execution time; to predict the performance of a particular application on a given parallel system; and to evaluate different structures of parallel applications. For the performance evaluation we briefly review the techniques most commonly adopted for the evaluation of parallel systems and their metrics.

2.1 Performance evaluation methods
For the performance evaluation we can use the following methods:
(1) analytical methods:
. application of queueing theory results (Hanuliak, 1999, 2001a, b, 2002; Harrison and Patel, 1993; Hsu and Pen-Chung, 1997);
. Petri nets (Hanuliak, 1999; Hwang and Xu, 1998);
(2) simulation methods (Banks and Dai, 1997; Fodor et al., 1998);
(3) experimental measurement (Hanuliak, 1999; Hwang and Xu, 1998):
. benchmarks; and
. direct measuring of a concretely developed parallel application.
In order to extend the applicability of analytical techniques to the parallel processing domain, various enhancements have been introduced to model phenomena such as simultaneous resource possession, fork and join mechanisms, blocking and synchronisation. Hybrid modelling techniques allow contention to be modelled at both the hardware and the software levels by combining approximate solutions and analytical methods. However, the complexity of parallel systems and algorithms limits the

applicability of these techniques. Therefore, in spite of its computation and time requirements, simulation is extensively used, as it imposes no constraints on modelling. Evaluating system performance via experimental measurement is a very useful alternative for parallel systems and algorithms. Measurements can be gathered on existing systems by means of benchmark applications that aim at stressing specific aspects of the parallel systems and algorithms. Even though benchmarks can be used in all types of performance studies, their main field of application is competitive procurement and the performance assessment of existing systems and algorithms. Parallel benchmarks extend the traditional sequential ones by providing a wider set of suites that exercise each system component with a targeted workload. The Parkbench suite, especially oriented to message-passing architectures, and the SPLASH suite for shared-memory architectures are among the most commonly used benchmarks (Hwang and Xu, 1998).

2.2 Performance evaluation metrics
For evaluating parallel algorithms several fundamental concepts have been developed. Trade-offs among these performance factors are often encountered in real-life applications.
2.2.1 Performance concepts. Let O(s, p) be the total number of unit operations performed by a p-processor system for size s of the computational problem, and T(s, p) be the execution time in unit time steps. In general, T(s, p) < O(s, p) if more than one operation is performed by the p processors per unit time, where p ≥ 2. Assume T(s, 1) = O(s, 1) in a single-processor (sequential) system. The speedup factor is defined as:

S(s, p) = T(s, 1) / T(s, p)

It is a measure of the speedup obtained by a given algorithm when p processors are available for the given problem size s. Ideally, since S(s, p) ≤ p, we would like to design algorithms that achieve S(s, p) ≈ p. The system efficiency for a p-processor system is defined by:

E(s, p) = S(s, p) / p = T(s, 1) / (p T(s, p))

A value of E(s, p) approximately equal to 1, for some p, indicates that such a parallel algorithm, using p processors, runs approximately p times faster than it does with one processor (the sequential algorithm).
2.2.2 The isoefficiency concept. The workload w of an algorithm often grows in the order O(s), where s is the problem size. Thus, we denote the workload w = w(s) as a function of s. In parallel computing it is very useful to define an isoefficiency function relating the workload to the machine size p needed to obtain a fixed efficiency E when implementing a parallel algorithm on a parallel system. Let h be the total communication overhead involved in the algorithm implementation. This overhead is usually a function of both the machine size and the problem size, thus denoted h = h(s, p).


The efficiency of a parallel algorithm implemented on a given parallel computer is thus defined as:

E(s, p) = w(s) / (w(s) + h(s, p))

The workload w(s) corresponds to useful computation, while the overhead h(s, p) represents unproductive time attributed to synchronisation and data communication delays. In general, the overhead increases with increasing values of both s and p. Thus, the efficiency is always less than one. The question hinges on the relative growth rates of w(s) and h(s, p). With a fixed problem size (fixed workload), the efficiency decreases as p increases, because the overhead h(s, p) increases with p. With a fixed machine size, the overhead h grows more slowly than the workload w; thus, the efficiency increases with increasing problem size for a fixed-size machine. Therefore, one can expect to maintain a constant efficiency if the workload w is allowed to grow properly with increasing machine size. For a given algorithm, the workload w might need to grow polynomially or exponentially with respect to p in order to maintain a fixed efficiency. Different algorithms may require different workload growth rates to keep the efficiency from dropping as p is increased. The isoefficiency functions of common parallel algorithms are polynomial functions of p, i.e. they are O(p^k) for some k ≥ 1. The smaller the power of p in the isoefficiency function, the more scalable the parallel system. Here, the "system" means the algorithm and architecture combination.
2.2.3 The isoefficiency function. We can rewrite the equation for the efficiency E(s, p) as E(s, p) = 1 / (1 + h(s, p)/w(s)). In order to maintain a constant E, the workload w(s) should grow in proportion to the overhead h(s, p). This leads to the following relation:

w(s) = [E / (1 - E)] h(s, p)

The factor C = E / (1 - E) is a constant for a fixed efficiency E. Thus, we can define the isoefficiency function as follows: f_E(p) = C h(s, p). If the workload grows as fast as f_E(p), then a constant efficiency can be maintained for a given algorithm-architecture combination.

3. Complex performance evaluation
For the complex performance evaluation of parallel algorithms in the case of the commonly used centralised parallel multiprocessor systems (synchronous SIMD – single instruction multiple data – parallel architectures and SMP – symmetrical multiprocessors) and of asynchronous (centralised or distributed MIMD – multiple instruction multiple data) parallel architectures, we can use an analytical approach to obtain, under given constraints, some analytical laws (Amdahl's law, Gustafson's law) or other derived analytical relations (Hesham and Lewis, 1997; Hwang and Xu, 1998; Kumar et al., 2001). Indeed, Amdahl's law and the extension described by Gustafson and others are only properly applied as limiting cases and have been successfully used to evaluate the limitations and potential of parallel processing. These known analytical relations have been derived without considering architecture and communication complexity; that is, they assume a performance P_p = f(calculation). Such an assumption could be realistic in some existing massively parallel multiprocessor systems, but not in a NOW based on personal computers. In a NOW, as a new form of asynchronous parallel system (Andrews, 2000; Hanuliak, 1999; Hesham and Lewis, 1997; Hwang and Xu, 1998; Kumar et al., 2001; Williams, 2001), we have to take into account all the aspects that are important for complex performance evaluation, according to the relation P_p = f(architecture, communication, calculation). In such a case we can use the following solution methods to obtain the complex performance:
. direct measurement – real experimental measurement of P_p and its components for a concretely developed parallel algorithm on the concrete parallel system;
. analytic modelling – finding P_p on the basis of closed analytical expressions or statistical distributions for the individual overheads; and
. simulation technique – simulation modelling of concretely developed parallel algorithms on the concrete parallel system.
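As a minimal numerical sketch of the metrics defined in Section 2.2 (our addition, with assumed sample times rather than measured ones), the following Pascal fragment computes the speed-up S(s, p), the efficiency E(s, p) and the isoefficiency constant C = E/(1 - E) from a sequential time T(s, 1) and a parallel time T(s, p) on p processors:

program PerformanceMetrics;
{ Speed-up, efficiency and the isoefficiency constant of Section 2.2 (sketch). }
var
  t1, tp, s, e, c: double;
  p: integer;
begin
  { assumed illustrative values; replace with real measurements }
  t1 := 100.0;    { T(s,1) - sequential execution time }
  tp := 30.0;     { T(s,p) - parallel execution time on p workstations }
  p  := 4;

  s := t1 / tp;              { S(s,p) = T(s,1) / T(s,p) }
  e := s / p;                { E(s,p) = S(s,p) / p      }
  c := e / (1.0 - e);        { C = E / (1 - E), Section 2.2.3 }

  writeln('speed-up   S(s,p) = ', s:8:3);
  writeln('efficiency E(s,p) = ', e:8:3);
  writeln('isoefficiency constant C = ', c:8:3);
end.

With the assumed values the sketch prints S ≈ 3.33, E ≈ 0.83 and C = 5, i.e. to hold this efficiency the workload would have to grow as w(s) = 5 h(s, p).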


4. The theoretical part 4.1 Method of direct measuring For suggested direct measuring of complex performance evaluation in a NOW we used the structure according to Figure 1. The way of measuring can be illustrated by flow diagrams in Figure 2 for complex performance evaluation of common sequential algorithms (for evaluation of some specified speed-up concepts), in Figure 3 for centralised parallel algorithms (SMP – symmetrical multiprocessors, etc.) and in Figure 4 for distributed parallel algorithms (in our case for NOW’s implementation). The difference between Figure 3

Figure 1. The measurement set-up on the NOW (Ethernet network)


Figure 2. Flow diagram for sequential performance measuring

and Figure 4 lies principally in the way IPC (inter-process communication) is implemented among the decomposed parallel processes.

4.2 The concrete parallel algorithms
4.2.1 Numerical integration. Numerical integration is a typical example of an implicitly parallel algorithm, in which the parallelism is an integral part of the algorithm itself. Concretely, for the calculation of the value π the following standard formula is used (Hanuliak, 1999; Hwang and Xu, 1998):

π = \int_0^1 4 / (1 + x^2) dx

For the parallel way of calculation we used the natural possibility of decomposing numerical integration algorithms into mutually independent parts (processes). In our concrete example we divide the whole calculation into its individual processes according to Figure 5. The individual independent processes are distributed for calculation over the computer network in such a way that every process is executed concurrently on a different node of the NOW (the mapping of processes to individual workstations). After the parallel computation in the individual nodes of the network of workstations has been completed, we only need to sum the partial results to obtain the final value. To handle this task we have to choose one of the nodes of the network. At the beginning the chosen node (for example, node 0) must also know the value n (the number of strips in every process) and has to make it known to the other active nodes. The parallel algorithm for the π computation is then the following:


Figure 3. Flow diagram for centralised architectures (SMP)

if my node is 0
    read the number n of strips desired and send it to all other nodes
else
    receive n from node 0
end if
for each strip assigned to this node
    calculate the height of the rectangle (at the midpoint) and sum the result
end for
if my node is not 0
    send the sum of results to node 0


Figure 4. The flow diagram for measuring in NOW’s

else
    receive the results from all nodes and sum them
    multiply the sum by the width of the strips to get π
return

4.2.2 The discrete Fourier transform. The discrete Fourier transform (DFT) has played an important role in the evolution of digital signal processing techniques. It has opened new signal processing techniques in the frequency domain which are not easily realisable in the analogue domain. The DFT is defined (Basoglu et al., 1997; Hanuliak, 1999; Hwang and Xu, 1998) as:

X_n = \sum_{m=0}^{N-1} x_m w^{mn},   n = 0, 1, ..., N - 1


Figure 5. The decomposition of the numerical integration problems

and the inverse discrete Fourier transform (IDFT) as:

x_m = (1/N) \sum_{n=0}^{N-1} X_n w^{-mn},   m = 0, 1, ..., N - 1

where w is the N-th root of unity, i.e. w = e^{-i(2π/N)}, for generally complex numbers. In principle the above equations are linear transforms. Direct computation of the DFT or the IDFT according to these definitions requires N^2 complex arithmetic operations. In such a way we would take into account only the calculation times, and not also the overhead times caused by the parallel way of implementing the algorithm. Cooley and Tukey (Basoglu et al., 1997; Hanuliak, 1999) developed a fast DFT algorithm which requires only O(N log_2 N) operations. The difference in execution time between a direct computation of the DFT and the new DFFT algorithm is very large for large N. Direct computation of the DFT or the IDFT, according to the following program, requires N^2 complex arithmetic operations:

program Direct_DFT;
var
  x, Y: array[0..N-1] of complex;
  k, n: integer;
begin
  for k := 0 to N - 1 do
  begin
    Y[k] := x[0];
    for n := 1 to N - 1 do
      Y[k] := Y[k] + W^(n*k) * x[n];   { W^(nk) denotes the twiddle factor w raised to the power nk }
  end;
end.


For example, the time required just for the complex multiplications in a 1024-point FFT is

T_mult = 0.5 N log_2(N) · 4 T_real = 0.5 · 1024 · log_2(1024) · 4 T_real = 20 480 T_real

where one complex multiplication corresponds approximately to four real multiplications. The principle of the Cooley-Tukey algorithm, which uses a divide-and-conquer strategy, is shown in Figure 6. Several variations of the Cooley-Tukey algorithm have since been derived. These algorithms are collectively referred to as discrete fast Fourier transform (DFFT) algorithms. The basic form of the parallel DFFT is the one-dimensional (1D), unordered, radix-2 form (a use of the divide-and-conquer strategy according to the principle in Figure 6). Effective parallel computing of the DFFT tends towards computing 1D FFTs with radix equal to or greater than two and computing multidimensional FFTs by using polynomial transfer methods. In the practical part of this paper we computed the 2DFFT (two-dimensional DFFT). In general, a radix-q DFFT is computed by splitting the input sequence of size s into q sequences of size s/q each, computing the q smaller DFFTs, and then combining the results. For example, in a radix-4 FFT each step computes four outputs from four inputs, and the total number of iterations is log_4 s rather than log_2 s. The input length should, of course, be a power of four. Parallel formulations of higher-radix strategies (e.g. radix-3 and radix-5), 1D or multidimensional DFFTs, are similar to the basic form because the underlying ideas behind all sequential DFFTs are the same. An ordered DFFT is obtained by performing a bit reversal (permutation) on the output sequence of an unordered DFFT. Bit reversal does not affect the overall complexity of a parallel implementation.

5. The results
To measure both the calculation and the overhead times, the "QueryPerformanceCounter" function, which measures calculation times in ms, was used. For these purposes we used common personal computers according to Table I (the principles in using more powerful personal computers are the same, but we would get better results). The developed parallel algorithms were divided into two logical parts – manager and worker programs. All programs were written on the WNT (Windows New Technology)

Figure 6. An illustration of divide-and-conquer strategy for DFFT
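To make the divide-and-conquer principle of Figure 6 concrete, the following compact recursive radix-2 DFFT sketch (our own illustrative Free Pascal code, not the implementation measured in Section 5) splits an input of length N (a power of two) into its even- and odd-indexed halves, transforms the halves recursively and combines them with the twiddle factors w = e^{-i2πk/N}; this recursion is what reduces the N^2 operations of Direct_DFT to the order N log_2 N:

program Radix2DFFT;
{$mode objfpc}
{ Recursive radix-2 DFFT sketch illustrating the divide-and-conquer principle. }
type
  TComplex = record
    re, im: double;
  end;
  TCVector = array of TComplex;

function C(re, im: double): TComplex;
begin
  Result.re := re;
  Result.im := im;
end;

function CMul(a, b: TComplex): TComplex;
begin
  Result.re := a.re * b.re - a.im * b.im;
  Result.im := a.re * b.im + a.im * b.re;
end;

function FFT(const x: TCVector): TCVector;
var
  n, half, k: integer;
  xe, xo, fe, fo: TCVector;
  w, t: TComplex;
begin
  n := Length(x);
  SetLength(Result, n);
  if n = 1 then
  begin
    Result[0] := x[0];
    Exit;
  end;
  half := n div 2;
  SetLength(xe, half);
  SetLength(xo, half);
  for k := 0 to half - 1 do
  begin
    xe[k] := x[2 * k];        { even-indexed subsequence }
    xo[k] := x[2 * k + 1];    { odd-indexed subsequence  }
  end;
  fe := FFT(xe);              { two half-length DFFTs: "divide and conquer" }
  fo := FFT(xo);
  for k := 0 to half - 1 do
  begin
    w := C(cos(-2.0 * pi * k / n), sin(-2.0 * pi * k / n));  { twiddle factor }
    t := CMul(w, fo[k]);
    Result[k].re        := fe[k].re + t.re;
    Result[k].im        := fe[k].im + t.im;
    Result[k + half].re := fe[k].re - t.re;
    Result[k + half].im := fe[k].im - t.im;
  end;
end;

var
  x, y: TCVector;
  k: integer;
begin
  SetLength(x, 8);                       { hypothetical 8-point test signal }
  for k := 0 to 7 do
    x[k] := C(k + 1.0, 0.0);
  y := FFT(x);
  for k := 0 to 7 do
    writeln('X[', k, '] = ', y[k].re:10:4, ' + ', y[k].im:10:4, 'i');
end.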

platform. The manager controls the computation: it starts the services, makes the connections and launches the remote functions in parallel; at the end it sums the partial results. Every worker (server) waits for the start of the calculation and then computes its partial result; at the end of the calculation it returns to the manager the calculated result together with its calculation time. The results of the whole computation are therefore not only the calculated values but also the calculation and communication times for each individual worker. The calculation times were measured with the "QueryPerformanceCounter" function in ms.
The calibration power of the workstations (based on the personal computers of Table I) used in our experiments is shown in Figure 7 for the π calculation and in Figure 8 for the 2DFFT. From Figure 7 (the numerical integration algorithm) we can see that with decreasing epsilon (a proportionally increasing number of intervals) the execution time increases linearly. This is caused by the linear increase of the calculated sum (input load according to Table II), while the communication overheads remain constant. The results achieved for the 2DFFT algorithm document a geometric increase of both the computation and the communication parts, with a quotient of nearly four for the analysed matrix dimensions (increasing the matrix dimension means doing twice as much computation on columns and twice as much on rows). Therefore, for better illustration, we have used dependencies on the relative input load defined according to Table III. The results in the Ethernet NOW are graphically illustrated in Figure 9 for the π calculation (numerical integration) and in Figure 10 for the 2DFFT. In both cases, for better graphical illustration, we limited the displayed measured values for the WS1 network node. The influence of the matrix dimension on the network load is shown in Figure 11. The percentile amounts of the individual parts (computation, overheads – network load, initialisation) of the π execution time are illustrated in Figure 12 for epsilon = 10^-5.

Table I. Parameters of the used personal computers

Label   Processor                             RAM [MB]   Operating system
WS1     Pentium I, 233 MHz                    64         Windows 2000
WS2     Pentium II, 450 MHz                   128        Windows 2000
WS3     Pentium III, 933 MHz                  256        Windows 2000
WS4     Pentium IV, 1 GHz                     512        Windows 2000
WS5     Pentium IV, 1.4 GHz                   512        Windows 2000
WS6     Pentium IV, 2.26 GHz                  1000       Windows 2000
WS7     Pentium IV Xeon, 2 proc., 2.2 GHz     1000       Windows 2000 Server

Figure 7. Calibration power results for the π calculation


Figure 8. Calibration power results for the 2DFFT

Table II. Measured results for the π calculation

Number of intervals    10^5     10^6     10^7      10^8       10^9
Epsilon                10^-5    10^-6    10^-7     10^-8      10^-9
WS1                    27       279      2,772     27,487     274,193
WS2                    14       134      1,355     13,400     133,976
WS3                    7        70       668       6,584      65,459
WS4                    6        58       598       6,116      59,844
WS5                    4        42       425       4,279      42,542
WS6                    2        25       250       2,510      25,022

Table III. Measured results for the 2DFFT computation

Relative input load    1        4        16        64         256        1024
Matrix dimension       32×32    64×64    128×128   256×256    512×512    1024×1024
WS1                    7        31       138       606        2,819      12,390
WS2                    4        15       65        293        1,316      5,657
WS3                    3        13       54        246        1,066      4,498
WS4                    3        12       49        220        992        4,125
WS5                    2        8        36        159        671        2,926
WS6                    1        5        22        94         402        1,728

The corresponding breakdown for epsilon = 10^-9 is shown in Figure 13. We can see that decreasing epsilon (a higher input load) results in a dominating influence of the computation time (the network loads remain constant). The percentile amounts of the individual parts (computation, overheads – network load, initialisation) of the 2DFFT execution time are shown in Figure 14 for the 512 × 512 matrix and in Figure 15 for the 1024 × 1024 matrix. In both cases the percentile amounts of the individual parts are nearly the same. The high network loads are caused by the matrix transpositions needed during the 2DFFT computation. The comparison of the sequential and parallel ways of execution on an SMP parallel system (WS7 according to Table I) is shown in Figure 16 for the π execution and in Figure 17 for the 2DFFT algorithm.
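The conclusions below suggest dividing the input load among the network nodes according to their measured performance power. As a minimal sketch of that idea (our addition), the following Pascal fragment turns the epsilon = 10^-9 calibration times of Table II into load shares that are inversely proportional to each workstation's measured time:

program CalibrationLoadBalance;
{ Sketch: derive load-balancing weights from the calibration times of Table II. }
const
  Nodes = 6;
  { measured calibration times for epsilon = 10^-9 (Table II), WS1..WS6 }
  t: array[1..Nodes] of double =
    (274193.0, 133976.0, 65459.0, 59844.0, 42542.0, 25022.0);
var
  i: integer;
  invSum, share: double;
begin
  invSum := 0.0;
  for i := 1 to Nodes do
    invSum := invSum + 1.0 / t[i];           { total "performance power" }
  for i := 1 to Nodes do
  begin
    share := (1.0 / t[i]) / invSum;          { fraction of the input load }
    writeln('WS', i, ': ', share * 100.0:6:2, ' per cent of the load');
  end;
end.

Under this scheme the fastest node (WS6) would receive roughly eleven times the share of the slowest one (WS1).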


Figure 9. The results in the NOW for the π calculation (Ethernet network)

Figure 10. The results in the NOW for the 2DFFT calculation (Ethernet network)


Figure 11. The influence of matrix dimension on network load

Figure 12. The individual parts of the π execution time (epsilon = 10^-5)

Figure 13. The individual parts of the π execution time (epsilon = 10^-9)


Figure 14. The individual parts of the 2DFFT execution time (512 × 512)

Figure 15. The individual parts of the 2DFFT execution time (1024 × 1024)

Figure 16. Comparison of the sequential and parallel π calculation


Figure 17. Comparison of the sequential and parallel 2DFFT calculation

6. Conclusions
Distributed computing was reborn as a kind of "lazy parallelism": a network of computers could team up to solve many problems at once, rather than one problem at higher speed. To get the most out of a distributed parallel system, designers and software developers must understand the interaction between the hardware and the software parts of the system. It is obvious that the use of a computer network based on personal computers is in principle less effective than the typical massively parallel architectures in use worldwide, because of its higher communication overheads; but a network of workstations based on powerful personal computers belongs, for the future, among very cheap, flexible and promising asynchronous parallel systems. We can already see such a trend in the dynamic growth of parallel architectures based on networks of workstations, as a cheaper and more flexible architecture in comparison to conventional multiprocessors and supercomputers. Moreover, the principles of the massively parallel multiprocessors realised around the world are by now implemented in modern symmetric multiprocessor systems (SMP) based on the same processors that appear on a workstation's motherboard. Unifying both approaches (NOWs comprising both simple and SMP workstations) opens new possibilities in HPC computing. The next steps in the evolution of distributed parallel computing will take place on both fronts: inside and outside the box. Inside, parallelism will continue to be used by hardware designers to increase performance. Intel's new SIMD or MMX instruction technology, which implements a small-scale form of data parallelism, is one example. Out-of-order instruction execution by the super-scalar cores of all the latest powerful processors is another example of internal parallelism. Therefore, in relation to our achieved results, we are able to do better load balancing among the used network nodes (performance optimisation of the parallel algorithm). For this purpose we can use the calibration results of the individual network nodes in order to divide the input load according to their measured performance power. Second, we can do load balancing among network nodes based on modern SMP parallel systems and network nodes with only single processors. Generally we can say that parallel algorithms, or their parts (processes), with more communication (similar to the analysed 2DFFT algorithm) will achieve better speed-up values on a modern SMP parallel system, whereas for algorithms with lower parallel

communication overheads (similar to the analysed π computation) we can use the other network nodes based on single processors.
Queueing network and Petri net models, simulation, experimental measurement and hybrid modelling have been successfully used for the evaluation of system components. Through experimental measurement we have illustrated the use of this technique for the complex performance evaluation of parallel algorithms, and in this context we have presented the first part of the achieved results. We would like to continue these experiments in order to derive more precise and general formulae (a generalisation of the used Amdahl's and Gustafson's laws, at least in a limited way, for typically used decomposition strategies or similar application problems) and to develop suitable synthetic parallel tests (SMP, NOW) to predict performance in a NOW for some typical parallel algorithms from linear algebra and for other application-oriented parallel algorithms. We will report on these results in the future.
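For comparison with such future generalisations, a brief sketch of the classical laws mentioned above may be useful (our addition; the serial fraction f = 0.05 is an assumed illustrative value): Amdahl's law gives S(p) = 1/(f + (1 - f)/p) for a fixed problem size, while Gustafson's scaled speed-up is S(p) = p - f(p - 1).

program SpeedupLaws;
{ Classical Amdahl and Gustafson speed-up bounds (sketch for comparison). }
var
  f: double;          { serial fraction of the workload (assumed) }
  p: integer;         { number of processors / workstations }
  amdahl, gustafson: double;
begin
  f := 0.05;
  for p := 1 to 8 do
  begin
    amdahl    := 1.0 / (f + (1.0 - f) / p);   { fixed problem size }
    gustafson := p - f * (p - 1);             { scaled problem size }
    writeln('p = ', p, '  Amdahl S = ', amdahl:6:2,
            '  Gustafson S = ', gustafson:6:2);
  end;
end.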

References
Andrews, G.R. (2000), Foundations of Multithreaded, Parallel, and Distributed Programming, Addison-Wesley Longman, Glen View, IL, p. 664.
Banks, J. and Dai, J.G. (1997), "Simulation studies of multiclass queueing networks", IEEE Transactions, Vol. 29, pp. 213-9.
Basoglu, Ch., Lee, W. and Kim, Y. (1997), "An efficient FFT algorithm for super-scalar and VLIW processor architectures", Real Time Imaging, Vol. 3 No. 6, pp. 441-53.
Fodor, G., Blaabjerg, S. and Andersen, A. (1998), "Modelling and simulation of mixed queueing and loss systems", Wireless Personal Communication, Vol. 8, pp. 253-76.
Greenberg, D.S., Park, J.K. and Schvabe, E.J. (1996), "The cost of complex communication on simple networks", Journal of Parallel and Distributed Computing, Vol. 35, pp. 133-41.
Hanuliak, I. (1999), Parallel Computers and Algorithms, ELFA Press, Košice, p. 327 (in Slovak).
Hanuliak, I. (2001a), "Buffer management control in data transport network node", The International Journal of Systems Architecture, Elsevier Science, Amsterdam, Vol. 47, pp. 529-41.
Hanuliak, M. (2001b), "To the behaviour analysis of mobile data networks", Proceedings of the 7th Scientific Conference, Târgu Jiu, Romania, 9-10 November, pp. 170-5.
Hanuliak, I. (2002), "On the analysis and modelling of computer communication systems", Kybernetes: The International Journal of Systems & Cybernetics, Vol. 31 No. 5, pp. 715-30.
Harrison, P.G. and Patel, N. (1993), Performance Modelling of Communication Networks and Computer Architectures, Addison-Wesley, Reading, MA, p. 480.
Hsu, W.T. and Pen-Chung, Y. (1997), "Performance evaluation of wire-limited hierarchical networks", Parallel and Distributed Computing, Vol. 41 No. 2, pp. 156-72.
Hesham, El-Rewini and Lewis, Ted G. (1997), Distributed and Parallel Computing, Manning Publications, Greenwich, CT, p. 467.
Hwang, K. and Xu, Z. (1998), Scalable Parallel Computing: Technology, Architecture, Programming, McGraw-Hill, New York, NY, p. 802.
Kumar, V., Grama, A., Gupta, A. and Karypis, G. (2001), Introduction to Parallel Computing, 2nd ed., Addison-Wesley, Reading, MA, p. 856.


Marinescu, D.C. and Rice, J.R. (1995), "On the scalability of asynchronous parallel computations", Parallel and Distributed Computing, Vol. 31, pp. 88-97.
Nancy, A.L. (1996), Distributed Algorithms, Morgan Kaufmann Publishers, San Francisco, CA, p. 872.
Sveda, M. and Vrba (2001), "Sensor networking", Proceedings of the Eighth IEEE International Conference on the Engineering of Computer-Based Systems, IEEE Computer Society Press, Washington, DC, pp. 262-8.
Williams, R. (2001), Computer Systems Architecture – A Networking Approach, Addison-Wesley, Wokingham, p. 660.


Contemporary systems and cybernetics
New initiatives in the development of neuron chips and in biomimetics


Brian H. Rudall Norbert Wiener Institute of Systems and Cybernetics and University of Wales, UK Abstract Purpose – Reviews current initiatives in the development of neuron/silicon chips and in biomimetics. Design/methodology/approach – A general review and survey of selected research and development topics. Findings – Illustrates the multi- and trans-disciplinary interests of cybernetics and systems and aims to further research and development activity. Practical implications – The choice of reviews provides an awareness of the current initiatives and trends in these areas of research and endeavour. Originality/value – The reviews are selected from a global database and give a studied assessment of current research and development initiatives. Keywords Automation, Biology, Computers, Cybernetics, Research Paper type Technical paper

Human-machine interface – neurons on chips For decades the use of computer systems has relied on users communicating information via keyboards. More sophisticated input peripherals have, of course, been developed such as voice, touch screens and others, but none has replaced the traditional typing-in and keying-in as the main means of access to what is a much more advanced computer machine. Hopes were raised therefore, last year when the leading computer company Microsoft applied for and secured a US patent covering the use of the human body as a conductor in connection with electronic appliances. The company, however, are more reticent about their research programme and have said that they have no specific product in mind. Cybernetics researchers in the human-interface have for some time advocated a computer-user interface where access is gained by linking the human body directly to the machine. Indeed, in this section we have on a number of occasions included reports of such systems. In the main these have linked the human brain directly to the machine and they have already demonstrated a degree of success when for example, a computer user was able to command the machine to perform certain tasks. In most cases this has been achieved by monitoring the actions of certain parts of the brain and translating the signals via digital pulses to the computer system. In a recent article (Gross, 2004) this challenge to link user to machine was discussed and a new line of research was described. Dr Peter Fromherz, a director of the Max Planck Institute for Biochemistry, at Martinsried, Nr. Munich, Germany, has been engaged in research for some time studying possible connections between silicon



electronics and biological cells. What struck Dr Fromherz was indeed obvious. Both computers and our brains communicate with electrical signals so why therefore, should it not be possible to create a direct interface between them. No need, he says, for eyes, monitors, ears and speakers, hands or keyboards – if the computer is so clever, why can’t it just read my mind? To address this challenge we are told that he: Set out to grow neurons from the medical leech (Hirudo medicinalis) on silicon chips and persuade the two parties to talk to each other. Transmission of a signal from the neuron to the chip first succeeded in 1991, the reverse process four years later. Essentially, the recording of the neuron signal by the chip relies on a field effect transistor, while the electronic stimulation of the neuron arises from a voltage pulse applied to a capacitor, so both processes are absolutely non-invasive and don’t affect the survival of the cell in any way.

Leaving the University of Ulm, where he then worked, he moved to the Max Planck Institute and set about his pioneering research with the goal of establishing the precise nature of the chip/neuron interface. He expanded his work to cells from other sources and built more complex systems consisting of neurons and semiconductors. Dr Fromherz and his researchers established that: . . . an ordinary silicon chip, with the outermost 15 nm oxidised, is an ideal substrate to cultivate neurons on. The silicon oxide layer insulates the two sides and stops any electrochemical charge transfer, which might damage the chip or the cell. Instead, there is only a capacitative connection, established by a so-called planar core-coat conductor. Proteins sticking out of the lipid membrane ensure that there is a thin (50-100 nm) conducting layer between lipid and silicon oxide, which constitutes the core of the conductor.

Indeed, the whole set-up could be represented as a simplified electrical circuit. This is described as one where: . . .both the membrane and the silicon oxide have a defined capacitance, and the electrolyte layer between them (which is part of the medium surrounding the whole cell) has a given ohm resistance. The dynamic properties of the system are dominated by the ion channels within the cell membrane, which determine the ohmic conductance of the membrane and thus the propagation of the electrical action potential, which are the typical neuronal signals.

The neuron-to-chip experiment also produced the following explanation of the signal transfer: In the neuron-to-chip experiment, the current generated by the neuron has to flow through the thin electrolyte layer between cell and chip. This layer’s resistance creates a voltage, which a transistor inside the chip can pick up as a gate voltage that will modify the transistor current. In the reverse signal transfer, a capacitative current pulse is transmitted from the semiconductor through to the cell membrane, where it decays quickly, but activates voltage-gated ion channels that create an action potential.

Further details of this investigation are available from the researchers and from Gross (2004, p. 31). In summary, the interface aims at:
. passage of electrical signals between brain and computer by getting nerve cells and silicon chips to interact directly;
. transmission of electrical signals between chips and neurons, which can already be achieved on a small scale without invasive connections or damage to either transmitter; and
. potential use of this technology in combination for many other applications, e.g. computer designs, vision, hearing, control for the disabled, and others.

Further challenges were tackled which could have a dramatic effect on future applications, including details of the construction of an imaging network and hopes for commercially valuable spin-off products. There are also indications that the developments by Dr Fromherz have now received a great deal of attention, both in the media and in research centres worldwide. His bio-electronic hybrid systems will undoubtedly form part of future systems and applications.

Biomimetics
New field for cybernetics and systems
Researchers who mimic nature's creatures in their designs for robotic systems now refer to their field as biomimetics.

Northeastern University research
Dr Ayers of the Marine Science Center, Nahant, Massachusetts, USA, is developing a robotic lobster that he hopes will have a number of potentially useful applications. Funded by the Office of Naval Research, USA, it is already in an advanced state of development. Dr Ayers is a professor at Northeastern University, Boston, US, and he has been experimenting with his lobster for some time. With legs extending from the lobster's abdomen, the creature can move its industrial-strength plastic body and its nickel metal hydride battery along a small, rock-strewn, sandy sea bed. It weighs some seven pounds, so to get it to move in water is an achievement in itself, but he is intent on making it clamber over rocks. He hopes that by the time he is ready to demonstrate it to the military the lobster will have two claws, which it can use as bump sensors. "When it walks into a rock", he explains, "it will be able to decide whether to go over it or around it, depending on the rock's size". This is one example of the challenges faced by robotic researchers who regard animals as creatures that can be mimicked in the form of robots. In addition to the lobster, flies, dogs, fish, snakes, geckos, cockroaches and many more species are being used as inspiration for the new generation of robots (Kirsner, 2004). Dr Ayers is reported to believe that: Animals have adapted to any niche where we'd ever want to operate a robot . . . RoboLobster, for instance, is being designed to hunt for mines that float in shallow waters or are buried beneath beaches, a harsh environment where live lobsters have no trouble maintaining sure footing.

There is a great incentive to develop such machines, and machines that are based on such creatures will be able to operate in places where today's generation of robots is unable to go. Information about this project can be obtained from Lobster (2005).

Carnegie Mellon University research
At Carnegie Mellon University, US, research is being conducted in this field. Dr Howie Choset, for example, has been testing sinuous segmented robots based on snakes


and elephant trunks. These, he believes, may be the perfect machines to search for survivors inside rubble left by structures destroyed by disasters such as fire, earthquake or other natural causes. Details can be obtained from Snake (2005).

University of California research projects
Dr Shankar Sastry at the University of California is one of a research team involved with biomimetics and is currently helping to design robotic flies, fishes and a wall-climbing gecko. He says that: "What has been a surprise to me is how hard it has been to make progress" (Fly, 1999; Gecko, 2002).

Massachusetts Institute of Technology
At the Massachusetts Institute of Technology (MIT), robotic fish such as RoboPike and RoboTuna are being developed by the Institute's researchers. More information can be obtained from Fish (2005).

Tacom Group
Tacom is the abbreviated name of the Army's Tank-Automotive and Armaments Command, Warren, Michigan, US. The group has received over $1 million to conduct research into walking robots. A report of the project says that:

Future developments In summary, the researchers from these organizations have differing views about the prospects of developing robots that mimic the creatures of nature. In this section we have frequently described such projects. There are so many potential applications, for example, at one university research laboratory – Carnegie Mellon, Dr Choset presents a very optimistic picture he reports that: Advances in legged robots could eventually lead to more realistic and utilitarian prosthetic limbs for amputees. Reptilian robots could one day be used to inspect underground fuel tanks or, on a smaller scale, to perform medical tests and surgery inside the human body. One goal of ours is to be able to do surgical procedures in a minimally invasive fashion. This summer (2004), a preliminary test, inserting a snake robot with a diameter of less than an inch into the abdomen of a live pig was performed.

Most of the researchers engaged in these projects believe that an important advance will be the “artificial Muscle”. This is a synthetic substance that can be made to contract and relax when electricity is applied, in the same way that an organic muscle does. Dr Robert Full from the University of California says:

Muscles are spectacular springs, shock absorbers, struts, brakes and motors, all rolled up into one thin tissue.

An example of the development of artificial muscles is the RoboLobster, built by Dr Ayers of the Northeastern University, which has we are told: . . .delicate, wiry artificial muscles that move its legs, made of a nickel-titanium alloy called nitinol that contracts when electricity is applied. And earlier this year, a start-up company called Artificial Muscle was spun out of SRI International, a Silicon Valley research group, to commercialize a new kind of musclelike polymer.

Dr Mark Raibert of the Boston Dynamics Company, Cambridge Massachusetts US, gives another example of possible future trends. He watches videos of mountain goats moving over rough terrain and notes the movement and action of the goat’s foot, in particular, how it gets traction on a very steep surface. His company is working on biomimetic project and is relying on Harvard University biologists, who dissect goats, to give a better understanding of how a goat is likely to move in a similar manner. There appears to be a great deal of cooperation between research groups. Boston Dynamics is developing a climbing robot that uses a geckolike substance developed by Professor Full of the University of California for adhesion. This company is also building a six-legged robot that has spring-like legs similar to a roach’s (Cockroach, 2005). The company is now working on a project to produce a running quadruped called BigDog. All of these endeavours, at different states of development, do provide us with some confidence that biomimetics will produce a new generation of robots which will replicate the designs of nature’s creatures. References Cockroach (2005), A six-legged robot inspired by the cockroach, available at: www.rhex.net RHex. FISH (2005), MIT’s robotic fish, RoboPike and RoboTuna, available at: web.mit.edu/towtank/ www/media.html#pike FLY (1999), Robotic fly project at the University of California Berkeley, available at: www. Berkeley.edu/news/media/releases/99legacy/6-15-1999pix.html GECKO (2002), Mecho-Gecko, developed by the iRobot Corporation and the University of California Berkeley, available at: www.Berkeley.edu/news/media/releases/2002/09/rfull/ robots.html Gross, M. (2004), “Plugging brains into computers”, Chemistry World, September, pp. 30-3, available at: www.proseandpassion.com Kirsner, S. (2004), “They are robots those beasts!”, Circuits, The New York Times, 16 September. LOBSTER (2005), Northeastern University’s robot lobster (click on any link under Online Animations of Biomimetic Systems), available at: www.neurotechnology.neu.edu SNAKE (2005), Carnegie Mellon University’s snake robot, available at: www.snakerobot.com Further reading Eversmann et al. (2003), IEEE J. Solid State Circuits, Vol. 38, p. 2306. Hutzler, P. and Fromherz, P. (2004), Eur. J. Neurosci., Vol. 19, p. 2231. Jenkner, M., et al. (2001), Biol. Cybern., Vol. 84, p. 239. Kaul, R.A., et al. (2004), Phys. Rev. Lett., Vol. 92.


Internet commentary
Cybernetics and systems on the web: hoax paper, nanotechnology
A.M. Andrew
Reading University, Earley, Reading, UK

Abstract
Purpose – The aim is to review developments on the internet, especially those of general cybernetic interest.
Design/methodology/approach – A recent episode involving a hoax paper is reviewed, along with discussions of the implications for paper refereeing and information dissemination generally. Some sources of information on nanotechnology, with applications in medicine, are reviewed.
Findings – That a hoax paper was accepted casts doubt on the review process, but the situation is not clear-cut and the usefulness of mammoth conferences is also questioned. Nanotechnology is shown to be poised for major advances.
Practical implications – The generation of a convincing hoax paper is an interesting technical achievement in itself. Implications for the review process are explored. Sources of information on nanotechnology are indicated.
Originality/value – It is hoped this is a valuable periodic review.
Keywords Nanotechnology, Research work, Peer review, Internet
Paper type General review


Hoax paper accepted
In a message to the CybCom discussion group on 15 April 2005, Klaus Krippendorff drew attention to a report by the CNN news service of a hoax paper produced by three graduate students at MIT. The paper was generated by a computer program and was accepted for presentation at a major conference. At the time of the CybCom mention, the CNN report was available on their web site http://edition.cnn.com, but was to be withdrawn after 14 days. However, an account of the motivation, and details of the generation technique, written by the graduate students themselves, is available at: http://pdos.csail.mit.edu/scigen, with the title "SCIgen – An Automatic CS Paper Generator", and the successful hoax paper is reproduced at: http://pdos.csail.mit.edu/scigen/rooter.pdf. The title of the paper is "Rooter: A Methodology for the Typical Unification of Access Points and Redundancy" and it was one of two submitted for the World Multi-Conference on Systemics, Cybernetics and Informatics (WMSCI) to be held on 10-13 July in Orlando, Florida. The aim of the fake submission was, according to the perpetrators, to counter "fake conferences. . . which exist only to make money". In fairness to the reviewers who were given the paper for evaluation, it has to be mentioned that none of them recommended acceptance, but neither did they recommend rejection. The paper was accepted on a "non-reviewed" basis, meaning that the feedback from reviewers had not arrived by the acceptance deadline and the authors had complete responsibility for the content. The paper has the superficial appearance of being the result of much careful work, with nicely prepared figures, reference list, etc., and I have to admit that my own reaction would be to be reluctant to condemn it until I found clear evidence of its falsity. One consequence of the episode is likely to be that reviewers will start to be more critical of submitted material and may abandon the usual tacit assumption of good faith on the part of authors. The method of generation presumably depends on "putting parsing into reverse", as discussed for instance in Andrew (1983).

The reactions in the CybCom discussion list (with archives at: http://hermes.circ.gwu.edu/archives/cybcom.html) were varied, but mostly sympathetic to the referees. The task of refereeing is agreed to be difficult and time-consuming, and liable to be laid aside till the last possible moment, when the referee is busy and probably engaged on innovative projects of his or her own. If it is done conscientiously, it can lead to a series of exchanges with the original author that can amount to a tutorial session, and Ranulph Glanville acknowledges in his contribution on 18/4 that he has found himself engaged in such exchanges.

Refereeing can be seen as a form of censorship, and in this sense is undesirable, since it can promote orthodox viewpoints at the expense of those that strike out in new directions. This point is taken up by Tony Booth and also Kevin Kreitman in contributions on 17/4. Kevin Kreitman, however, also makes a case for imposing some form of selection by acknowledging that she has "voted with her feet" by choosing not to attend certain conferences whose content she has found to be uninspiring.

In his contribution on 14/4, Stuart Umpleby has some comments on the conference to which the hoax papers were submitted. It seems the conference is on a vast scale, is notorious for the number of e-mail invitations distributed, and in one year had a set of proceedings amounting to 19 volumes. There seems to be some basis for the hoaxers' claim that the conference is designed to make money rather than to advance the topic area. There is general agreement that the problems are particularly severe in connection with cybernetics, where the range of topics that can be encountered is particularly wide, and Tony Booth suggests multiple reviewers with appropriate specialisations. The fundamental question of the emergence of orthodoxy is epitomised by asking: "Who criticises the critics?" There seems to be no clear answer, nor ideal system, and the only hope is that the "top people" in the subject are sufficiently imaginative that offbeat approaches get a hearing. My own feeling is that, in the particular context of cybernetics, they probably are, though some worrying trends towards orthodoxies are certainly discernible.

Another point treated in the discussion is the use of computer methods in authorship. Clearly the hoaxers have carried it to an unacceptable extreme, but a number of discussants refer to software methods of at least generating summaries automatically, and there is in fact a useful comparison of software packages that allow this. Of course, a consideration that excuses the use of such aids is that the result is presumably checked by the human author before submission. It would be interesting to know to what extent the hoax papers were a result of human inspection and selection from a greater number of machine-generated candidates. There is perhaps a sense in which the hoax papers are more "human" than their perpetrators believed.
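The remark above that the generation method presumably depends on "putting parsing into reverse" can be made concrete: a parser checks whether a sentence conforms to a grammar, whereas a generator expands the grammar's rules at random to emit new sentences. The following Python sketch is offered only as an illustration of that idea; the toy grammar, symbol names and sample phrases are invented here and are not taken from SCIgen itself.

import random

# A toy context-free grammar, used "in reverse": rather than checking whether
# text matches the rules, we expand the start symbol at random until only
# terminal words remain. The rules and vocabulary below are invented for
# illustration and are not SCIgen's actual grammar.
GRAMMAR = {
    "SENTENCE": [
        ["We", "present", "SYSTEM", ",", "a", "ADJ", "tool", "for", "TASK", "."],
        ["SYSTEM", "demonstrates", "that", "TASK", "is", "ADJ", "."],
    ],
    "SYSTEM": [["Rooter"], ["our", "framework"]],
    "ADJ": [["scalable"], ["probabilistic"], ["robust"]],
    "TASK": [["unifying", "access", "points"], ["simulating", "redundancy"]],
}

def expand(symbol):
    """Recursively replace a non-terminal with one of its expansions, chosen at random."""
    if symbol not in GRAMMAR:      # terminal word: emit as-is
        return [symbol]
    words = []
    for part in random.choice(GRAMMAR[symbol]):
        words.extend(expand(part))
    return words

if __name__ == "__main__":
    # Each run produces a different, superficially plausible sentence,
    # e.g. "our framework demonstrates that simulating redundancy is robust ."
    print(" ".join(expand("SENTENCE")))

Real generators of this kind presumably differ only in scale, with much larger grammars plus templates for figures and references, which would account for the hoax paper's superficially careful appearance.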
An aspect that rather surprisingly has not been mentioned in the discussion is the part played by refereed papers in reflecting kudos and determining tenure and promotion in academic employment. One function of referees is to be, in effect, consultants to promotion boards. The hoax papers have stimulated an interesting debate about the many considerations connected with the refereeing process. As it was expressed by one person concerned with awarding research grants, it is "difficult to distinguish a crank from someone who is ahead of his time".

Nanotechnology
One of the most interesting and promising areas of robotics is nanotechnology, especially in the medical context. Miniature devices have already proved valuable as sensors and manipulators in keyhole surgery and as the active ends of endoscopes. Another application is the "radio pills" pioneered by Professor Heinz Wolff, now of Brunel University. These can be swallowed and eventually recovered in the excreta, and as they pass through the alimentary canal, the exact location can be determined by X-ray while the pill signals values of, for example, temperature and pH. Pills have been devised that respond to a radio command to take a sample of their current surrounding fluid and store it for eventual analysis.

The possibility of still smaller robotic devices, to travel within blood vessels, is reviewed in the NHS (National Health Service) Magazine for March 2005 (the last issue of the magazine that will be produced, apparently because of changing needs of the healthcare service). The article is entitled "Small World", by Clare Walker, and it reports on initiatives within the NHS to develop and use the new methods, including the creation of a new device evaluation service and innovation centre. It is also mentioned that the US plans to spend 800 million dollars next year on nanotechnology, a greater amount than was spent on the human genome project. The article can be found at: www.nhs.uk/nhsmagazine.

Currently, the main British centre for research in this area is in Cambridge, and details can be found at: www.nanoscience.cam.ac.uk. These include a list of staff members with notes on the research specialities of each, many of them concerned with the behaviour of proteins and other materials at the molecular level. Another site mentioned in the article is www.bhta.com, for the British Healthcare Trades Association, representing assistive technologies in healthcare. It is mentioned in a news release dated 1 October 2004 that BHTA is to head an EU nanotechnology project, funded by the EC with two million Euros and involving 14 organisations throughout Europe. No technical details of projects are given, but it seems safe to assume that major developments are in the offing.

Book reviews
Mathematical Systems Theory I: Modelling, State Space Analysis, Stability and Robustness
Diederich Hinrichsen and Anthony J. Pritchard
Springer-Verlag, Heidelberg, 2005, ISBN 3-540-44125-5, xiv + 804 pp., 180 figures, EUR 69.95 (net), £54.00, sFr 123.50, $79.95 (hardcover)
Keywords Cybernetics, Systems theory, Modelling
Review DOI 10.1108/03684920510614885


This is a book that sets out the mathematical foundations of systems theory and as such presents the reader with the first volume of a self-contained introduction to the field. This volume is written by Drs Diederich Hinrichsen of Bremen, Germany and Anthony Pritchard of the University of Warwick, UK. In particular, it is concerned with outlining the mathematical basis for systems theory that so many cyberneticians and systemists accept exists, but have no real insight into how it could be explained. The text takes up the challenge and aims to present the authors' explanations in a comprehensive, complete and, most importantly, mathematically rigorous manner. To do this they have produced a contribution that is both an introductory text and a sound reference source. It also includes an appendix that covers linear algebra; complex analysis; convolutions and transforms; and linear operators. The main content covers the analysis of dynamical systems and provides examples and illustrations (some 200) which certainly help in the understanding of the mathematical constructions that are discussed. The important topics introduced include:
. Mathematical models;
. Introduction to state space theory;
. Stability theory;
. Perturbation theory; and
. Uncertain spaces.
A guide to readers who want to know whether their mathematical background suffices is given by the publishers, who believe "that it is accessible to mathematics students after two years of mathematics and to graduate engineering students specialising in mathematical systems theory". This is helpful, particularly to researchers who have followed a multidisciplinary path through both systems and cybernetics. Many readers from these fields may not wish to follow the rigorous mathematical approach to their studies, but this book does give those engaged in systems research a good and clear introduction to the mathematical view of their field. It is also worth knowing that a second volume will be devoted to control and, if the same comprehensive and detailed exposition ensues, it will be well worth considering.
W.R. Howard


Debugging by Thinking: A Multidisciplinary Approach
Robert Charles Metzger
Elsevier Academic Press, 2005, ISBN 1-55558-307-5, 600 pp., US $49.95
Keywords Cybernetics, Systems software, Computer software
Review DOI 10.1108/03684920510614894
Cyberneticians and systemists will like this book not just because of the vitally important problems it tackles, but also because of the multidisciplinary way in which it has been written. First, the problems tackled concern the methodology of identifying and correcting software errors. This is a task that has been with us since the very beginning of computing and is one that has not received the attention it has deserved. So many programs, even today, are debugged in a most primitive way. Whilst computer scientists have attempted to both define and prove the validity of programs, the commercial world of software has often ignored their efforts. This book takes a positive step towards linking the six relevant disciplines which the author has identified in his approach to debugging by thinking. These are logic, mathematics, psychology, safety analysis, computer science and engineering. As many readers will already have decided, some of these disciplines are in themselves multidisciplinary, and they might have preferred the author to call his approach simply a cybernetic one. What is of the greatest interest is the manner in which the chosen disciplines are utilized. He uses mathematical problem-solving techniques, in the manner of the literary computing specialists, as well as the other five disciplines, in a novel way. Each of these areas provides a building block in his strategy of defining a method for both identifying and correcting software bugs. It is essentially a systematic and multidisciplinary one, in that it carefully marries the methods of the chosen six into one coherent methodology. To prove the effectiveness of his thesis, the author provides real examples of the source code of programs that have identifiable software errors. These examples are written in both Java and C++. The real bugs are identified and the "thinking" process is described to illustrate how he resolved the problems as they were processed in the program. How this approach will work out in the real world of commercial software, or indeed in any software endeavour where debugging is a normal part of the process of producing tried and tested products, is another matter.

This book is worth reading because of the intricate way in which results from fields such as cognitive psychology, or the processes of modern engineering, can be harnessed in computer software development. To benefit from this text, readers will obviously need a sound working knowledge of the programming languages used in the source code of the examples.
D.M. Hutton

Art of Java Web Development
N. Ford
Manning Publications Co., 2003, ISBN 1-932394-06-c, 582 pp., £40.50, US $44.95 (softbound)
Keywords Computer software, Worldwide web, Java
Review DOI 10.1108/03684920510614902
This is a book that traces the background to Java web development and also looks at other web development frameworks. Its approach is a historical one that works its way to the current Model2/MVC2 pattern of web-based development. As the stages of progress unfold, the reader gets a feel for the system and a good understanding of the web frameworks that have emerged since. Although the reviewed book was published in 2003, it still merits attention, particularly as the background details are not likely to be interpreted in any other way and are, of course, historical by nature. What is likable about this text is the manner in which it is written. The author takes one example to illustrate the stage that is being introduced and describes the way in which the developed technology would have been implemented. This forms the first part of the book, and the subsequent parts two and three examine particular frameworks and the writer's opinions about building web applications and the most suitable techniques to use. Obviously, the book is designed for reasonably knowledgeable Java web developers and is hardly readable by anyone without a good knowledge of software design and implementation. It does, however, give an insight into the problems encountered in such software endeavours, and the discussions on web development will be of much interest to cybernetics researchers who need to understand and perhaps implement such structures. For the systems software developer the book does provide references to a range of allied topics. It is also useful to know that the book provides the reader with the opportunity of downloading all the code used from the internet, although this has the disadvantage of requiring some 153 MB. This is a useful and desirable book for many readers but, of course, these developments are most unlikely to stand still and no doubt a second updated edition is already being planned.
D.M. Hutton


Book reports
Encyclopedia of Knowledge Management
David Schwartz (Ed.)
Idea Group Reference, Hershey, PA, USA, 2006 (prepublished July 2005), ISBN 1-59140-573-4, 600+ pp., US $275 (hardcover) (pre-publication price US $235.00)
Buying an encyclopedia is always a difficult task. Scholars have to ask so many questions about what is often a costly buy. Idea Group Reference has taken a novel approach which may attract some readers. First, although the book is published in 2006, it is available in 2005 at a "pre-pub price", and this price is kept for one month after publication. Second, the publisher offers complimentary access to the electronic version for the life of the edition when the print copy is bought for the library. Other encyclopedias are also being offered that cover a range of topics that are of interest to cyberneticians and systemists. Whilst we all appreciate this, we are more concerned about the content of the text and whether it should grace our shelves or those of an easily accessed library or computer terminal. The editor of the text is a researcher from Bar-Ilan University (Israel) who has attempted the difficult task of collecting information in the wide-ranging area of knowledge management. He believes that the encyclopedia provides a broad basis for understanding the issues presented by the subject as well as the technologies, theories and applications that are now involved. There are, of course, a great number of opportunities presented by the field, which is so closely linked to many other studies. The book also addresses some of the challenges now being faced by researchers and organisations by providing knowledge management data in a convenient and accessible way. Other encyclopedias on offer are concerned with: information science and technology; database technology and applications; data warehousing and mining; and multimedia technology and networking. Further information is available at: www.idea-group-ref.com

Creating Web-based Laboratories
C.C. Ko, Ben M. Chen and Jianping Chen
Springer, London, Berlin, 2005, ISBN 1-85233-837-7, 300 pp., US $89.95 (hardcover)

The first two authors are from the National University of Singapore and the third from Nanyang Technological University, Singapore. The book is in the series "Advanced Information and Knowledge Processing".

Creating a web-based laboratory is a novel idea, although similar facilities have been produced using much less sophisticated technology. This book tells us of one strategy for building such laboratories. The aim in this exercise is to create a laboratory that can be used by students to conduct experiments from a remote location. It is suggested that this can be achieved by setting up parameters and manipulating instruments through a web-based client interface. Computing Reviews believes that "this book is unquestionably very useful for students of science who have extensive computer programming backgrounds and professors of science who wish to implement web-based laboratories from scratch". Further details about the book are available at: springeronline.com

To Talk of Many Things – An Autobiography
Kathleen Ollerenshaw
Manchester University Press, Manchester, 2004, ISBN 0-7190-6987-4, 208 pp., £15.99 (hardback)
Dame Kathleen Ollerenshaw has worked as a mathematician in industrial research and universities, and served as a member, and often chair, of numerous national and local bodies, councils and voluntary associations. She was the first and, up to the present time, the only woman president of the Institute of Mathematics and its Applications. There are, we are told, two good reasons for reading this book. The first is that it is in itself a fascinating account of her life and service to the community; the second is her commitment to mathematics. Mathematics, as we know, is the basis of so many studies, not least cybernetics and systems of all descriptions. It is encouraging for those who work in this subject to read her autobiography: that Dr Ollerenshaw was able to develop new interests and cope with her infirmities, whilst also acting as a role model to others in academic and public life, is surely an inspiration to all.

Bioinformatics Technologies
Yi-Ping Phoebe Chen (Ed.)
Springer-Verlag, Heidelberg, 2005, ISBN 3-540-20873-9, 396 pp., US $79.95 (hardcover)
The editor, from Deakin University, Melbourne, Australia, has embraced a wide selection of well-established concepts and techniques from researchers who are well-known in their chosen areas. These include: data mining; machine learning; database technologies; and visualization techniques. Problems such as protein data analysis and sequencing, genome analysis and sequence databases are amongst the applications included in this compilation. Many of these will be of particular interest to cyberneticians researching in these areas who require a general view of bioinformatics advances and applications.

From Being to Doing: The Origins of the Biology of Cognition
Humberto R. Maturana and Bernhard Poerksen
Carl-Auer, Heidelberg, 2004 (German original 2002), ISBN 3-89670-448-6; ISBN 1-932462-15-5 (USA and Canada), 208 pp., 15 illus., €27.95, US $37.95, sFr 48.00 (paperback)
In USA and Canada, order at www.zeigtucker.com; in Europe at www.carl-auer.com
The book carries on its back cover the following tribute by the late Heinz von Foerster:
In the early 20th century physicists revolutionised the scientific view of the world. Today it is the biologists who are radically transforming our understanding of the processes of life and cognition. Probing the mysteries of the mind, they have been able to prove that, in the act of knowing, the observer and the observed, subject and object, are inextricably enmeshed. The world we live in is not independent from us; we literally bring it forth ourselves. One of the protagonists of this new kind of thinking is the internationally renowned Neurobiologist and Systems Theorist Humberto R. Maturana, who was interviewed for several weeks by Bernhard Poerksen, Journalist and Communication Scientist. In this book, they explore the limits of our cognitive powers, discuss the truth in perception, the biology of love, and give, all in all, an introduction to systemic thinking that is down to earth, imaginative and rich in anecdote. Wherever you read in this rewarding book, you will be enriched and stimulated in the mind.

C.J.H. Mann Book reviews and reports editor

News, conferences and technical reports
University Professor Dr Alfred Locker, Emeritus, Institute of Theoretical Physics, Technical University of Vienna
From his birth on 19 March 1922 to his death on 12 February 2005, Alfred Locker spent physically almost all his life in Vienna during the winters and at his hunting lodge in Schwarzau im Gebirge during the summers. Except for a research fellowship at the University of North Carolina at Chapel Hill, he only sporadically attended scientific events in Mexico, Canada, the US, England and Germany. Yet his career ranged over many fields, and in his mind he experienced the inside of a cell, the mentality of animals, major cultures of the world, its history and prehistory, the universe and beyond. I met him on the Isle of Wight in 1979, during the first conference on self-reference, and we instantly became friends. From then on, we met many times and discussed his manuscripts and ideas by mail or phone until shortly before his death. He was an intensely lively, complex, broadly learned, deeply intellectual, but primarily a lovable and stubborn man. With utter dedication, he tried to combine physicality and intellect, wide scholarship and varied worldly experience, and a deep understanding of mathematics and science with a profoundly Austrian religiosity.

He had intended to become a physician. Yet his study of medicine was interrupted by WWII, which he spent as a medical orderly at a military hospital, a couple of blocks from the house where Freud had previously had his office. After the war he was first involved with the study of cells. From biomedicine his interests shifted later to biophysics, but already included an interest in Naturphilosophie, which at that time in Austria was intimately connected with biology. His doctorate from the University of Vienna in 1949 was in biophysics, with a dissertation in zoology. His research appointments include the Research Laboratory of the First Medical Clinic and Antibiotics Research Unit at the University of Vienna (1949-1960) and the Unit of Physiology and Biophysics and Unit of Medical and Biological Radioprotectivity at the Institute of Biology, Austrian Reactor Center (1960-1969). In 1965 he became associated with the Institute of Theoretical Physics at the Technical University of Vienna, where until his retirement he was a professor and head of the Department of Theoretical Biophysics.

Already in the early sixties he had met Ludwig von Bertalanffy, one of the main proponents of the biological world view (later presented in the US as general systems theory), and eventually became his last student in Vienna. But only after his Chapel Hill experience did he turn his full attention to the ontological, logical and mathematical foundations of complex goal-directed systems. He introduced his view "On the Ontological Foundations of the Theory of Systems" in a 1973 Festschrift for von Bertalanffy. His first Kybernetes publication (with Coulter), on "Recent Progress towards a Theory of Teleogenic Systems", appeared in 1976. It defined his pivotal interest for the remainder of his life. From then on, more than one hundred of his publications revolved around this issue[1]. Although he appeared to shift his critical attention from general systems theory, cybernetics and autopoiesis through the theory of evolution to theological issues, his central concern did not waver. He wanted to reconcile, through the medium of systems theory, his acceptance of scientific biology and his unshakable belief in the immanent role of God in the universe. Into the last days of his life he attempted to integrate his view of nature and spirit, or "reality" and "truth", within a "Trans-Classical Systems Theory"[2]. The last sentence in Professor Locker's 1981 "Autopoiesis" article could serve as his epitaph:
I conclude by expressing my conviction that unyieldingly withstanding naiveté and unmasking the preposterous ostentation of scientism will result in a breakthrough toward the surcease of prejudices and regaining the regrettably lost franchise in the land of ideas.

Richard Jung

Notes
1. Three representative statements of Alfred Locker's positions are currently available on the internet. "Meta-theoretical Presuppositions for Autopoiesis – Self-Reference and 'Autopoiesis'" is at www.vordenker.de/locker/metatheor-presupp-autopoiesis.pdf and "Evolution und 'Evolutions' Theorie in system und metatheoretischer Betrachtung" at www.vordenker.de/locker/evolution-theorie.pdf. "The Present Status of General System Theory, 25 Years after Ludwig von Bertalanffy's Decease: A Critical Overview" is at www.systemsresearch.cz/bert2.pdf, where a photograph of Alfred Locker and a partial bibliography of his texts from the systems theory period is also available.
2. While several more recent versions are yet to be published or are in manuscript, the most recent, perhaps the only, published version is: "Recent Approach to Transclassical Systems-Theory. The Paradoxical Unity of Science with Non- and Super-Science", in Lasker, G.E. (Ed.), Advances in Systems Res. & Cybernetics, Vol. III, IIAS, Windsor/Ontario, 1999, pp. 11-16.

World Organisation of Systems and Cybernetics (WOSC)
Announcements and news in brief.
(1) The contributions presented at the WOSC 13th International Congress of Systems and Cybernetics, held at Maribor, Slovenia (6-10 July 2005), which merited the Kybernetes Research Award and the "Highly Commended" awards, will (subject to copyright) be published in Volume 35 Nos 1/10, 2006. Other selected papers presented at the Congress may also be included. In all cases the authors will have the opportunity of updating their contributions in line with the discussions and other presentations held at the event. Details will be announced in later issues.
(2) Professor Robert Vallée, the President of WOSC, participated in the Eighteenth International Conference on Systems Engineering (ICSEng05). The conference was held in Las Vegas, USA in August 2005. It is hoped that a conference report will be published in Vol. 35.
(3) Dr Alex M. Andrew, the Director-General of WOSC, is both the organisation's and Kybernetes' representative at the UK Cybernetics Society.
(4) Professor B.H. Rudall (WOSC Vice-President) is in contact with members of the WOSC Norbert Wiener Institute to organise the special issues of WOSC's official journal Kybernetes for 2006-2007. Two special double issues, "Cybernetics and Public Administration" and "Sociocybernetics", have been confirmed and are in preparation.

WOSC members and other readers of the journal are invited to suggest new topics and authors who may wish to be involved.

15th International Conference on Systems Science
The "15th International Conference on Systems Science" was held from 7 to 10 September 2004 in Wroclaw (Poland). It was organized by the Institute of Control and Systems Engineering of Wroclaw University of Technology (Director Professor Z. Bubnicki) and co-sponsored by the World Organisation of Systems and Cybernetics (WOSC) and the Committee of Automation and Robotics of the Polish Academy of Sciences (President Professor Z. Bubnicki). The general chairman was Professor Z. Bubnicki. Among the members of the International Program Committee were: Professors D.J.G. James (UK), G. Klir (USA), F. Pichler (Austria), G.P. Rao (India, UNESCO), R. Vallée (France, President of WOSC), W.R. Wells (USA), and L.A. Zadeh (USA).

The opening session, on Monday 7 September, was chaired by the Rector of Wroclaw University of Technology and by Professors Z. Bubnicki, D.J.G. James, A. Grzech (Vice-Rector, President of the Organizing Committee) and R. Vallée, who gave an address of which we quote the following paragraph:
I wish a great success to the congress on behalf of the World Organisation of Systems and Cybernetics, in short WOSC, also known as Organisation Mondiale pour la Systémique et la Cybernétique, co-sponsor of this conference. . . The links between WOSC and the Institute of Control and Systems Engineering have already been stated, particularly with the presentation to Professor Bubnicki of an Honorary Fellowship in 2001 at the occasion of the WOSC congress in Pittsburgh. . . In the name of Professor J. Rose, founder and Honorary Director of WOSC, Dr Alex Andrew, Director-General, Professor Brian Rudall, Vice-President, and myself, I renew my wishes. Remembering the contributions of Poland to cybernetics (not forgetting Trentowski, 1843), economic cybernetics (Oskar Lange), praxiology, automation. . . I am sure that this event will be a great success.

The first plenary session (Monday), chaired by Professor W.R. Wells, was devoted to "MGST approach to information systems development" (Y. Takahara, Japan) and "Some issues about multirate control systems" (P. Albertos, J. Salt, Spain). The second (Wednesday), chaired by Professor L. Keviczky (Hungary), offered a lecture on "Uncertain variables and their applications in uncertain systems" (Z. Bubnicki) and another on "Identification of continuous-time systems: direct or indirect?" (G.P. Rao, H. Garnier, India). On Thursday some members of the International Program Committee were invited by the Wroclaw Branch of the Polish Academy of Sciences, in the University Senate Hall, to a session on the perspectives and international cooperation in the field of information and systems science. The conference dinner was held at a grand hotel in Wroclaw.

At the beginning of the conference, three volumes of abstracts were distributed. Selected papers were to be published in Systems Science, a Polish journal in English, or in Kybernetes. The topics covered included: systems theory; identification; control theory; systems and control engineering; operation and manufacturing systems; uncertain systems and decision systems; knowledge engineering and intelligent systems; information systems; and applications.
Robert Vallée


Announcements


November-December 2005
MM '05: 13th Annual ACM International Conference on Multimedia, Singapore, 1-12 November
Contact: Tat-Seng Chua. Tel: +65-772-2505; E-mail: [email protected]
SENSYS '05: International Conference on Embedded Network Sensor Systems 2005 (ACM), San Diego, California, USA, 2-4 November
Contact: Jason Redi. Tel: 617-353-9575; E-mail: [email protected]
ASE '05: International Conference on Automated Software Engineering 2005, Long Beach, California, USA, 7-11 November
Contact: Debra Brodbeck. Tel: 714-725-2260; E-mail: [email protected]
SC '05: High Performance Networking and Computing, Seattle, Washington, USA, 12-18 November
Contact: Donna Baglio. Tel: +1-212-626-0606; E-mail: [email protected]
ISESE 2005: 4th International Symposium on Empirical Software Engineering (ACM-IEEE), Noosa Heads, Australia, 17-18 November
Contact: web site: http://attend.it.uts.edu.au/isese2005/cfp.htm
ICDM 2005: 5th International Conference on Data Mining (IEEE), New Orleans, USA, 26-30 November
Contact: web site: www.cacs.louisiana.edu/-icdm05/cfp.html
WMTE 2005: 3rd International Conference on Wireless and Mobile Technologies in Education, Tokushima, Japan, 28-30 November
Contact: web site: http://lttf.ieee.org/wmte20005/
HiPC 2005: 12th International Conference on High-Performance Computing, Goa, India, 18-21 December
Contact: web site: www.hipc.org/hipc2005/index.html
Cryptography and Coding X – Institute of Mathematics and its Applications, Royal Agricultural College, Cirencester, UK, 19-21 December
Contact: Lucy Nye, IMA. Tel: 01702 356104; E-mail: [email protected]; web site: www.ima.org.uk


Special announcements


Call for papers and participation

IFSR International Congress, Kobe, Japan, 14-17 November 2005


Scope Reminder
A knowledge-based, technology-supported society is the key to solving current problems of mankind. The ability to understand and manage a complex, dynamic knowledge society of the future, and the overall systemic framework supporting it, is vital. Systems Sciences carry the promise of promoting the creation, management, exchange, integration, and application of knowledge by applying holistic/systemic paradigms and principles. Systems Sciences provide a basis for balancing the divergent needs and interests between individuals and society worldwide, between ecology and economy, between nations of various levels of development and between differing worldviews. They enable us to understand the conflict potential, to search for suitable policies, to harness complexity, and to provide adequate methods and technological tools for their resolution. The guiding themes of this conference are the new directions, challenges and roles for Systems Sciences and their potential beneficial impact on an emerging knowledge society.
. Six Symposia are organized so as to present concrete research topics, which include knowledge management, technology management, technology of information and communication networks, etc., to search for a way of achieving sustainable economic and ecologic development, which is an urgent need common to all human beings.
. Symposium 7 analyses and provides the necessary foundations of Systems Sciences to support the demands of the new role of Systems Sciences as the integrating force between the various methodological, sociological and technological trends of the future.
. The Workshop will integrate and summarize the outcome of the individual symposia and establish directions and research challenges for Systems Sciences.
. A Panel discussion, interacting with all participants, will conclude the conference and discuss its findings.
For details, see http://ifsr2005.jtbcom.co.jp/
Symposium-1: Technology Creation Based on Knowledge Science (chair: T. Kobayashi)
Symposium-2: Creation of Agent Based Social Systems Sciences (chair: H. Deguchi)
Symposium-3: Intelligent Information Technology and Applications (chair: H. Nakayama)
Symposium-4: Meta-synthesis and Complex Systems (chair: X. Tang)
Symposium-5: Data/Text Mining from Large Databases (chair: T. Ho)
Symposium-6: Vision of Knowledge Civilization (chair: Andrzej Wierzbicki)
Symposium-7: Foundations of the Systems Sciences (chair: Gary Metcalf)
Workshop: New Roles of Systems Sciences in a Knowledge Society (chair: Matjaz Mulej)
Panel discussion: New Roles of Systems Science in a Knowledge Society (chair: K. Kijima)

Regular papers
Authors were invited to submit an extended abstract (2-3 pages) to the symposia and the Workshop by 1 July 2005. Final papers (8 pages) were due by 1 October 2005.


MoDELS/UML 2005
8th International Conference on Model Driven Engineering Languages and Systems, Half Moon Resort, Montego Bay, Jamaica, 2-7 October 2005
The MoDELS/UML conference is devoted to the topic of model-driven development and covers both modeling languages and frameworks used to develop complex software systems. The MoDELS/UML conference series is both an expansion and a redirection of the previous UML conference series, and will replace that series for 2005 and beyond. For more details, visit: www.modelsconference.org

VIS 2005
Vis 2005 – Visualization Conference, Minneapolis, MN, USA, 23-28 October 2005
Vis 2005 is the premier forum for visualization advances in science and engineering for academia, government and industry. This event brings together researchers and practitioners with a shared interest in techniques, tools and technology. Co-located with Vis 2005 is:
InfoVis 2005: IEEE Symposium on Information Visualization
For more details visit the web site: www.infovis.org/infovis/2005
For further information on Vis 2005 visit: vis.computer.org/vis2005


International Journal of Information, Education and Research in Robotics and Artificial Intelligence
Editor: J. Rose, Hon. Director of the World Organisation of Systems and Cybernetics, Visiting Professor, University of Central Lancashire
Robotica aims to endow robotics with an authoritative, competent and dedicated journal to serve industry, research and education. It provides an international forum for the multidisciplinary subject of robotics and helps to encourage development in this important field of automation. It covers the many aspects of robotics, including sensory perception, software (in the widest sense), particularly in regard to programming languages and links with master computers and CAD/CAM systems, control devices, the study of kinematics and dynamics involved in robot design, design of effectors and ancillary manipulators, problem solving, world model representation, development of relevant educational courses, training methods, analysis of managerial and social policy, economic and cost problems, and items of theoretical and practical interest. As well as original papers, the journal publishes research notes, book reviews, conference reports and letters to the editor.
Robotica is of interest to academics, research workers and industry. In manufacturing industry the robot plays a fundamental part in increasing productivity, quality of products, and safety in hostile environments. In this era of advanced automation this publication is of primary importance to both theoretical researchers and practitioners.
"Robotics will offer exciting solutions to some of the most challenging problems of modern society. Robotica has dedicated itself to this, and I wish the journal every success in the exciting years ahead." Lord Henry Chilver FEng FRS (UK)
"So often students will ask 'What courses of study do you recommend as preparation for a career in robotics?' . . . read Robotica." J.F. Engelberger, Late President, Unimation Inc., Connecticut (USA)
Volume 23, 2005, Parts 1-6 with Special Issues
An official journal of the International Federation of Robotics
If you would like further information on this prestigious journal please contact the publishers at the address below or on the Internet: http://www.cup.cam.ac.uk or http://www.cup.org

CAMBRIDGE UNIVERSITY PRESS The Edinburgh Building, Shaftesbury Road, Cambridge CB2 2RU, UK