Reconstructability analysis : theory and applications 9781845443917, 9780861769636

English Pages 212 Year 2004

Volume 33 Number 5/6 2004

ISBN 0-86176-963-5

ISSN 0368-492X

Kybernetes
The International Journal of Systems & Cybernetics

Reconstructability analysis: theory and applications
Guest Editors: Martin Zwick, Guangfu Shu and Yi Lin

Selected as the official journal of the World Organisation of Systems and Cybernetics

www.emeraldinsight.com

CONTENTS

 867  Access this journal online
 868  Editorial advisory board
 869  Abstracts and keywords
 873  Preface
 874  Editorial
 877  An overview of reconstructability analysis. Martin Zwick
 906  Modified reconstructability analysis for many-valued functions and relations. Anas N. Al-Rabadi and Martin Zwick
 921  Reversible modified reconstructability analysis of Boolean circuits and its quantum computation. Anas N. Al-Rabadi and Martin Zwick
 933  A comparison of modified reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions. Anas N. Al-Rabadi, Marek Perkowski and Martin Zwick
 948  Multi-level decomposition of probabilistic relations. Stanislaw Grygiel, Martin Zwick and Marek Perkowski
 962  The k-systems glitch: granulation of predictor variables. Susanne S. Hoeppner and Gary P. Shaffer
 973  Directed extended dependency analysis for data mining. Thaddeus T. Shannon and Martin Zwick
 984  Instant modelling and data-knowledge processing by reconstructability analysis. Guangfu Shu
 992  Application of reconstructability analysis in system structure. Pengtao Wang and Changyun Yu
 997  A software architecture for reconstructability analysis. Kenneth Willett and Martin Zwick
1009  Forecast entropy. W. Yao, C. Essex, P. Yu and M. Davison
1016  The forecast model of system reconstructability analysis. Changyun Yu and Pengtao Wang
1020  Construction of main sequence of gene based on ‘‘method of factor reconstruction analysis’’. Zhang Zhihong, Wang Pengtao, Liu Huaqing and Shu Guangfu
1026  Reconstructability analysis with Fourier transforms. Martin Zwick
1041  State-based reconstructability analysis. Martin Zwick and Michael S. Johnson
1053  Reconstructability analysis detection of optimal gene order in genetic algorithms. Martin Zwick and Stephen Shervais
1063  Book reviews
1069  Book reports
1073  Announcements
1075  Special announcements

Access this journal electronically
The current and past volumes of this journal are available at: www.emeraldinsight.com/ISSN 0368-492X.htm
You can also search over 100 additional Emerald journals in Emerald Fulltext at: www.emeraldinsight.com/ft
See page following contents for full details of what your access includes.

www.emeraldinsight.com/k.htm

As a subscriber to this journal, you can benefit from instant, electronic access to this title via Emerald Fulltext. Your access includes a variety of features that increase the value of your journal subscription.

How to access this journal electronically
To benefit from electronic access to this journal you first need to register via the Internet. Registration is simple and full instructions are available online at www.emeraldinsight.com/rpsv/librariantoolkit/emeraldadmin Once registration is completed, your institution will have instant access to all articles through the journal’s Table of Contents page at www.emeraldinsight.com/0368-492X.htm More information about the journal is also available at www.emeraldinsight.com/k.htm

Our liberal institution-wide licence allows everyone within your institution to access your journal electronically, making your subscription more cost effective. Our Web site has been designed to provide you with a comprehensive, simple system that needs only minimum administration. Access is available via IP authentication or username and password.

Additional complementary services available
Your access includes a variety of features that add to the functionality and value of your journal subscription:

E-mail alert services
These services allow you to be kept up to date with the latest additions to the journal via e-mail, as soon as new material enters the database. Further information about the services available can be found at www.emeraldinsight.com/usertoolkit/emailalerts

Research register
A web-based research forum that provides insider information on research activity world-wide located at www.emeraldinsight.com/researchregister You can also register your research activity here.

User services
Comprehensive librarian and user toolkits have been created to help you get the most from your journal subscription. For further information about what is available visit www.emeraldinsight.com/usagetoolkit

Key features of Emerald electronic journals

Automatic permission to make up to 25 copies of individual articles
This facility can be used for training purposes, course notes, seminars etc. This only applies to articles of which Emerald owns copyright. For further details visit www.emeraldinsight.com/copyright

Online publishing and archiving
As well as current volumes of the journal, you can also gain access to past volumes on the internet via Emerald Fulltext. Archives go back to 1994 and abstracts back to 1989. You can browse or search the database for relevant articles.

Non-article content
Material in our journals such as product information, industry trends, company news, conferences, etc. is available online and can be accessed by users.

Key readings
This feature provides abstracts of related articles chosen by the journal editor, selected to provide readers with current awareness of interesting articles from other publications in the field.

Reference linking
Direct links from the journal article references to abstracts of the most influential articles cited. Where possible, this link is to the full text of the article.

E-mail an article
Allows users to e-mail links to relevant and interesting articles to another computer for later use, reference or printing purposes.

Choice of access
Electronic access to this journal is available via a number of channels. Our Web site www.emeraldinsight.com is the recommended means of electronic access, as it provides fully searchable and value added access to the complete content of the journal. However, you can also access and search the article content of this journal through the following journal delivery services:
EBSCOHost Electronic Journals Service: ejournals.ebsco.com
Huber E-Journals: e-journals.hanshuber.com/english/index.htm
Informatics J-Gate: www.J-gate.informindia.co.in
www.ingenta.com
Minerva Electronic Online Services: www.minerva.at
OCLC FirstSearch: www.oclc.org/firstsearch
SilverLinker: www.ovid.com
SwetsWise: www.swetswise.com
TDnet: www.tdnet.com

Emerald Customer Support
For customer support and technical help contact:
E-mail [email protected]
Web www.emeraldinsight.com/customercharter
Tel +44 (0) 1274 785278
Fax +44 (0) 1274 785204

Kybernetes Vol. 33 No. 5/6, 2004 p. 868 © Emerald Group Publishing Limited 0368-492X

EDITORIAL ADVISORY BOARD

A. Bensoussan, President of INRIA, France
V. Chavchanidze, Institute of Cybernetics, Tbilisi University, Georgia
A.B. Engel, IMECC-Unicamp, Universidad Estadual de Campinas, Brazil
R.L. Flood, Hull University, UK
F. Geyer, The Netherlands Universities Institute for Co-ordination of Research in Social Sciences, Amsterdam, The Netherlands
A. Ghosal, Honorary Fellow, World Organisation of Systems and Cybernetics, New Delhi, India
R. Glanville, CybernEthics Research, UK
R.W. Grubbström, Linköping University, Sweden
Chen Hanfu, Institute of Systems Science, Academia Sinica, People’s Republic of China
G.J. Klir, State University of New York, USA
Yi Lin, International Institute for General Systems Studies Inc., USA
K.E. McKee, IIT Research Institute, Chicago, IL, USA
M. Mănescu, Academician Professor, Bucharest, Romania
M. Mansour, Swiss Federal Institute of Technology, Switzerland
K.S. Narendra, Yale University, New Haven, CT, USA
C.V. Negoita, City University of New York, USA
W. Pearlman, Technion Haifa, Israel
A. Raouf, Pro-Rector, Ghulam Ishaq Khan (GIK) Institute of Engineering Sciences & Technology, Topi, Pakistan
Y. Sawaragi, Kyoto University, Japan
B. Scott, Cranfield University, Royal Military College of Science, Swindon, UK
D.J. Stewart, Human Factors Research, UK
I.A. Ushakov, Moscow, Russia
J. van der Zouwen, Free University, Amsterdam, The Netherlands

Abstracts and keywords

An overview of reconstructability analysis Martin Zwick Keywords Cybernetics, Information theory, Systems theory, Modelling This paper is an overview of reconstructability analysis (RA), an approach to discrete multivariate modeling developed in the systems community. RA includes set-theoretic modeling of relations and information-theoretic modeling of frequency and probability distributions. It thus encompasses both statistical and nonstatistical problems. It overlaps with logic design and machine learning in engineering and with log-linear modeling in the social sciences. Its generality gives it considerable potential for knowledge representation and data mining.

Modified reconstructability analysis for many-valued functions and relations Anas N. Al-Rabadi and Martin Zwick Keywords Data analysis, Cybernetics, Boolean functions A novel many-valued decomposition within the framework of lossless reconstructability analysis (RA) is presented. In previous work, modified reconstructability analysis (MRA) was applied to Boolean functions, where it was shown that most Boolean functions not decomposable using conventional reconstructability analysis (CRA) are decomposable using MRA. It was also shown previously that whenever a decomposition exists in both MRA and CRA, MRA yields decompositions of simpler or equal complexity. In this paper, MRA is extended to many-valued logic functions, and logic structures that correspond to such decompositions are developed. It is shown that many-valued MRA can decompose many-valued functions when CRA fails to do so. Since real-life data are often many-valued, this new decomposition can be useful for machine learning and data mining. Many-valued MRA can also be applied to the decomposition of relations.

Reversible modified reconstructability analysis of Boolean circuits and its quantum computation Anas N. Al-Rabadi and Martin Zwick Keywords Cybernetics, Boolean functions, Logic Modified reconstructability analysis (MRA) can be realized reversibly by utilizing Boolean reversible (3,3) logic gates that are universal in two arguments. The quantum computation of the reversible MRA circuits is also introduced. The reversible MRA transformations are given a quantum form by using the normal matrix representation of such gates. The MRA-based quantum decomposition may play an important role in the synthesis of logic structures using future technologies that consume less power and occupy less space.

A comparison of modified reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions Anas N. Al-Rabadi, Marek Perkowski and Martin Zwick Keywords Cybernetics, Boolean functions, Complexity theory Modified reconstructability analysis (MRA), a novel decomposition technique within the framework of set-theoretic (crisp possibilistic) reconstructability analysis, is applied to three-variable NPN-classified Boolean functions. MRA is superior to conventional reconstructability analysis, i.e. it decomposes more NPN functions. MRA is compared to Ashenhurst-Curtis (AC) decomposition using two different complexity measures: log-functionality, a measure suitable for machine learning, and the count of the total number of two-input gates, a measure suitable for circuit design. MRA is superior to AC using the first of these measures, and is comparable to, but different from, AC using the second.

Multi-level decomposition of probabilistic relations Stanislaw Grygiel, Martin Zwick and Marek Perkowski Keywords Cybernetics, Complexity theory, Probability calculations Two methods of decomposition of probabilistic relations are presented in this paper. They consist of splitting relations (blocks) into pairs of smaller blocks related to each other by new variables generated in such a way as to minimize a cost function which depends on the size and structure of the result. The decomposition is repeated iteratively until a stopping criterion is met. Topology and contents of the resulting structure develop dynamically in the decomposition process and reflect relationships hidden in the data.

The k-systems glitch: granulation of predictor variables Susanne S. Hoeppner and Gary P. Shaffer Keywords Cybernetics, Cluster analysis, Data reduction, Dynamics Ecosystem behavior is complex and may be controlled by many factors that change in space and time. Consequently, when exploring system functions such as ecosystem ‘‘health’’, scientists often measure dozens of variables and attempt to model the behavior of interest using combinations of variables and their potential interactions. This methodology, using parametric or nonparametric models, is often flawed because ecosystems are controlled by events, not variables, and events are comprised of (often tiny) pieces of variable combinations (states and substates). Most events are controlled by relatively few variables (≤4) that may be modulated by several others, thereby creating event distributions rather than point estimates. These event distributions may be thought of as comprising a set of fuzzy rules that could be used to drive simulation models. The problem with traditional approaches to modeling is that predictor variables are dealt with in total, except for interactions, which themselves must be static. In reality, the ‘‘low’’ piece of one variable may influence a particular event differently than another, depending on how pieces of other variables are shaping the event, as demonstrated by the k-systems state model of algal productivity. A swamp restoration example is used to demonstrate the changing faces of predictor variables with respect to influence on the system function, depending on particular states. The k-systems analysis can be useful in finding potent events, even when region size is very small. However, small region sizes are the result of using many variables and/or many states and substates, which creates a high probability of extracting falsely-potent events by chance alone. Furthermore, current methods of granulating predictor variables are inappropriate because the information in the predictor variables rather than that of the system function is used to form clusters. What is needed is an iterative algorithm that granulates the predictor variables based on the information in the system function. In most ecological scenarios, few predictor variables could be granulated to two or three categories with little loss of predictive potential.

Directed extended dependency analysis for data mining Thaddeus T. Shannon and Martin Zwick Keywords Cybernetics, Programming and algorithm theory, Searching Extended dependency analysis (EDA) is a heuristic search technique for finding significant relationships between nominal variables in large data sets. The directed version of EDA searches for maximally predictive sets of independent variables with respect to a target dependent variable. The original implementation of EDA was an extension of reconstructability analysis. Our new implementation adds a variety of statistical significance tests at each decision point that allow the user to tailor the algorithm to a particular objective. It also utilizes data structures appropriate for the sparse data sets customary in contemporary data mining problems. Two examples that illustrate different approaches to assessing model quality tests are given in this paper.

Instant modelling and data-knowledge processing by reconstructability analysis Guangfu Shu Keywords Cybernetics, Information, Modelling This paper first reviews factor reconstructability analysis (RA) modelling completely by data. It gives a description of the levelled variable factor reconstruction analysis model generation process and its further extension to forecasting, evaluation and optimisation. It then introduces RA modelling with multi-variety information and knowledge. Moving from the generation of mixed variable RA models to the generation of models with both data and knowledge, the paper gives a related table and a figure on ‘‘Information flow of reconstructability analysis with multi-variety information and knowledge’’, which includes a database, an RA relational knowledge base, and interfaces for input of experts’ knowledge. Finally, it gives some examples and prospects.

Application of reconstructability analysis in system structure Pengtao Wang and Changyun Yu Keywords Cybernetics, Optimization techniques, Systems and control theory In the development research of talent resource, the most important topics of talent resource forecasting and optimization are the structure of talent resource, requirement number and talent quality. This paper establishes a factor reconstruction analysis forecasting and talent quality model on the basis of system reconstruction analysis and ensures the most effective factor level in the system. The method is based on the work of G.J. Klir and B. Jones, and performs dynamic analysis of example ration.

A software architecture for reconstructability analysis Kenneth Willett and Martin Zwick Keywords Cybernetics, Information theory, Computer software Software packages for reconstructability analysis (RA), as well as for related log-linear modeling, generally provide a fixed set of functions. Such packages are suitable for end-users applying RA in various domains, but do not provide a platform for research into the RA methods themselves. A new software system, Occam3, is being developed which is intended to address three goals which often conflict with one another: to provide a general and flexible infrastructure for experimentation with RA methods and algorithms; an easily-configured system allowing methods to be combined in novel ways, without requiring deep software expertise; and a system which can be easily utilized by domain researchers who are not computer specialists. Meeting these goals has led to an architecture which strictly separates functions into three layers: the core, which provides representation of data sets, relations, and models; the management layer, which provides extensible objects for development of new algorithms; and the script layer, which allows the other facilities to be combined in novel ways to address a particular domain analysis problem.

Forecast entropy W. Yao, C. Essex, P. Yu and M. Davison Keywords Cybernetics, Statistical forecasting, Data analysis A technique, called forecast entropy, is proposed to measure the difficulty of forecasting data from an observed time series. When the series is chaotic, this technique can also determine the delay and embedding dimension used in reconstructing an attractor. An ideal random system is defined. An observed time series from the Lorenz system is used to show the results.

The forecast model of system reconstructability analysis Changyun Yu and Pengtao Wang Keywords Cybernetics, Statistical forecasting, Data analysis In talent forecasting and optimization, talent structure, demand and quality are very important. In this paper, we analyze the use of factor reconstruction analysis and the elastic coefficient forecast method in talent forecasting, and build a model of the talent structure, demand and quality of scientists and technicians for the electronic industry.

Construction of main sequence of gene based on ‘‘method of factor reconstruction analysis’’ Zhang Zhihong, Wang Pengtao, Liu Huaqing and Shu Guangfu Keywords Cybernetics, Numerical analysis, Genetics The deciphering, analysis and study of gene sequences is among the most challenging problems that scientists all over the world are trying to solve. This paper introduces the application of factor reconstruction analysis in this field, and the process and steps involved in using this kind of method. The resulting computation and analysis are presented.

Reconstructability analysis with Fourier transforms Martin Zwick Keywords Cybernetics, Fourier transforms, Information theory, Modelling Fourier methods used in two- and three-dimensional image reconstruction can also be used in reconstructability analysis (RA). These methods maximize a variance-type measure instead of information-theoretic uncertainty, but the two measures are roughly collinear and the Fourier approach yields results close to those of standard RA. The Fourier method, however, does not require iterative calculations for models with loops. Moreover, the error in Fourier RA models can be assessed without actually generating the full probability distributions of the models; calculations scale with the size of the data rather than the state space. State-based modeling using the Fourier approach is also readily implemented. Fourier methods may thus enhance the power of RA for data analysis and data mining.

State-based reconstructability analysis Martin Zwick and Michael S. Johnson Keywords Cybernetics, Modelling, Numerical analysis Reconstructability analysis (RA) is a method for detecting and analyzing the structure of multivariate categorical data. While Jones and his colleagues extended the original variable-based formulation of RA to encompass models defined in terms of system states, their focus was the analysis and approximation of real-valued functions. In this paper, we separate two ideas that Jones had merged together: the ‘‘g to k’’ transformation and state-based modeling. We relate the idea of state-based modeling to established variable-based RA concepts and methods, including structure lattices, search strategies, metrics of model quality, and the statistical evaluation of model fit for analyses based on sample data. We also discuss the interpretation of state-based modeling results for both neutral and directed systems, and address the practical question of how state-based approaches can be used in conjunction with established variable-based methods.

Reconstructability analysis detection of optimal gene order in genetic algorithms Martin Zwick and Stephen Shervais Keywords Cybernetics, Programming and algorithm theory, Optimization techniques The building block hypothesis implies that genetic algorithm efficiency will be improved if sets of genes that improve fitness through epistatic interaction are near to one another on the chromosome. We demonstrate this effect with a simple problem, and show that information-theoretic reconstructability analysis can be used to decide on optimal gene ordering.

Preface
Special issue – Reconstructability analysis: theory and applications
Guest Editors: Martin Zwick, Guangfu Shu and Yi Lin

As part of our introduction to new and significant topics within the field of systems and cybernetics we have devoted this special issue of the journal to reconstructability analysis (RA). In historical terms this is a comparatively recent study which emanated from the previous research in cybernetics during 1960-1980. We are particularly grateful to our guest editors, who themselves have played an important part in reviving the subject. Martin Zwick (USA), Guangfu Shu (People’s Republic of China) and Yi Lin (USA) have completed this work with much care and skill. In many ways it links the important research being carried out in the USA with that in the People’s Republic of China. It also serves to remind us of the pioneering work of the distinguished cybernetician W.R. Ashby, although he could not, of course, like so many innovators of his generation, predict the value of RA and its applications to so much current research.

The Editorial Advisory Board of this journal believes that this collection of papers will be much appreciated by the systems and cybernetics community and that it will be of lasting value to RA development. It will play an important part in the revival of interest in this worthwhile and promising research area. Selected regular journal sections are also published at the end of this special issue.

Brian H. Rudall
Editor-in-Chief

Editorial

Reconstructability analysis (RA) dates back to the pioneering work of Ashby in the mid-1960s. In the 1970s and 1980s, RA was the subject of very active research in the systems community. It receded for a time as a focus of activity, but the special issue of the International Journal of General Systems (IJGS) in 1996 on the “General Systems Problem Solver” and the special IJGS issue in 2000 on “Reconstructability Analysis in China” marked the renewal of interest in this area. The current volume is part of this resurgence of activity. It collects together papers from the group at Portland State University (Portland, Oregon, USA), from the Chinese RA workers, and from other investigators. It is evident from these papers that RA continues to be a productive research area, that it has considerable value for data analysis and data-mining, and that this value has yet to be fully explored and exploited.

This special issue starts with an overview of RA theory and methodology by Professor Martin Zwick. This overview presents the basic ideas of probabilistic (“information-theoretic”) and crisp-possibilistic (“set-theoretic”) RA, illustrating the first using health care data, and the second using mappings of elementary cellular automata. In the paper, “A comparison of modified reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions”, Anas Al-Rabadi, Marek Perkowski and Martin Zwick present an enhanced crisp-possibilistic RA and show that it yields better decompositions for simple binary functions than a standard logic decomposition procedure. In the paper, “Modified reconstructability analysis for many-valued functions and relations”, Anas Al-Rabadi and Martin Zwick extend the method presented in the previous paper to multi-valued relations and functions, and, in “Reversible modified reconstructability analysis of Boolean circuits and its quantum computation”, they show how it can be used in reversible and quantum computation.

In “Multi-level decomposition of probabilistic relations”, Stanislaw Grygiel, Martin Zwick and Marek Perkowski demonstrate how multi-level latent variable methods, which are common in machine learning and logic design but uncommon in the RA literature, can be applied to probabilistic RA. Susanne Hoeppner and Gary Shaffer, in “The k-systems glitch: granulation of predictor variables”, apply k-systems analysis to ecological modeling, demonstrating the power of a small number of factors to capture the essential behavior of the system and making apparent the critical nature of the granulation (binning) procedure used to discretize the systems function. In the next paper, “Directed extended dependency analysis for data mining”, Thaddeus Shannon and Martin Zwick report an improved implementation of the heuristic techniques proposed by Conant for loopless models, and apply these techniques to time-series analysis and pattern recognition.

In the paper, “Instant modelling and data-knowledge processing by reconstructability analysis”, Professor Guangfu Shu introduces RA modeling with multi-variety information and knowledge after reviewing data-driven factor RA modeling and the leveled variable factor RA model generation process. In the tenth article, “Application of reconstructability analysis in system structure”, Pengtao Wang and Changyun Yu establish a factor RA forecasting and talent quality model for development research on talent resource. Kenneth Willett and Martin Zwick, in “A software architecture for reconstructability analysis”, describe the architecture of the RA software package named OCCAM (“organizational complexity, computation, and modeling”); this package is user-friendly and web accessible, and is the primary research and applications platform for the Portland group. W. Yao, C. Essex, P. Yu and M. Davison, in “Forecast entropy”, apply an entropy measure to the analysis of chaotic time-series data, and show that this measure captures central dimension and delay parameters of the underlying attractor. In the 13th paper, “The forecast model of system reconstructability analysis”, Professors Changyun Yu and Pengtao Wang analyze the use of factor RA and the elastic coefficient forecast method in talent forecasting. They use this approach to study talent structure, demand and quality of trained human resources in the electronic industry. Zhihong Zhang, Pengtao Wang, Huaqing Liu, and Guangfu Shu, in “Construction of main sequence of gene based on ‘method of factor reconstruction analysis’”, introduce factor RA to the study of gene sequences in the hope that this method can help us decipher, analyse and understand genomic information.

In the article, “Reconstructability analysis with Fourier transforms”, Martin Zwick develops an approach to probabilistic RA utilizing Fourier transforms, which bypasses the need for iterative methods for models with loops, and requires computation which scales only with the data instead of the state space. Martin Zwick and Michael Johnson, in the paper “State-based reconstructability analysis”, show that the state-based modeling idea originally introduced by Bush Jones in his “k-systems analysis” can be separated from its initial use for function approximation and integrated fully with probabilistic and statistical RA. In the final paper, Martin Zwick and Stephen Shervais demonstrate the use of RA as a preprocessor for genetic algorithms to determine the optimal order for the variables on the GA genome; this utilization of RA calls to mind other uses of RA as a preprocessor for neural nets.

As the guest editors of this special issue, we would like to express our gratitude to Professor Brian H. Rudall, Editor of the prestigious Kybernetes: The International Journal of Systems and Cybernetics, for providing this valuable opportunity and forum for us to publish these articles together as a special issue.

Martin Zwick
Systems Science PhD Program, Portland State University, Portland, OR, USA
E-mail: [email protected]

Guangfu Shu
Institute of Systems Science, Academy of Mathematical and Systems Sciences, Chinese Academy of Sciences, Beijing, People’s Republic of China
E-mail: [email protected]

Yi Lin
International Institute for General Systems Studies, Inc., 23 Kings Lane, Grove City, PA, USA
E-mail: [email protected]

The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister

The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

An overview of reconstructability analysis
Martin Zwick
Portland State University, Portland, Oregon, USA


Keywords Cybernetics, Information theory, Systems theory, Modelling

Abstract This paper is an overview of reconstructability analysis (RA), an approach to discrete multivariate modeling developed in the systems community. RA includes set-theoretic modeling of relations and information-theoretic modeling of frequency and probability distributions. It thus encompasses both statistical and nonstatistical problems. It overlaps with logic design and machine learning in engineering and with log-linear modeling in the social sciences. Its generality gives it considerable potential for knowledge representation and data mining.

1. Introduction
This paper is an overview of reconstructability analysis (RA), a discrete multivariate modeling methodology developed in the systems literature; an earlier version of this tutorial is Zwick (2001). RA was derived from Ashby (1964), and was developed by Broekstra, Cavallo, Cellier, Conant, Jones, Klir, Krippendorff, and others (Klir, 1986, 1996). RA resembles and partially overlaps log-linear (LL) statistical methods used in the social sciences (Bishop et al., 1978; Knoke and Burke, 1980). RA also resembles and overlaps methods used in logic design and machine learning (LDL) in electrical and computer engineering (e.g. Perkowski et al., 1997). Applications of RA, like those of LL and LDL modeling, are diverse, including time-series analysis, classification, decomposition, compression, pattern recognition, prediction, control, and decision analysis.

RA involves the set-theoretic modeling of relations and mappings and the information-theoretic modeling of probability/frequency distributions. Its different uses can be categorized using the dimensions of variable, system, data, problem, and method-types shown in Table I. These will now be briefly discussed. Section 2 explains RA in more detail. Section 3 gives examples, Section 4 discusses software, and Section 5 offers a concluding discussion.

The author wishes to thank Pete Catching, Michael Johnson, Roberto Santiago, Tad Shannon, and Ken Willett for helpful comments on the manuscript. Special appreciation is due to Marek Perkowski for inviting him to present this paper at the 2000 LDL Conference to introduce RA methodology to the LDL community, and for many stimulating conversations on decomposition techniques and other subjects.

1.1 Variable-type: nominal, ordinal, and quantitative
RA applies to multivariate data involving nominal variables or quantitative variables which are converted into nominal variables by being discretized.

Kybernetes, Vol. 33 No. 5/6, 2004, pp. 877-905. © Emerald Group Publishing Limited, 0368-492X. DOI 10.1108/03684920410533958


Table I. Aspects of RA

Variable type
    Nominal (discrete) [binary/multi-valued]
    Ordinal (discrete)
    Quantitative (typically continuous)
System type
    Directed systems (has inputs and outputs) – deterministic vs non-deterministic
    Neutral systems (no input/output distinction)
Data type
    Information-theoretic RA (freq./prob. distribution)
    Set-theoretic (set-theoretic relation/function)
Problem type
    Reconstruction (decomposition) – confirmatory vs exploratory (data analysis/mining)
    Identification (composition)
Method type
    Variable-based modeling (VBM)
    State-based modeling (SBM)
    Latent variable-based modeling (LVBM)

Note: Prototypical RA task is shown in italics.

Variables need not be binary (dichotomous) but can be multi-valued. Nominal variables, whose states are discrete and unordered, are the most general type of variable, and so methods which apply to them encompass ordinal and quantitative variables as well. Continuous quantitative variables can be discretized either by quantization (non-overlapping binning intervals) or by fuzzification (Cellier et al., 1995; Zadeh, 1965), which is less sensitive to the boundaries of the bins. Although discretization loses information, this loss is offset by the fact that RA can detect nonlinearities and interaction effects which might be missed by standard methods. Moreover, it is not necessary to hypothesize specific nonlinear and interaction effects to detect their existence. The subject of discretization is mentioned only to emphasize the generality of nominal variable methods; it is outside the scope of this paper.

1.2 System-type: directed vs neutral
To relate RA to a familiar LDL problem, consider the task of decomposing a logic function $Z = g(A, B, C)$, where variables are either binary or multivalued. In RA terminology, this is a directed system, since inputs and outputs (“independent variables” and “dependent variables”) are distinguished. Directed systems are further classified as deterministic or stochastic. While most RA applications involve predictive, dynamic, or causal (hence directed) relationships between variables, sometimes variables have equal status; these systems are called neutral. RA can be applied to directed (both deterministic and stochastic) and neutral systems. By contrast, in LL modeling, stochastic systems are usually the focus. In LDL modeling, deterministic directed systems are the rule, and neutral systems are rarely considered.

1.3 Data-type: information-theoretic and set-theoretic RA
RA has two versions: a set-theoretic (here called SRA), or more precisely, a “crisp possibilistic,” version which applies to set-theoretic relations and mappings, and an information-theoretic (here called IRA) “probabilistic” version which applies to frequency (and probability) distributions (Conant, 1981; Klir, 1985; Krippendorff, 1986). IRA can also be applied to quantitative functions of nominal variables by rescaling these functions so that they can be treated as probability distributions (Jones, 1985a). SRA and IRA are similar in many respects, and together constitute a coherent framework. Moreover, probabilistic and crisp possibilistic analyses are encompassed within a “generalized information theory” (Klir and Wierman, 1998), which includes also fuzzy possibilistic and probabilistic distributions.

The same model structures are considered in both IRA and SRA. Let ABC represent a set-theoretic relation or mapping or a probability or frequency distribution for a three-variable system, with projections AB, AC, and BC, and A, B and C. Define a structure as a nonredundant set of projections. If ABC is the data, the possible model structures are shown in Figure 1. At the top of the lattice is the data, called the “saturated model”. At the bottom is A:B:C, called the “independence model”. (In IRA, the bottom model may alternatively be chosen to be the uniform distribution.)

Figure 1 shows that the lattice of structures for a directed three-variable system (with two inputs and one output) is a sublattice of the lattice for neutral systems. For directed systems with output C, the independence model is AB:C, not A:B:C, and only the five shaded structures need to be considered. Each of these five structures contains an AB component (relation or distribution). A directed system model always has one component which collects together all inputs, allowing for but ignoring the possible presence of constraint among them.
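The lattice just described can be enumerated mechanically. The sketch below is illustrative only: it assumes that a structure is any set of projections that covers all variables and in which no projection is contained in another, and it treats the directed sublattice as those structures in which some component contains both inputs A and B. The function names `structures` and `name` are hypothetical helpers, not part of any RA package.

```python
from itertools import combinations

def structures(variables):
    """Enumerate all nonredundant sets of projections covering `variables`."""
    vs = list(variables)
    # Every nonempty subset of the variables is a candidate projection.
    subsets = [frozenset(c) for r in range(1, len(vs) + 1)
               for c in combinations(vs, r)]
    found = []
    for k in range(1, len(subsets) + 1):
        for combo in combinations(subsets, k):
            covers_all = frozenset().union(*combo) == frozenset(vs)
            # Nonredundant: no projection is a proper subset of another.
            nonredundant = all(not (a < b) for a in combo for b in combo)
            if covers_all and nonredundant:
                found.append(set(combo))
    return found

def name(structure):
    """Write a structure in the colon notation used in the text, e.g. AB:BC."""
    return ":".join(sorted("".join(sorted(p)) for p in structure))

neutral = structures("ABC")
# Directed system with inputs A, B and output C: keep structures in which
# one component collects both inputs together.
directed = [s for s in neutral if any({"A", "B"} <= p for p in s)]

print(len(neutral), sorted(name(s) for s in neutral))
print(len(directed), sorted(name(s) for s in directed))
```

Brute-force enumeration of this kind is only practical for a handful of variables, which is consistent with the later remark that exhaustive search is limited to fewer than about ten variables.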
Every other component includes at least one output. Directed system models can thus be characterized by their number of predicting components. For inputs A, B, and output C, model AB:AC has one predicting component, AC. (We do not here employ a notation which explicitly shows directedness, e.g. A → C; any relation or distribution written as a string of letters may be either neutral or directed.) Model AB:AC:BC has two predicting components,


Figure 1. Lattice of specific structures for a three-variable neutral system. The shaded sublattice is for a directed system, with inputs A and B and output C


which are independent in a “maximum uncertainty” sense, to be described later. Only in model ABC do A and B interact in their joint effect on C.

The most commonly used version of RA is IRA. Here the problem is typically the decomposition of frequency or probability distributions, where RA does statistical analysis. This is the main subject of this paper. Consider a frequency distribution $f(A, B, C, Z)$ for a directed system, where A, B, and C are inputs and Z is an output. RA decomposes such distributions into projections, such as $f_1(A, B, Z)$ and $f_2(B, C, Z)$, and models are assessed for statistical significance, usually with the chi-square distribution. This use of RA overlaps considerably with LL modeling but has no parallel in LDL, where “statistical” considerations can arise in that functions or relations may be partially specified, e.g. due to sparse sampling.

Where IRA and LL overlap, they are equivalent, but each has distinctive strengths. The LL literature is more advanced statistically, includes latent variable techniques (discussed below), and offers methods to analyze ordinal variables. Well-tested LL software exists (e.g. in SPSS and SAS). On the other hand, in IRA, graph-theoretic methods are used to define explicitly various lattices of possible models and to suggest heuristic techniques to search these lattices. RA makes extensive use of the uncertainty (Shannon entropy) measure, which is conceptually transparent because of its similarity to variance, and includes innovations like state-based modeling (discussed below), absent in the LL literature. However, the RA and LL communities have been only dimly aware of – and have not benefitted much from – each other’s existence, despite early work which linked the two (Kullback, 1968; Ku and Kullback, 1959). While IRA is statistical in its overlap with LL, it also includes nonstatistical applications.
For example, the k-systems methodology of Jones (1985a) is used primarily for function approximation and compression. IRA can also be used to analyze set-theoretic relations and functions (this is done, for example, in the analyses of cellular automata reported later).

SRA, in contrast, is completely and inherently nonstatistical. It is the natural RA approach to set-theoretic relations and functions. SRA here overlaps with LDL. While it appears to be different from any particular LDL technique, it resembles LDL methods which decompose functions into generalized (arbitrary) as opposed to specific components (like “and” and “or” gates). RA thus bridges two very disparate fields: LL modeling in the social sciences and LDL in electrical and computer engineering.

1.4 Problem-type: reconstruction vs identification
RA includes reconstruction and identification (Klir, 1985). In reconstruction, ABC (Figure 1) is the data, and one goes down the lattice until decomposition losses are unacceptable. Or, one can start at the bottom with A:B:C (the independence model for neutral systems) or AB:C (the independence model for a directed

system with output C), and ascend until model error relative to the data is too great or the model is unacceptably complex. Descending the lattice is especially natural for neutral systems, while ascending the lattice is more natural for directed systems.

Thus, in reconstruction, a distribution or relation is decomposed (compressed, simplified) into projected distributions (also called “margins”) or relations. ABC might, for example, be decomposed into AB and BC, written as the structure AB:BC. Taken together, the two linked bivariate projections would constitute a model of the data which is less complex (has fewer degrees of freedom) than the data. By maximum-entropy (uncertainty) composition of these projections, the model yields a calculated trivariate $ABC_{AB:BC}$ distribution or relation which may differ from the observed ABC data. The difference (error) represents loss of information in the model. By definition, the data itself have 100 percent information (0 percent error). Models are also assessed in complexity, where complexity for IRA is degrees of freedom (df), the number of parameters needed to specify a model. For convenience, df values may be normalized so that the data are 100 percent complex and the independence model 0 percent complex. Reconstruction decomposes data by finding less complex models which preserve either all of its information (lossless decomposition) or a sufficient amount of its information (lossy decomposition), where sufficiency is assessed either statistically or by other standards.

Reconstruction is done in either a confirmatory or exploratory mode. In the confirmatory mode for IRA, a specific model or a small number of models, proposed a priori on the basis of theoretical considerations, are tested statistically. In the exploratory mode, one has no prior idea about what model might be suitable, and one examines many structures to find a best model or a family of best models.
LL modeling is normally done in the confirmatory mode and, indeed, in the social sciences, exploratory modeling is normally frowned upon. The situation is quite different in machine learning, a field explicitly devoted to exploratory modeling. Identification is pure composition. For example, the observed data might be the two distributions AB and BC. Because they are not derived from a single ABC, AB and BC can be inconsistent if they disagree in their common B projection. If such an inconsistency can be resolved, a calculated ABC can be generated. Identification methods exist which resolve such inconsistencies and make possible the integration of multiple data sets (as in a database merge) coming from different sources. The LL and LDL literatures have not articulated the identification problem and are focused exclusively on reconstruction. Reconstruction in the exploratory mode is the typical RA problem. For convenience, call modeling with “few” variables data analysis and modeling with “many” variables data mining. This paper discusses both, but is motivated by the task of data mining, for which adequate RA software is not


yet available. (LDL techniques have been used for data mining, but LL methods, not typically implemented for many variables or for exploratory searching, are rarely mentioned in the data mining literature.) “Few” variables here means that exhaustive evaluation of all possible RA models can be done. This allows us to be certain of the choice of a best single model or it might be done for a very different purpose, namely to characterize the data by the set of errors for all possible decompositions. The limit of exhaustive search is roughly about seven or eight variables, in round numbers, ten. Data mining here means exploratory modeling beyond this threshold, i.e. with 10s, 100s, perhaps even 1,000s of variables. Exhaustive search is then no longer possible, and heuristic techniques, which consider only a subset of possible models, must be used instead. This threshold of somewhat less than ten variables marks the limit of exhaustive analysis of all models, but there is a second threshold involving the number of possible states of the system (as opposed to the number of states observed in the data), which poses limits even for heuristic search. This second threshold presently precludes the use of multi-predictive-component RA models for more than about 20 variables, but single-predictive-component models can still be used. For single-predicting-component (SPC) models, one can treat 10s, 100s, and perhaps even 1,000s of variables, with computing time and space requirements depending on the size of the data, not the size of the state space. However, multi-predicting-component models for directed systems are cyclic (have “loops”), and at present cyclic models in IRA need computation on the entire state space, without regard to how sparse the data are. For 20 binary variables, the state space is about 24 MB. Adding many variables beyond 20 is impractical, though approximate RA computation, which varies with the data and not the state space, might extend this range. 
Two possible approximation approaches are mentioned at the end of this paper.

1.5 Method-type: variable-based, state-based, and latent variable-based RA
Reconstruction as explained earlier illustrates variable-based modeling (VBM), which decomposes data into subsets of variables. This is the most common situation. Two other method-types are available: state-based modeling (SBM) and latent variable-based modeling (LVBM). SBM is less developed than LVBM. Originally an aspect of the “k-systems analysis” of Jones (1985a, b, 1986), SBM is now being more integrated into the standard RA framework (Johnson and Zwick, 2000; Zwick and Johnson, 2004). VBM reveals information-rich sets of variables, and a variable-based model is a set of complete projections. By contrast, SBM selects information-rich states, i.e. salient conditions (Shaffer, 1988; Shaffer and Cahoon, 1987). A state-based model is a set of frequency values selected from the original data and its projections, but complete projections do not have to be included. For example, for $f(A, B, C)$ data, the SB model could be the frequencies $f(A_2, B_1, C_3)$,

$f(A_1, B_2)$, and $f(C_2)$. SBM resembles rule-based methods in logic programming and fuzzy control. It also resembles Crutchfield’s ε-machines (Feldman and Crutchfield, 1998). Even though LL methods overlap with IRA, there is nothing equivalent to SBM in the LL literature. Something like SBM seems standard in LDL, where decompositions involving sums of products having varying numbers of variables are widely used.

In latent variable models, complexity is reduced or new constructs are introduced by adding unmeasured variables. For example, an AB distribution might be modeled by the simpler AQ and QB projections of an AQ:QB model. LV models are absent in the RA literature but widely used in the LL field (Hagenaars, 1993; Vermunt, 1997). However, latent variable LL software which is usable for exploratory data mining is not available. In LDL, latent variables are standard, functions being typically decomposed using both “free” (observed) and “bound” (latent) variables (Grygiel, 2000; Grygiel et al., 2004).

Since the objective of this paper is to explain RA (especially IRA) methodology, no survey is offered of RA applications, and comparisons with other methods are not undertaken. Section 2 provides details mostly on IRA reconstruction. Section 3 gives a few applications of RA to data analysis and mining. Section 4 is a summary and discussion.


2. More detailed explanation
The “prototypical” RA analysis is IRA variable-based reconstruction, which will now be explained. Brief explanations will be provided for SRA, identification, and LVBM and SBM.

2.1 Information-theoretic variable-based reconstruction (and identification)
2.1.1 Basic reconstruction steps. Figure 2 shows variable-based information-theoretic reconstruction. The data, ABC, is shown on the left. The best RA model, AB:BC, judged by its information and complexity, is

Figure 2. Example of IRA variable-based reconstruction of a neutral system (the number of shaded cells is the degrees of freedom of each model)


shown on the lower right; its calculated frequencies are on the upper right. Reconstruction is done in three steps: (1) projection; (2) composition; and (3) evaluation.

Projection. The ABC data is projected into the two contingency tables, AB and BC, which define the model AB:BC (lower right). This step is straightforward, and is given by, for example,

$$f(A_0, B_1) = f(A_0, B_1, C_0) + f(A_0, B_1, C_1).$$

Composition. These two tables together yield the calculated $ABC_{AB:BC}$ table (upper right), where frequencies are rounded to the nearest integer. Shift from frequencies to probabilities (frequencies divided by the sample size). Let p and q denote observed and calculated probabilities, respectively. The IRA composition step is a “maximum entropy” procedure (Good, 1963; Miller and Madow, 1954), where entropy is Shannon entropy, referred to here as uncertainty. For model AB:BC, the calculated probability distribution, $q_{AB:BC}(A, B, C)$, is the distribution which maximizes the uncertainty,

$$U(AB:BC) = -\sum_A \sum_B \sum_C q_{AB:BC}(A, B, C) \log q_{AB:BC}(A, B, C)$$

subject to the AB and BC projections of the data, i.e. to the linear constraints

$$q_{AB:BC}(A, B) = p(A, B) \quad \text{and} \quad q_{AB:BC}(B, C) = p(B, C).$$

Since AB:BC has no cycles, the solution can be written algebraically:

$$q_{AB:BC}(A, B, C) = p(A, B)\, p(B, C) / p(B).$$

Calculated distributions for cyclic structures, like AB:BC:AC, need to be evaluated iteratively by the iterative proportional fitting (IPF) algorithm. As shown in Table II, calculation of $q_{AB:BC:AC}(A, B, C)$ starts with a uniform distribution. IPF then imposes upon it iteratively the observed projections specified by the model. At iteration no. 1, first AB is imposed, then BC is imposed, and then AC is imposed. At iteration no. 2, AB is reimposed because agreement with AB was destroyed when the other projections were imposed at the previous iteration. Then BC is reimposed, followed by AC, and so on. IPF iterates until $q_{AB:BC:AC}(A, B, C)$ converges.

Table II. Iterative proportional fitting for model AB:BC:AC

The algorithm starts with $q^{(0)}_{AB:BC:AC}(A, B, C) = 1/(|A|\,|B|\,|C|)$, where $|\cdot|$ means cardinality, and then loops over iterations from $j = 0$ until convergence:

For all A, B:
$$q^{(3j+1)}_{AB:BC:AC}(A, B, C) = q^{(3j+0)}_{AB:BC:AC}(A, B, C)\; p(A, B) / q^{(3j+0)}_{AB:BC:AC}(A, B)$$
For all B, C:
$$q^{(3j+2)}_{AB:BC:AC}(A, B, C) = q^{(3j+1)}_{AB:BC:AC}(A, B, C)\; p(B, C) / q^{(3j+1)}_{AB:BC:AC}(B, C)$$
For all A, C:
$$q^{(3j+3)}_{AB:BC:AC}(A, B, C) = q^{(3j+2)}_{AB:BC:AC}(A, B, C)\; p(A, C) / q^{(3j+2)}_{AB:BC:AC}(A, C)$$
$j = j + 1$

The IPF algorithm requires that the entire distribution be computed, not merely the set of observed states, because as each projection is imposed on the working $q_{AB:BC:AC}(A, B, C)$, nonobserved states will in general contribute to other calculated projections. Thus, computer time and space requirements for cyclic models vary with the state space of the problem, and not with the sample size.

In LL modeling, calculations often go beyond the generation of $q_{model}$. The individual frequencies of specific states, e.g. $f_{AB:BC}(A_0, B_1, C_0)$, can be decomposed into the contribution from all the separate “effects,” i.e. A, B, C, AB, and BC. This can be useful if one is particularly interested in one or a few states. Such decomposition is not prominent (perhaps even absent) in the RA literature.

Evaluation. The calculated $ABC_{AB:BC}$ is compared to the observed ABC. The calculated distribution approximates the data, which are always more constrained. The error is the constraint lost in the model (Figure 3), called the transmission T(AB:BC), which is the difference between the uncertainty of the model and the uncertainty of the data:

$$T(AB:BC) = \sum_A \sum_B \sum_C p(A, B, C) \log \left[ p(A, B, C) / q_{AB:BC}(A, B, C) \right]$$


$$= U(AB:BC) - U(ABC)$$

Information, i.e. constraint captured in the model, is normalized to [0, 1] with respect to the independence model, A:B:C (or to AB:C, for directed systems where C is the output), as follows:

$$\text{Information} = 1 - \left[ T(AB:BC) / T(A:B:C) \right]$$

In addition to model error, model complexity is of interest. It is desirable to minimize both, but there is a tradeoff between the two. Decisions on model acceptance are made either by optimizing one subject to the other as a constraint, or by merging the two via chi-square or other approaches.
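The three steps for the loopless model AB:BC can be sketched in a few lines. The frequency table below is hypothetical (it is not the data of Figure 2), the helper names `project`, `uncertainty` and `transmission` are illustrative, and logarithms are taken to base 2 so that uncertainties are in bits.

```python
from math import log2

# Hypothetical frequencies f(A, B, C) for three binary variables
# (illustrative values only, not the data of Figure 2).
f = {(0, 0, 0): 30, (0, 0, 1): 10, (0, 1, 0): 5, (0, 1, 1): 25,
     (1, 0, 0): 20, (1, 0, 1): 10, (1, 1, 0): 10, (1, 1, 1): 50}
N = sum(f.values())
p = {state: n / N for state, n in f.items()}  # observed probabilities

def project(dist, keep):
    """Projection: sum out every variable whose position is not in `keep`."""
    out = {}
    for state, value in dist.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + value
    return out

def uncertainty(dist):
    """Shannon entropy in bits."""
    return -sum(v * log2(v) for v in dist.values() if v > 0)

def transmission(p, q):
    """T = sum of p log(p/q): the constraint in the data lost by the model."""
    return sum(pv * log2(pv / q[s]) for s, pv in p.items() if pv > 0)

# Step 1, projection.
pAB, pBC = project(p, (0, 1)), project(p, (1, 2))
pA, pB, pC = project(p, (0,)), project(p, (1,)), project(p, (2,))

# Step 2, composition. AB:BC has no cycles, so q = p(A,B) p(B,C) / p(B).
q_ab_bc = {(a, b, c): pAB[(a, b)] * pBC[(b, c)] / pB[(b,)] for (a, b, c) in p}
# Independence model A:B:C, the reference for normalization.
q_ind = {(a, b, c): pA[(a,)] * pB[(b,)] * pC[(c,)] for (a, b, c) in p}

# Step 3, evaluation.
T_model = transmission(p, q_ab_bc)
T_ind = transmission(p, q_ind)
information = 1 - T_model / T_ind

print(T_model, uncertainty(q_ab_bc) - uncertainty(p))  # the two forms of T agree
print(information)
```

The first printed pair checks numerically that the two expressions for the transmission coincide for this maximum-uncertainty model.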

Figure 3. Constraint lost and retained in models
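The iterative proportional fitting procedure of Table II can be sketched as follows for cyclic models such as AB:BC:AC. This is a minimal illustration under stated assumptions: the probabilities are hypothetical, `ipf` and `project` are illustrative names, components are given as tuples of variable positions, and the convergence test (largest change in q over one full cycle) is one simple choice among several.

```python
def project(dist, keep):
    """Sum out every variable whose position is not in `keep`."""
    out = {}
    for state, value in dist.items():
        key = tuple(state[i] for i in keep)
        out[key] = out.get(key, 0.0) + value
    return out

def ipf(p, components, tol=1e-12, max_iter=10000):
    """Fit q to the projections of p named by `components`.

    `p` must list every state of the full state space (unobserved states
    with probability 0), since IPF works on the whole distribution.
    """
    states = list(p)
    q = {s: 1.0 / len(states) for s in states}      # uniform start
    targets = [project(p, comp) for comp in components]
    for _ in range(max_iter):
        before = dict(q)
        for comp, target in zip(components, targets):
            current = project(q, comp)               # calculated projection
            for s in states:
                key = tuple(s[i] for i in comp)
                if current[key] > 0:
                    q[s] *= target[key] / current[key]   # impose observed one
        if max(abs(q[s] - before[s]) for s in states) < tol:
            break
    return q

# Hypothetical data for three binary variables A, B, C (positions 0, 1, 2).
counts = {(0, 0, 0): 30, (0, 0, 1): 10, (0, 1, 0): 5, (0, 1, 1): 25,
          (1, 0, 0): 20, (1, 0, 1): 10, (1, 1, 0): 10, (1, 1, 1): 50}
N = sum(counts.values())
p = {s: n / N for s, n in counts.items()}

q_cyclic = ipf(p, [(0, 1), (1, 2), (0, 2)])   # model AB:BC:AC
q_loopless = ipf(p, [(0, 1), (1, 2)])         # model AB:BC
print(q_cyclic)
```

For the loopless model AB:BC the same routine reproduces the algebraic solution after a single pass, which is a convenient check on an implementation.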


Complexity of the model is defined as its df, the number of parameters needed to specify it. Reconstruction is compression; it reduces complexity. For the data and model, df is shown in Figure 2 by the count of shaded cells. Knowing the sample size, subtract 1 from the number of cells (states) in a table; thus for data ABC (the “saturated model”), df = 7. In AB:BC, df(AB) = 3, but once AB is specified only two more numbers are needed to specify BC, because the B margins of the tables must agree. Algebraically,

$$df(AB:BC) = df(AB) + df(BC) - df(B) = 3 + 3 - 1 = 5.$$

Normalizing df to the [0, 1] range gives a normalized complexity,

$$\text{Complexity}(AB:BC) = \left[ df(AB:BC) - df(A:B:C) \right] / \left[ df(ABC) - df(A:B:C) \right].$$

In confirmatory analysis, AB:BC might be a hypothesized model. Its error would be assessed by calculating the likelihood-ratio chi-square, $L^2(AB:BC) = 1.3863\, N\, T(AB:BC)$, where N is sample size. $L^2(AB:BC)$ and df(AB:BC) are then used to obtain α, the probability of making a Type-I error by rejecting the null hypothesis that the calculated $ABC_{AB:BC}$ is statistically the same as the observed ABC. There are also other ways of integrating model error (or information) and complexity to decide on model acceptability.

2.1.2 Exhaustive analysis. Figure 2 shows confirmatory RA, where one model is assessed. The exhaustive evaluation of all models for these data is given in Table III, which gives (information, α, df) for every model. This complete RA characterizes the data more fully than merely stating that the best model is AB:BC. ABC has 100 percent information. The probability of error in rejecting its agreement with the data is 1, since it is the data. A:B:C is the baseline for analysis, and thus has no information. The probability of error in rejecting its agreement with the data in this case is 0. A good model has high information and low df.
If models are compared to the IT data (as is done in Table III), a good model has high α, because the probability of error in rejecting the equivalence of model and data should be high. The more familiar preference for low α holds if the model is compared not to ABC but to A:B:C.
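The df and chi-square arithmetic for AB:BC can be checked with a short sketch. The sample size and transmission used here are hypothetical, and the step from $L^2$ to α assumes that the test uses the difference df(ABC) − df(AB:BC) = 2 as its degrees of freedom, for which the chi-square tail probability has the closed form exp(−L²/2); that assumption is not spelled out in the text above.

```python
from math import exp, log

def df_table(cardinalities):
    """df of a full contingency table: number of cells minus 1."""
    cells = 1
    for k in cardinalities:
        cells *= k
    return cells - 1

df_data = df_table([2, 2, 2])                                      # ABC: 7
df_model = df_table([2, 2]) + df_table([2, 2]) - df_table([2])     # AB:BC: 5
df_independence = 3 * df_table([2])                                # A:B:C: 3

complexity = (df_model - df_independence) / (df_data - df_independence)  # 0.5

N = 1000        # hypothetical sample size
T_bits = 0.004  # hypothetical transmission T(AB:BC), in bits

# L^2 = 1.3863 N T, where 1.3863 is 2 ln 2 (T is measured in bits).
L2 = 2 * log(2) * N * T_bits

# Assumption: reference model is the data, so delta df = 7 - 5 = 2, and the
# chi-square survival function for exactly 2 df is exp(-x / 2).
delta_df = df_data - df_model
alpha = exp(-L2 / 2)

print(df_data, df_model, df_independence, complexity)
print(round(L2, 4), delta_df, alpha)
```

For other values of Δdf the tail probability has no such simple form and would normally be taken from a statistics library or table.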

Table III. IRA results for the data of Figure 2 (information, α, df) for $ABC_{model}$
Note: The reference model for calculation of α is ABC. Shading shows models to be considered if the system were directed, with C being the dependent variable.

If Figure 2 data were for a directed system, with output C and inputs A and B, only the shaded models in Table III need be considered. For directed systems it is useful to state results in terms of reductions in the output uncertainty, knowing the inputs, rather than in terms of information. If the uncertainty reduction,

$$\Delta U(C|B) \equiv U(C) - U(C|B) = T(B:C) = U(C) - U(B, C) + U(B),$$

is positive and statistically significant, model AB:BC is acceptable, i.e. B is a predictor of C. Because Shannon entropy involves a log term, even small ΔU may indicate high predictability. For example, if the odds of rain vs no rain is 2:1 in winter and 1:2 in summer and 1:1 over the year (for equal seasons), then knowing the season makes a big change in the odds, but reduces U(rain vs no rain) by only 8 percent.

For the Figure 2 data, the results are shown in Table IV. For these data, A and B reduce the uncertainty of C by only very small amounts, less than 1 percent. The last column of the table also shows that quantitative RA measures (such as U or T) can be thought of equivalently as assessing models. Note that there are two ways that A and B can both predict C. In AB:AC:BC they do so separately, but in ABC there is an interaction effect (Zwick, 1996). Here α is computed relative to the independence model, not the saturated (top) model; it is small if results are statistically significant. The table shows that A is a better predictor of C than is B; and that ΔU(C) using both inputs, either in ABC or AB:AC:BC, is very small but still statistically significant at the 0.05 level.

2.1.3 Identification. Figure 2 can be used also to explain identification. If the data are not the ABC on the left but the two distributions on the lower right, identification derives the calculated $ABC_{AB:BC}$ on the upper right. This is trivial when AB and BC are derived by projection from a single ABC distribution.
Suppose, however, that AB and BC came from different sources, and that the BC table had a B margin of [1,024, 454] while the AB table still had the B margin of [1,034, 444]. AB and BC would be inconsistent in their common B margin; this could arise from sampling or other errors. When data sets are obtained from different sources, inconsistency is the norm, and resolution of inconsistency is required before composition can be done (Klir, 1985).
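The rain example given in Section 2.1.2 can be verified directly. The sketch below uses base-2 logarithms and assumes, as the text states, two equally long seasons; the function name `H` is illustrative.

```python
from math import log2

def H(probs):
    """Shannon uncertainty in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Odds of rain vs no rain: 2:1 in winter, 1:2 in summer, 1:1 over the year.
U_rain = H([1 / 2, 1 / 2])                       # uncertainty of rain, season unknown
U_rain_given_season = 0.5 * H([2 / 3, 1 / 3]) + 0.5 * H([1 / 3, 2 / 3])

reduction = (U_rain - U_rain_given_season) / U_rain
print(f"{100 * reduction:.1f} percent")          # about 8 percent
```

Knowing the season doubles or halves the odds, yet the fractional uncertainty reduction is under a tenth, which is why small percentage values in Table IV can still be of practical interest.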

Table IV. Uncertainty reductions

                Percent ΔU    Δdf    α        Associated model
U(C|A, B)       0.56          3      0.013    ABC
U(C|A, B)       0.52          2      0.007    AB:AC:BC
U(C|A)          0.49          1      0.002    AB:AC
U(C|B)          0.00          1      0.866    AB:BC
U(C)            –             –      1.000    AB:C

Notes: output C; inputs A and B.


Anderson (1996), Mariano (1984, 1987) and Pittarelli (1990) have shown how inconsistencies among distributed data sets may be resolved. Anderson also showed how “nuisance” variables can be exploited in the composition of disparate data sets. To find a relation between A and C from inconsistent AB and BC distributions, the inconsistencies are first resolved, after which the adjusted AB′ and BC′ distributions are composed into ABC and projected onto AC.

2.2 Information-theoretic state-based and latent variable-based reconstruction
VBM requires the complete specification of its components (e.g. the full AB and BC tables in Figure 2). It is possible, alternatively, to define a model in terms of the probabilities of any set of non-redundant states selected from ABC and its projections (Jones, 1985a, b). Thus SBM encompasses VBM as a special case. SBM selects states with unexpectedly high or low probabilities (relative to some prior); these are the facts prominent in the data, and $q_{model}(A, B, C)$ has maximum uncertainty, constrained by these values. This approach is more powerful than VBM, and Chen (1997) has discussed its possible use for data mining. The penalty is that there are many more SB models, so SBM might be most useful for data mining if it were used as a follow-on to VBM.

The essence of SBM is illustrated in Table V. The AB model has df = 3, shown by the shading of three cells (arbitrarily chosen). The A:B model has df = 2, i.e. it needs two specified probability values, one (arbitrarily chosen and shaded) from each margin. A:B, with probabilities p(A) p(B), is not identical to AB and exhibits error. A state-based model specifying the single probability value, $p(A_1, B_1) = 0.7$ (shown shaded), forces the remaining $(A_0, B_0)$, $(A_0, B_1)$, and $(A_1, B_0)$ probabilities, by the maximum uncertainty principle, to be 0.1. These values are correct and this model has zero error even though it is simpler than A:B.
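The comparison between the one-parameter state-based model and the two-parameter A:B model can be reproduced numerically. The sketch below takes the AB distribution described above (0.7 for the state $(A_1, B_1)$ and 0.1 elsewhere). It handles only the simple case in which the specified probabilities belong to individual, non-overlapping states of the full table, where maximum uncertainty spreads the remaining probability uniformly; a general state-based model that mixes states from different projections would need an iterative fit, which is not shown.

```python
from math import log2

# AB distribution of the example: p(A1, B1) = 0.7, the other states 0.1.
p = {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.7}

def transmission(p, q):
    """Error of model q relative to data p, in bits."""
    return sum(pv * log2(pv / q[s]) for s, pv in p.items() if pv > 0)

# Variable-based model A:B (two parameters): product of the margins.
pA = {a: sum(v for (x, _), v in p.items() if x == a) for a in (0, 1)}
pB = {b: sum(v for (_, y), v in p.items() if y == b) for b in (0, 1)}
q_vb = {(a, b): pA[a] * pB[b] for (a, b) in p}

# State-based model (one parameter): only p(A1, B1) is specified.
# Maximum uncertainty spreads the remaining mass evenly over the other states.
fixed = {(1, 1): p[(1, 1)]}
free = [s for s in p if s not in fixed]
remaining = 1.0 - sum(fixed.values())
q_sb = {s: fixed.get(s, remaining / len(free)) for s in p}

T_vb = transmission(p, q_vb)
T_sb = transmission(p, q_sb)
print(T_vb, T_sb)   # A:B shows error; the state-based model shows essentially none
```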
This example was constructed to show that a one-parameter SB model can be superior to a two-parameter VB model. SB analysis of the distribution of Figure 2 reveals that a four-parameter model, $p(A_1, B_0)$, $p(A_0, B_1)$, $p(B_0, C_1)$, and $p(B_1, C_0)$, captures virtually the same amount of information as the five

Table V. State-based decomposition

parameter AB:BC model (note that the four states come from AB and BC). In more complex distributions, the economy of SBM is more dramatic.

LVBM is not developed within RA, though it is in the LL literature. It is also widely used for set-theoretic mappings in the LDL literature. It will be discussed here only briefly. To illustrate latent variable reconstruction, consider BA:BC, where B is an input and A and C are outputs. By attributing directionality to the relations in the structure, we model A ← B → C. If variables were quantitative instead of nominal and relations were linear, this would be “factor analysis” (Kim and Muller, 1978) and B would be a common factor, which explains (away) the relation between A and C. B may be a new construct, implicit in AC, which LVBM makes explicit. If B mediates between A and C, this is “path analysis” (Davis, 1985). The structure might be written as AB:BC to suggest the causal sequence A → B → C. As factor and path analyses are for quantitative variables, “latent class analysis” is for nominal variables (McCutcheon, 1987). In latent class analysis, a distribution AC might be explained by invoking a latent variable B and a calculated distribution ABC which is decomposable into BA:BC (equivalently, AB:BC). The generation of the calculated ABC is not as straightforward as it is in VBM, and a different algorithm is used in place of IPF (Hagenaars, 1993). IRA or LL modeling with latent variables thus offers a nominal variable generalization of path analysis, factor analysis, and covariance structure modeling (Long, 1983), which are restricted in application only to linear relations among quantitative variables.

Latent variables can also be used for compression. If, given AC, a latent variable B yields a calculated ABC with structure BA:BC, and if |A| = |C| = 4 and |B| = 2, then df(AC) = 15, while df(BA:BC) = 13, so BA:BC is less complex than AC.
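The df values in this compression example can be reproduced with a general counting rule. The sketch assumes the standard parameter count for hierarchical models: every nonempty subset of variables contained in some component contributes the product of (cardinality − 1) over its variables. For loopless models this agrees with the add-and-subtract rule used earlier for AB:BC. The function name `degrees_of_freedom` is illustrative.

```python
from itertools import combinations

def degrees_of_freedom(cardinality, components):
    """df of a model given as a list of components, e.g. ["BA", "BC"]."""
    terms = set()
    for comp in components:
        for r in range(1, len(comp) + 1):
            # Each nonempty subset of a component is counted once overall.
            for subset in combinations(sorted(comp), r):
                terms.add(subset)
    total = 0
    for subset in terms:
        product = 1
        for v in subset:
            product *= cardinality[v] - 1
        total += product
    return total

card = {"A": 4, "B": 2, "C": 4}                  # |A| = |C| = 4, |B| = 2
print(degrees_of_freedom(card, ["AC"]))          # 15
print(degrees_of_freedom(card, ["BA", "BC"]))    # 13

binary = {"A": 2, "B": 2, "C": 2}
print(degrees_of_freedom(binary, ["AB", "BC"]))  # 5, as in Section 2.1.1
```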

2.3 Set-theoretic reconstruction

In SRA, data are a set-theoretic relation (a subset of a Cartesian product): the set of states actually observed, without regard to frequency. Set-theoretic reconstruction is similar to information-theoretic reconstruction, as can be seen by comparing Figure 2 to Figure 4. In Figure 4, the two values of A, B, and C are indicated as 0 and 1, but these are only arbitrary labels.

Figure 4. SRA variable-based reconstruction (neutral system)

As in IRA, $ABC_{AB:BC}$ is the maximum uncertainty relation, given the model constraints, i.e. the model-specified projections. $U(ABC_{AB:BC})$ is now the Hartley (rather than the Shannon) entropy, $U(R) = \log_2 |R|$, where $|\cdot|$ is cardinality. This maximization is achieved in composition by:
(1) expanding the projections by Cartesian products with variables omitted in them (maximizing uncertainty); and
(2) taking the intersection of the expanded relations (imposing the model constraints).

Specifically, $ABC_{AB:BC} = (AB \otimes C) \cap (BC \otimes A)$, namely,

$$ABC_{AB:BC} = [\{00\cdot, 01\cdot, 11\cdot\} \otimes \{\cdot\cdot 0, \cdot\cdot 1\}] \cap [\{\cdot 00, \cdot 10, \cdot 11\} \otimes \{0\cdot\cdot, 1\cdot\cdot\}]$$
$$= \{00*, 01*, 11*\} \cap \{*00, *10, *11\}$$
$$= \{000, 010, 011, 110, 111\} = ABC$$

($\cdot$ is a place-holder for absent variable(s); * means “do not care”). The second line of this equation shows that SRA resembles a product of sums of products. Generalizing the above equation, for model $P_1 : P_2 : \ldots : P_n$, where $P_j$ is a projection of relation R (the data), and $M_j$ is the Cartesian product of variables absent (projected on) in $P_j$, the calculated reconstructed R is given by

$$R_{P_1:P_2:\ldots:P_n} = (P_1 \otimes M_1) \cap (P_2 \otimes M_2) \cap \ldots \cap (P_n \otimes M_n)$$

Relations for cyclic models are calculated in the same way; no iterative procedure is needed. In the present case, $ABC_{AB:BC}$ has no error. Imperfect decomposition can, however, be allowed, and again transmission (error) is defined as $T(AB : BC) = U(ABC_{AB:BC}) - U(ABC)$. No model simpler than AB : BC agrees fully with the data, as shown in Table VI. Note that SRA decomposition of ABC here uses complete projections and only observed variables. No latent (bound) variables are introduced as is commonly done in LDL techniques. The above relation could have been decomposed much more simply as $\{000, *1*\}$ or logically as $(A' \wedge B' \wedge C') \vee B$, where prime means negation. Such a decomposition might be considered a

Table VI. SRA results for the data of Figure 4

ABC (5)
AB : AC : BC (5)
AB : AC (6)      AB : BC (5)      BC : AC (6)
AB : C (6)       AC : B (8)       BC : A (6)
A : B : C (8)

Note: Number of tuples in model are in parentheses.

“state based RA model”, in that specific states of subsets of variables are selected, were it not for the fact that in SRA SB models, states are always connected with an $\wedge$, as indicated in the intersect operations in the equation for $R_{P_1:P_2:\ldots:P_n}$. The precise relationship between SRA and LDL methods has not yet been fully elucidated. Of particular interest would be a comparison of SRA to LDL multi-valued functional decomposition (Files and Perkowski, 1998; Perkowski et al., 1997). An initial examination of an enhanced version of SRA shows that it is superior for three-variable binary functions to Ashenhurst-Curtis decomposition, a well-known and widely used LDL technique which utilizes latent variables (Al-Rabadi and Zwick, 2004; Al-Rabadi et al., 2004).

IRA problems can be approximated by SRA. In Figure 2, if frequency is discretized into ranges (Chen, 1994; Grygiel, 2000; Grygiel et al., 2004), it becomes a nominal variable and the mapping $A \otimes B \otimes C \rightarrow F$ can be decomposed by SRA. In doing so, the different frequency bins become nominal states and their order cannot be exploited. This also precludes the assessment of statistical significance. Discretization of frequencies is qualitatively different from discretization of variables, and the error consequences of this approach, in comparison to IRA/LL statistical analyses, are under investigation. Conversely, SRA problems can be treated by IRA by giving equal probability to observed tuples; IRA then generates results equivalent to SRA but in a different form. Just as a Fourier transform of a function may illuminate particular properties of the function previously obscure, even though the transform is fundamentally equivalent to the function, so IRA computation may be informative in ways not directly achievable by set-theoretic analysis (Zwick and Shu, 2004).
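The set-theoretic composition rule of this section (expand each projection by the omitted variables, then intersect) can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: all variables are binary and the full Cartesian product is enumerated explicitly, which is only reasonable for very small systems. The function names `project`, `compose`, and `hartley` are hypothetical and do not come from any RA software package.

```python
from itertools import product
from math import log2


def project(relation, variables, keep):
    """Project a set of tuples onto the variables named in `keep`."""
    idx = [variables.index(v) for v in keep]
    return {tuple(t[i] for i in idx) for t in relation}


def compose(relation, variables, model, values=(0, 1)):
    """Reconstruct a relation from the projections named in `model`.

    Each projection is expanded by the Cartesian product with the
    omitted variables, and the expanded relations are intersected.
    """
    full = set(product(values, repeat=len(variables)))
    result = full
    for keep in model:
        idx = [variables.index(v) for v in keep]
        proj = project(relation, variables, keep)
        expanded = {t for t in full if tuple(t[i] for i in idx) in proj}
        result = result & expanded
    return result


def hartley(relation):
    """Hartley entropy U(R) = log2 |R|."""
    return log2(len(relation))


variables = "ABC"
abc = {(0, 0, 0), (0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1)}

models = {
    "AB:AC:BC": ["AB", "AC", "BC"],
    "AB:BC": ["AB", "BC"],
    "AB:AC": ["AB", "AC"],
    "AC:B": ["AC", "B"],
    "A:B:C": ["A", "B", "C"],
}
for name, model in models.items():
    r = compose(abc, variables, model)
    t = hartley(r) - hartley(abc)  # transmission (error) of the model
    print(name, len(r), round(t, 3))
```

For the five-tuple relation used in the text, AB : BC reproduces the data exactly (zero transmission), while the other models listed yield the larger tuple counts reported in Table VI.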

2.4 Evaluating many models

In Figure 1, AB : AC, AB : BC, and AC : BC permute the three variables, as do AB : C, AC : B, and BC : A. There are five different general structures and nine specific structures, where specific structures exemplifying the same general structure merely permute the variables. (Note: this nomenclature is not standard in RA.) For example, AB : BC and AC : CB are the same general structure. Structures are disjoint (AB : C), acyclic (AB : BC), or cyclic (AB : AC : BC). A specific (acyclic) structure is exemplified in Figure 5 along with the general structure obtained from it by omitting the labels for specific variables (lines) and relations (boxes).

Figure 5. Specific structure AB : BC and general structure

Table VII. Number of structures

For four variables, there are 20 general structures and 114 specific structures. As the number of variables increases, the numbers of general and specific structures increase sharply (Table VII). The table indicates that directed system lattices are simpler than neutral system lattices because we are not interested in the presence or absence of relations among the inputs. (Recall the shaded portions of Figure 1 and Table II.) Table VII also shows that exhaustive evaluation of all models ceases to be practical around seven or eight variables. Intelligent heuristics and sophisticated search techniques are required to sample the Lattice of Structures efficiently, and Conant (1988a, b), Klir (1985), Krippendorff (1986), and others have made suggestions along these lines. The lattice can be pruned as a search procedure descends or ascends so that consideration is restricted only to promising candidates. Or, the search can be done roughly between groups of structures and then finely within these groups (Klir, 1985). The combinatorial explosion in models can also be mitigated by aggregating variables, but this poses difficulties of interpretation. It also does not reduce the number of possible states (the size of the contingency table) for the system, unless state aggregation, another compression technique, also accompanies variable aggregation.

Figure 6 shows the lattice of general structures for a four-variable system. If variables are all dichotomous, the degrees of freedom of the structures range from 15 at the top (ABCD) to 4 at the bottom (A : B : C : D) and decrease by 1 at every level. The figure also shows the acyclic structures, indicated with boxes in bold (10 of the 20). The simplified lattice which is obtained if D is an output and A, B, C are inputs has nine structures whose bottom is ABC : D; these are indicated by structures where one line (representing D) is bold. In this simplified lattice, four of the structures are also acyclic. These correspond to SPC models in which the predicting component has 0, 1, 2, or 3 predictors of D. Model complexity for IRA was defined above as the number of parameters needed to specify the model. The figure suggests another type of complexity, namely the number of components in the model. For three variables, this is at most three, for four variables at most six; in the medical sociology example discussed below it reaches 12. Multi-component models pose a challenge to interpretation. Beyond the mere number of components, there is also the complexity inherent in the connectedness of the components.

No. of variables                            3      4        5            6
No. of general structures                   5     20      180       16,143
No. of specific structures                  9    114    6,894    7,785,062
No. of specific structures, one output      5     19      167        7,580

Note: The bottom line is for directed systems.

Figure 6. Lattice of general structures (four variables). A box is a relation; a line, with branches, uninterrupted by a box, is a variable. Arrows indicate decomposition. The top structure is ABCD; the bottom is A : B : C : D. The ten acyclic structures have relations in bold. For one output, D, and three inputs, D is in bold for the nine structures in the simplified lattice

3. Examples of variable-based reconstruction

This section illustrates the exhaustive consideration of models in data analysis, and the heuristic use of RA in many-variable data mining. To reiterate, “many” variables means that all possible models cannot be examined by brute force. The examples are restricted to information-theoretic variable-based reconstruction.

3.1 IRA example

3.1.1 Exhaustive analysis (four variables). Figure 7 shows an exhaustive analysis of a small subset of the OPUS data obtained from Dr Clyde Pope of the

Kaiser Permanente Center for Health Research in Portland, Oregon (2003). These data concern health care utilization in the Kaiser member population. The figure summarizes the relationships between four variables in the data set, plotting the (complexity, information) for all 114 specific structures. The best models in this graph are its “northeastern frontier” where models are not dominated by (inferior in both complexity and information to) any other model. From this “solution set,” one might pick out the simplest model having information greater than some minimum, the most information-rich model having complexity less than some maximum, or some other single model.

Figure 7. Decomposition spectrum. Complexity, information for 114 four-variable structures; data from Kaiser Permanente Center for Health Research

3.2 Heuristic search (24 variables)

3.2.1 Basic results. From the OPUS data (subset) with sample size of 2,100, 24 possible predictors (inputs) of self-reported health status (output) were selected. The inputs include sex, age, family and work information, socioeconomic, behavioral, and psychological measures, health-related attitudes and activities, and other variables. Quantitative input variables were discretized, and some nominal variables were also rebinned. The cardinalities of the variables as analyzed range from 2 to 6 and the product of the 25 cardinalities was $1.16 \times 10^{13}$. After analysis, the best model, involving eight variables (seven inputs, BHLNORU, and one output, X), with a state space of 32,400 states, explains 65.4 percent of U(X). This model is complex, having 11 predictive components, each of which represents a high ordinality interaction effect between four or five inputs and the output. The model, stated here without identifying the specific nature of the input variables, is:

BHLNORU : BHNRX : LORUX : LNOUX : HLOUX : HLNORX : LNRUX : HLRUX : NORUX : HORUX : HNOUX : HNRUX.

If this model had been proposed for confirmatory testing, its $\alpha$, relative to a null hypothesis of no association between the 24 inputs and 1 output, would be 0.03. Rigorously speaking, one cannot simply say that the statistical significance of this model is 0.03 because the statistical significance of the results of an exploratory search depends upon how many models are looked at. However, reporting the significance which would have been obtained had this model been subjected to a confirmatory test can motivate a follow-on study which explicitly tests the model. It may be possible to reduce U(X) even further, but if additional work is undertaken on this data set, simplifying the model to facilitate interpretation would be more critical.

3.2.2 Search strategy. The strategy used to obtain this model was a two-step process (Figure 8). In the first step, the 24 inputs were reduced to 12 and then to 7. This is “feature selection”. The subset of seven inputs was selected by examining the uncertainty reductions achieved in SPC models. For seven inputs, the best model is BHLNORUX. The uncertainty reduction of X from the seven inputs in this model is 81.7 percent, but this reduction is not significant ($\alpha = 1.0$). The seven-input model was chosen even though it was not significant because of the expectation that by decomposition the model could be simplified without major loss of predictive power. This was indeed possible. Step two yielded an 11-predicting-component model, reducing the uncertainty of the output by 65.4 percent, a reduction which is statistically significant ($\alpha = 0.03$). The model utilizes all seven inputs, but in components each having four or five inputs. This decomposition preserved 80 percent (65.4/81.7) of the predictive information in the SPC model ($\alpha = 1.0$) obtained in the first step, BHLNORUX. This second step, unlike the first step, does require operations on the full state space, which here has cardinality 32,400. Only seven rather than a greater number of inputs were used because of the combinatorial limits of the program, which examined all SPC models by brute force. (The program generated all models with one, two, three, etc. predicting inputs.) However, in more recent versions of the program, this brute force approach has been replaced by a breadth-first fixed width search, which allows the rapid examination of an arbitrary number of inputs. Also a related RA method exists, known as “extended dependency analysis” (EDA), which allows

Figure 8. Search strategy using 2-step process

heuristic search in single-predicting component models with 10s or 100s of variables (Conant, 1988a, b; Lendaris et al., 1999; Shannon and Zwick, 2004). EDA merges the several good components to produce a single-component model, without evaluating all models with this number of total inputs. Recently, in an RA study looking only at SPC models, 150 variables were reduced to 46 for subsequent use by neural net modeling (Chambless and Scarborough, 2001). To repeat a point made earlier: these calculations, involving SPC models, do not require operations on a complete state space and scale with the size of the data, which for the subset of OPUS data we used is quite modest, namely $N = 2{,}100$.

3.2.3 Model utilization. The 11-component model indicated above can be used in at least three ways. First, each predicting component, e.g. BHNRX or LORUX, represents a high ordinality interaction effect involving several inputs and the output (health status, X). If the inputs were originally quantitative variables, these effects are also likely to be nonlinear. Ideally, each of these components should be given a substantive interpretation of how the input variables combine to affect X. This requires extensive subject-specific knowledge, and has not yet been attempted for this study. Second, it is possible to assess the multiple components relative to one another, to identify which are the most important. For the seven inputs used, this 11-component model was maximally predictive (under the requirement that the model would have been statistically significant if it had been proposed for confirmatory test), but a simpler though less predictive model might be preferred. Further analysis of the model might allow such simplification. Third, the 11-component model yields a calculated BHLNORUX distribution, which can be converted to $p(X \mid B, H, L, N, O, R, U)$, a distribution on X conditioned on the input variables.
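The conversion of a calculated joint distribution into a distribution on the output conditioned on the inputs, together with the fractional uncertainty reduction used throughout this section, can be sketched as follows. The sketch assumes the joint distribution is held as a dictionary keyed by (inputs, output) pairs; the function names and the toy probabilities are hypothetical and are not taken from the OPUS analysis.

```python
from collections import defaultdict
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a dict mapping states to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def conditional_on_inputs(joint):
    """Turn p(inputs, x) into p(x | inputs).

    joint: dict mapping (inputs_tuple, x) to probability.
    Returns a dict mapping inputs_tuple to a dict {x: p(x | inputs)}.
    """
    p_inputs = defaultdict(float)
    for (inputs, _x), p in joint.items():
        p_inputs[inputs] += p
    cond = defaultdict(dict)
    for (inputs, x), p in joint.items():
        if p_inputs[inputs] > 0:
            cond[inputs][x] = p / p_inputs[inputs]
    return dict(cond)


def uncertainty_reduction(joint):
    """Fractional reduction [U(X) - U(X | inputs)] / U(X)."""
    p_x = defaultdict(float)
    p_inputs = defaultdict(float)
    for (inputs, x), p in joint.items():
        p_x[x] += p
        p_inputs[inputs] += p
    u_x = entropy(p_x)
    u_x_given = entropy(joint) - entropy(p_inputs)  # U(X|I) = U(I,X) - U(I)
    return (u_x - u_x_given) / u_x


# Toy calculated distribution: two binary inputs and a binary output.
joint = {
    ((0, 0), 0): 0.20, ((0, 0), 1): 0.05,
    ((0, 1), 0): 0.10, ((0, 1), 1): 0.15,
    ((1, 0), 0): 0.15, ((1, 0), 1): 0.10,
    ((1, 1), 0): 0.05, ((1, 1), 1): 0.20,
}
cond = conditional_on_inputs(joint)
print(cond[(0, 0)])
print(round(uncertainty_reduction(joint), 3))
```

Each entry of `cond` plays the role of a predicted output distribution for one combination of input values, and can be set against the marginal p(X) computed inside `uncertainty_reduction`.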
Thus, given the values of all seven input variables, RA yields predicted probabilities for all the different possible output states. These can be compared to p(X), the probabilities of the output without any knowledge of the inputs. Finally, it should be reiterated that exploratory searches are precisely exploratory. Resulting models should ideally be tested in a confirmatory mode with new data. The selected inputs and/or the specific multi-component model can be used in conjunction with methods which retain the quantitative character of variables which have been discretized. With SPC RA modeling, feature selection can be done from a large set of variables, as noted above, and the selected features fed into a neural net (NN). The NN part of such an RA-NN strategy may sometimes not be necessary. In the pattern-recognition study of Lendaris et al. (1999), RA on discretized variables essentially solved the problem by itself. Multi-predicting-component RA modeling can also be used to prestructure neural nets, so that all-to-all connectivity is not required (Lendaris, 1995; Lendaris and Mathia, 1994; Lendaris et al., 1993, 1997); this speeds NN training times and improves generalization capacity. RA may also have promise for the prestructuring of genetic algorithms, more specifically for determining the optimal order of the variables on the GA genome (Zwick and Shervais, 2004).

3.3 SRA example

SRA reconstruction is exemplified by a study which attempted to predict whether discrete dynamic systems (specifically, elementary cellular automata, ECA) are chaotic or not. An ECA is a one-dimensional array of cells, s(1) . . . s(n), governed by a mapping, $s_t(i-1) \otimes s_t(i) \otimes s_t(i+1) \rightarrow s_{t+1}(i)$, which specifies how the state of each cell at time $t+1$ depends upon its state and the state of its two adjacent neighbors at time t. The three cells at time t and the center cell at time $t+1$ will be labeled A, B, C, and D. This is a deterministic directed system which illustrates the use of RA for time series analysis (see also Zwick et al., 1996 for an IRA time series example). An example of an ECA mapping or “rule” is given in Table VIII. There are 256 mappings which preserve the identity of the three inputs, but in the ECA context these group into 88 equivalence classes, analyzed by SRA. Every mapping can also be converted into a probability distribution by setting $p(A, B, C, D)$ to 1/8 if an (A, B, C, D) tuple appears in the mapping and 0 if it does not. The resulting probability distributions were also analyzed by IRA. Table IX illustrates how a rule governs ECA dynamics. Eight cells are arranged in a toroid, so $s(9) = s(1)$. For every cell, the rule produces its next state, $s_{t+1}(i)$, given its present state, $s_t(i)$, and the present state of its left and right neighbors, $s_t(i-1)$ and $s_t(i+1)$. Eventually the system reaches either a fixed point or limit cycle attractor. Such discrete dynamics can be considered “chaotic” if the time to reach the attractor goes up rapidly with the number of cells. Assignments of chaoticity or nonchaoticity were taken from Li and Packard (1990).
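The rule indexing, the toroidal update, and the conversion of a mapping into a probability distribution described above can be sketched as follows. This is a minimal illustration using rule #150 and an arbitrary eight-cell starting array; the function names `rule_table`, `step`, and `as_distribution` are hypothetical.

```python
def rule_table(rule_number):
    """Map each neighbourhood (A, B, C) to the next centre state D.

    The D column is read as a binary number whose least significant
    bit belongs to the neighbourhood 000.
    """
    table = {}
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        table[(a, b, c)] = (rule_number >> index) & 1
    return table


def step(cells, table):
    """Advance a ring of cells by one time step (toroidal boundary)."""
    n = len(cells)
    return [table[(cells[i - 1], cells[i], cells[(i + 1) % n])] for i in range(n)]


def as_distribution(table):
    """Give probability 1/8 to each (A, B, C, D) tuple in the mapping."""
    return {(a, b, c, d): 1 / 8 for (a, b, c), d in table.items()}


rule150 = rule_table(150)
cells = [0, 0, 0, 1, 0, 0, 0, 0]  # arbitrary initial array of eight cells
for _ in range(3):
    print(cells)
    cells = step(cells, rule150)
```

The dictionary returned by `as_distribution` contains only the eight tuples that appear in the mapping; all other (A, B, C, D) tuples implicitly have probability 0, which is the form in which a rule would be handed to an information-theoretic analysis.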

            t                    t+1
s(i-1)    s(i)    s(i+1)         s(i)
  A         B       C             D
  0         0       0             0
  0         0       1             1
  0         1       0             1
  0         1       1             0
  1         0       0             1
  1         0       1             0
  1         1       0             0
  1         1       1             1

Note: The rule is indexed by considering the D column as a binary number, whose top-most value is its least significant bit.

Table VIII. An example of an ECA rule (#150)

Table IX. Three time steps for ECA #150

Two “standard” parameters used to predict chaoticity were employed for comparison purposes: $\lambda$ (Langton, 1992) and Z (Wuensche, 1992). The parameter $\lambda$ was actually first proposed by Walker and Ashby (1966), who called it “homogeneity”. SRA or IRA decomposition properties of the rules were also used (Zwick and Shu, 1997, 2004) to predict chaoticity or nonchaoticity. It can be shown that the specific structures for this three-input, one-output problem can be reduced to the 12 specific structures shown in Table X. These group into six levels of complexity, indexed by parameter $\sigma$. Structures whose variables are permuted have the same complexity, but for ECAs not all permutations are equivalent, because the neighbors A and C are different from B, so there are nine general structural types (shown in bold). Table X indicates that while all single-predicting component models are themselves mappings, the $\sigma = 4$ and 5 models decompose the rule mapping into (stochastic) relations whose intersection yields the correct mapping. Such an approach to decomposition is quite different from what is encountered in LDL decomposition. By doing SRA on each rule, a $\sigma$ is assigned to the rule, which indicates how decomposable without loss the rule is. Also, a vector parameter $\tau$ characterizes each rule by the full set of decomposition losses (transmissions) for all 12 possible specific structures. IRA decomposition was also done on the rules, and

Note: The enclosed four cells illustrate the mapping $A \otimes B \otimes C \rightarrow D$ for one time step. The array at time t is arbitrary.

Table X. Structure for ECA rules

$\sigma$   Structures                                             SRA
6          ABCD                                                   Mapping
5          ABC : ABD : ACD : BCD                                  Three relations ($\rightarrow$ mapping)
4          ABC : ABD : ACD, ABC : ABD : BCD, ABC : ACD : BCD      Two relations ($\rightarrow$ mapping)
3          ABC : ABD, ABC : ACD, ABC : BCD                        Mapping
2          ABC : AD, ABC : BD, ABC : CD                           Mapping
1          ABC : D                                                Constant

Note: The $\sigma$ identifier for the six structural levels is given, and the nine different specific structures are shown in bold.

two measures, $f''$ and $f'$, were calculated, which are closely related to the fluency measure of Walker and Ashby (1966). These measures involve the transmissions for two of the three models (shown italicized in Table X) at level $\sigma = 3$. The first measure, $f'$, is a vector measure which preserves information about the separate losses in the models; the second, $f''$, is a scalar measure which sums the decomposition losses of these two models.

$$f' = \{T(ABC : BCD),\; T(ABC : ABD)\}$$
$$f'' = T(ABC : BCD) + T(ABC : ABD)$$

Table XI indicates the predictability of chaoticity or nonchaoticity of ECA dynamics using RA measures ($\sigma$, $f'$, $f''$, and $\tau$) as compared to using standard ECA parameters ($\lambda$ and Z). The table shows that RA measures predict better than standard parameters. Predictability is assessed information-theoretically as uncertainty reduction in attractor variable, a, which has two states {chaotic, nonchaotic}. For rule parameter, r, which is either an RA measure or a standard parameter, the table lists $\Delta U_a$ and $\Delta U_r$, which are large for good predictors, where

$$\Delta U_a = \text{fractional reduction of } U(a) \text{ knowing } r = [U(a) - U(a|r)] / U(a)$$
$$\Delta U_r = \text{reduction of } U(a) \text{ per bit of predictor} = [U(a) - U(a|r)] / U(r)$$

Not only do RA measures predict chaoticity or nonchaoticity better than the standard parameters of $\lambda$ and Z, the RA framework actually subsumes these standard measures in $\tau$, the complete vector of RA losses. Specifically, $U(\lambda|\tau) = U(Z|\tau) = 0$, i.e. $\tau$ also specifies $\lambda$ and Z. In fact, $\lambda$ turns out to be isomorphic with U(D). No SRA measure comparable to fluency was apparent, and this illustrates the point made earlier that IRA analysis can be useful even for set-theoretic functions and relations, because it presents the analytical results in a form

                                  r            $U(a|r)$   $\Delta U_a$ (%)   $\Delta U_r$
                                               0.679
Standard ECA parameters
  Walker-Ashby, Langton           $\lambda$    0.600      11.6               0.044
  Wuensche                        Z            0.458      32.6               0.114
RA measures
  Lossless complexity             $\sigma$     0.553      18.6               0.069
  Info.-theor. fluency            $f'$         0.355      47.7               0.124
  Second fluency measure          $f''$        0.447      34.2               0.151
  Complete RA spectrum            $\tau$       0.263      61.3               0.102

Notes: Reduction of uncertainty of attractor and uncertainty reduction normalized by information of predictor, r. The best uncertainty reductions for criteria are shown in bold.

Table XI. Predicting cellular automata dynamics

different from SRA. Analysis using the complete loss vector $\tau$ is, however, equivalent in SRA and IRA.

4. Software

Computations were done using a software package being developed at Portland State University named OCCAM (for the principle of parsimony and as an acronym for “Organizational Complexity Computation And Modeling”). OCCAM is intended eventually to include all data, problem and method types. Other RA software packages do exist, e.g. CONSTRUCT and SPECTRAL by Krippendorff (1981), SAPS by Uyttenhove (1984) and Cellier and Yandell (1987), GSPS by Elias (1988), Klir (1976), and coworkers, EDA by Conant (1988a, b), Jones’ k-systems analysis (Jones, 1989) and a recent program by Dobransky and Wierman (1995). However, no package fully encompasses RA as shown in Table I. Some programs are not easily used by researchers outside the systems field; others do not incorporate statistical tests. These existing packages are in limited use.

OCCAM is the result of a software development program under the author’s direction in the Systems Science PhD Program at PSU which began in 1985. The first program, written by the author, did single-predicting component modeling. This was improved upon by Jamsheid Hosseini (Hosseini, 1987; Hosseini et al., 1986, 1991) and then by Doug Anderson, who also wrote a program for multi-predicting component modeling, and another program for inconsistency resolution for IRA identification (Anderson, 1996). Hui Shu wrote SRA reconstruction and structure lattice programs. For convenience, the whole set of these earliest RA programs will be called OCCAM0. In this period, Klaus Krippendorff generously provided to us his programs mentioned above and these assisted our research and informed our development efforts. We also utilized GSPS obtained from Elias (1988). Marcus Daniels combined many functionalities of the Hosseini and Anderson reconstruction programs and Shu’s lattice program by rewriting them and adding heuristic search in the multi-predicting component modeling, to produce OCCAM1. Stan Grygiel made innovations and improvements in the single-predicting-component calculations, in search heuristics, and general research usability; this produced OCCAM2. Calculations reported in this paper were done with OCCAM2 and occasionally with earlier separate programs mentioned above. A new program (OCCAM3) has now been written by Ken Willett, which is a more effective research and applications platform (Willett and Zwick, 2004; Zwick, 2004a) and can be accessed over the web. Willett is also specifically exploring heuristic search and approximate computation approaches. Michael Johnson is programming SBM for future incorporation into the package, integrating it theoretically into the RA framework, and exploring its implications for decision analysis (Johnson and Zwick, 2000; Zwick and Johnson, 2004). Bjorn Chambless has written a stand-alone SPC information-theoretic program which includes binning and aggregation preprocessing capacities. Binning for OCCAM is being developed by Michael Johnson and Steve Shervais. Tad Shannon has programmed an updated version of Conant’s EDA (Shannon and Zwick, 2004) and also a time-series preprocessing utility.

5. Discussion

In the example discussed above of IRA heuristic search, the first step can in principle be extended to much larger problems without difficulty, because SPC models do not require operations on the full state space, but depend rather on the size of the data. The second step, however, does require such operations, which as noted earlier limits the number of variables which can be analyzed to the order of 20. This limitation does not preclude the use of RA for data mining applications involving more than 20 variables, in that the first step can always be used to select a smaller subset for lattice searches. Still, it would clearly be desirable if the second step could be implemented for a greater number of variables. The barrier here is the current requirement for cyclic models of an IPF operation which operates on the entire state space. If models could be generated with a procedure that operated only on observed states and scaled with the data, larger problems could be addressed, since usually data are sparse. At present, it is not apparent how to assess multi-predicting-component models in an alternative way, but two approaches which use approximate assessment and do not require the full state space will now be briefly mentioned.

The first involves the use of binary decision diagrams (Mishchenko, 2001). BDD make possible major economies of space and computing time by storing states not explicitly but implicitly in the paths of the diagram. So, while the size of the state space increases exponentially with the number of variables, the size of the graph does not, but the number of paths in the graph does, which allows the graph to represent the exponential dependence of the number of states on the number of variables. It is likely that to use BDD to analyze distributions the distribution frequencies will need to be binned, i.e. IRA problems have to be converted to SRA problems. If the resulting information loss is not severe, and if BDD can be applied to such SRA approximations, the size of the state space may become much less limiting. This approach is under investigation (Zwick and Mishchenko, 2004).

A second idea is to employ methods used in three-dimensional image reconstruction (Zwick and Zeitler, 1973), and Fourier methods in particular. These methods allow the composition of multiple projections in a single step, regardless of cyclicity. They compute an approximation to IPF which may or may not suffice for practical purposes. Most critically, these methods scale with the data and not the state space. This approach is also under current investigation (Zwick, 2004b). The use of Fourier methods in RA would bring

RA into proximity to LDL methods which use wavelets, Walsh functions, Haar transforms, and similar global or local function-based decompositions.

In summary, RA methods are general, being applicable to set-theoretic relations as well as probability distributions. Both SRA and IRA may be of interest to the machine learning and logic design community. SRA offers another approach to decomposition of relations and mappings, and IRA can be used for these purposes as well. The LDL community might profitably examine the use of LL latent variable modeling and state-based RA techniques. It might also consider extension of its techniques to distributions, where statistical considerations are necessary. The RA community and the social science LL community, on the other hand, can gain from a deeper familiarity with the LDL literature. RA methodology is potentially a valuable new approach to data mining. Current techniques can be applied to 10s or 100s of variables, and heuristic and approximate methods may substantially expand the range of RA modeling.

References

Al-Rabadi, A. and Zwick, M. (2004), “Modified reconstructability analysis for many-valued functions and relations”, Kybernetes, Vol. 33 No. 5/6, pp. 906-920.
Al-Rabadi, A., Zwick, M. and Perkowski, M. (2004), “A comparison of modified reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions”, Kybernetes, Vol. 33 No. 5/6, pp. 933-947.
Anderson, D.R. (1996), “The identification problem of reconstructability analysis: a general method for estimation and optimal resolution of local inconsistency”, Systems Science PhD dissertation, Portland State University, Portland, OR.
Ashby, W.R. (1964), “Constraint analysis of many-dimensional relations”, General Systems Yearbook, Vol. 9, pp. 99-105.
Bishop, Y., Fienberg, S. and Holland, P. (1978), Discrete Multivariate Analysis, MIT Press, Cambridge, MA.
Cellier, F. and Yandell, D. (1987), “SAPS-II: a new implementation of the systems approach problem solver”, Int. J. General Systems, Vol. 13 No. 4, pp. 307-22.
Cellier, F., Nebot, A., Mugica, F. and de Albornoz, A. (1995), “Combined qualitative-quantitative simulation methods of continuous-time process using fuzzy inductive reasoning techniques”, Int. J. General Systems, Vol. 24 No. 1/2, pp. 95-116.
Chambless, B. and Scarborough, D. (2001), “Information-theoretic feature selection for a neural behavioral model”, Int. Joint Conference on Neural Nets, 14-19 July, Washington, DC.
Chen, Z. (1994), “Qualitative reasoning for systems reconstruction using Lebesgue discretization”, Int. J. Systems Sci., Vol. 25 No. 12, pp. 2329-37.
Chen, Z. (1997), “K-Systems theory for goal-driven data mining”, Advances in Systems and Applications, Special Issue 1, pp. 40-3.
Conant, R.C. (1981), “Set-theoretic structure modeling”, Int. J. General Systems, Vol. 7, pp. 93-107.
Conant, R.C. (1988a), “Extended dependency analysis of large systems. Part I: dynamic analysis”, Int. J. General Systems, Vol. 14, pp. 97-123.
Conant, R.C. (1988b), “Extended dependency analysis of large systems. Part II: static analysis”, Int. J. General Systems, Vol. 14, pp. 125-41.
Davis, J.A. (1985), The Logic of Causal Order, Quantitative Application in the Social Sciences #55, Sage, Beverly Hills, CA.
Dobransky, M. and Wierman, M. (1995), “Genetic algorithms: a search technique applied to behavior analysis”, Int. J. General Systems, Vol. 24 No. 1/2, pp. 125-36.
Elias, D. (1988), “The general systems problem solver: a framework for integrating systems methodologies”, PhD dissertation, Department of Systems Science, SUNY-Binghamton.
Feldman, D. and Crutchfield, J. (1998), “Discovering noncritical organization: statistical mechanical, information theoretic, and computational views of patterns in one-dimensional spin systems”, Santa Fe Institute Working Paper, 98-04-026.
Files, C. and Perkowski, M. (1998), “Multi-valued functional decomposition as a machine learning method”, Proc. ISMVL’98, May 1998.
Good, I. (1963), “Maximum entropy for hypothesis formulation especially for multidimensional contingency tables”, Annals of Mathematical Statistics, Vol. 34, pp. 911-34.
Grygiel, S. (2000), “Decomposition of relations as a new approach to constructive induction in machine learning and data mining”, Electrical Engineering, PhD dissertation, Portland State University, Portland, OR.
Grygiel, S., Zwick, M. and Perkowski, M. (2004), “Multi-level decomposition of relations”, Kybernetes, Vol. 33 No. 5/6, pp. 948-961.
Hagenaars, J.A. (1993), Loglinear Models with Latent Variables, Quantitative Application in the Social Sciences #94, Sage, Beverly Hills, CA.
Hosseini, J. (1987), “Segment congruence analysis: an information theoretic approach”, Systems Sciences PhD dissertation, Portland State University, Portland, OR.
Hosseini, J., Harmon, R. and Zwick, M. (1986), “Segment congruence analysis via information theory”, Proceedings, International Society for General Systems Research, May 1986, pp. G62-G77.
Hosseini, J., Harmon, R. and Zwick, M. (1991), “An information theoretic framework for exploratory segmentation research”, Decision Sciences, Vol. 22, pp. 
663-77. Johnson, M. and Zwick, M. (2000), “State-based reconstructability modeling for decision analysis”, in Allen, J.K. and Wilby, J.M. (Eds), Proceedings of The World Congress of the Systems Sciences and ISSS 2000, International Society for the Systems Sciences, Toronto, Canada. Jones, B. (1985a), “Reconstructability analysis for general function”, Int. J. General Systems, Vol. 11, pp. 133-42. Jones, B. (1985b), “Reconstructability considerations with arbitrary data”, Int. J. General Systems, Vol. 11, pp. 143-51. Jones, B. (1986), “K-Systems versus classical multivariate systems”, Int. J. General Systems, Vol. 12, pp. 1-6. Jones, B. (1989), “A program for reconstructability analysis”, Int. J. General Systems, Vol. 15, pp. 199-205. Kim, J. and Muller, C. (1978), Factor Analysis: Statistical Methods and Practical Issues, Quantitative Applications in the Social Sciences #14, Sage, Beverly Hills, CA. Klir, G. (1976), “Identification of generative structure in empirical data”, Int. J. General Systems, Vol. 3 No. 2, pp. 89-104. Klir, G. (1985), The Architecture of Systems Problem Solving, Plenum Press, New York, NY. Klir, G. (1986), “Reconstructability analysis: an offspring of Ashby’s constraint theory”, Systems Research, Vol. 3 No. 4, pp. 267-71.


Klir, G. (Ed.) (1996), Int. J. General Systems, Special Issue on GSPS, Vol. 24 Nos 1/2 (includes an RA bibliography). Klir, G. and Wierman, M.J. (1998), Uncertainty-Based Information: Variables of Generalized Information Theory, Physica-Verlag, New York, NY. Knoke, D. and Burke, P.J. (1980), Log-Linear Models, Quantitative Applications in the Social Sciences Monograph #20, Sage, Beverly Hills, CA. Krippendorff, K. (1981), “An algorithm for identifying structural models of multivariate data”, Int. J. General Systems, Vol. 7 No. 1, pp. 63-79. Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Quantitative Application in the Social Sciences #62, Sage, Beverly Hills, CA. Ku, H. and Kullback, S. (1968), “Interaction in multidimensional contingency tables: an information theoretic approach”, Journal of Research of the National Bureau of Standards – Mathematical Sciences, Vol. 72B No. 3, pp. 159-99. Kullback, S. (1959), Information Theory and Statistics, Wiley, New York, NY. Lendaris, G. (1995), “Prestructuring ANNs via a priori knowledge”, Proceedings of SPIE International Conference, SPIE Press. Lendaris, G. and Mathia, K. (1994), “Using a priori knowledge to prestructure ANNs”, Australian Journal of Intelligent Information Systems, Vol. 1 No. 1. Lendaris, G., Rest, A. and Misley, T. (1997), “Improving ANN generalization using a priori knowledge to pre-structure ANNs”, Proceedings of International Conference on Neural Networks ’97 (ICNN’97), July 1997, IEEE Press, New York, NY. Lendaris, G., Shannon, T. and Zwick, M. (1999), “Prestructuring neural networks for pattern recognition using extended dependency analysis (invited paper)”, Proceedings of Applications and Science of Computational Intelligence II AeroSense ’99, SPIE, Orlando, FL. Lendaris, G., Zwick, M. and Mathia, K. (1993), “On matching ANN structure to problem domain structure”, Proceedings of the World Congress on Neural Nets ’93, Portland, OR. Li, W. and Packard, N. 
(1990), “The structure of the elementary cellular automata rule space”, Complex Systems, Vol. 4, pp. 281-97. Long, J.S. (1983), Covariance structure Models: An Introduction to LISREL, Quantitative Application in the Social Sciences #34, Sage, Beverly Hills, CA. Mariano, M. (1984), “Towards resolving inconsistency among behavior systems”, in Smith, W. (Ed.), Systems Methodologies and Isomorphies, Proceedings – Society for General Systems Research International Conference 1984 Vol. I, Intersystems Publication, Lewiston, NY, pp. 225-9. Mariano, M. (1987), “Aspects of inconsistency in reconstruction analysis”, PhD dissertation, Department of Systems Science, SUNY-Binghamton. McCutcheon, A.L. (1987), Latent Class Analysis, Quantitative Applications in the Social Sciences #64, Sage, Beverly Hills, CA. Miller, G. and Madow, W. (1954), On the Maximum Likelihood Estimate of the Shannon-Wiener Measure of Information, Air Force Cambridge Research Center, Washington, DC. Mishchenko, A. (2001), “An Introduction to Zero-suppressed binary decision diagrams”, Technical Report June 2001, PSU, available at www.ee.pdx.edu/~/alanmi/research/dd/zddtut.pdf. Perkowski, M., Marek-Sadowska, M., Jozwiak, L., Luba, T., Grygiel, S., Nowicka, M., Malvi, R., Wang, Z. and Zhang, J. (1997), “Decomposition of many-valued relations”, Proc. ISMVL ’97, May 1997, pp. 13-18.

Pittarelli (1990), “Reconstructability analysis using probability intervals”, Int. J. General Systems, Vol. 16, pp. 215-33. Shaffer, G. (1988), “K-systems analysis for determining the factors influencing benthic microfloral productivity in a Louisiana estuary, USA”, Mar. Ecol. Prog. Ser., Vol. 43, pp. 43-54. Shaffer, G. and Cahoon, P. (1987), “Extracting information from ecological data containing high spatial and temporal variability: benthic microfloral production”, Int. J. General Systems, Vol. 13, pp. 107-23. Shannon, T. and Zwick, M. (2004), “Directed extended dependency analysis for data mining”, Kybernetes, Vol. 33 No. 5/6, pp. 973-983. Uyttenhove, H.J.J. (1984), “SAPS – a software systems for inductive modelling”, in Oren, T.I., et al. (Eds), Simulation and Model-Based Methodologies: An Integrative View, NATO ASI Series, F10, Springer-Verlag, Berlin, Heidelberg, pp. 427-49. Vermunt, J. (1997), Lem: A General Program for the Analysis of Categorical Data, (Program manual), Tilburg University. Walker, C.C. and Ashby, W.R. (1966), “On temporal characteristics of behavior in certain complex systems”, Kybernetes, Vol. 3 No. 2, pp. 100-8. Willett, K. and Zwick, M. (2004), “A software architecture for reconstructability analysis”, Kybernetes, Vol. 33 No. 5/6, pp. 997-1008. Zadeh, L.A. (1965), “Fuzzy sets”, Information and Control, Vol. 8 No. 3, pp. 338-53. Zwick, M. (1996), “Control uniqueness in reconstructability analysis”, Int. J. General Systems, Vol. 24 Nos 1/2, pp. 151-62. Zwick, M. (2001), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY. Zwick, M. (2004a), “Discrete multivariate modeling”, available at: www.sysc.pdx.edu/res_struct.html Zwick, M. (2004b), “Reconstructability analysis with Fourier transforms”, Kybernetes, Vol. 33 No. 5/6, pp. 1026-1040. Zwick, M. and Johnson, M. (2004), “State-based reconstructability analysis”, Kybernetes, Vol. 33 No. 5/6, pp. 
1041-1052. Zwick, M. and Mishchenko, A. (2004), “Binary decision diagrams and reconstructability analysis in crisp possibilistic systems”, (in preparation). Zwick, M. and Shervais, S. (2004), “Reconstructability analysis detection of optimal gene order in genetic algorithms”, Kybernetes, Vol. 33 No. 5/6, pp. 1053-1062. Zwick, M. and Shu, H. (1997), “Set-theoretic reconstructability of elementary cellular automata”, Advances in Systems Science and Applications, Special Issue 1, pp. 31-6. Zwick, M. and Shu, H. (2004), “Reconstructability and dynamics of elementary cellular automata”, (in preparation). Zwick, M. and Zeitler, E. (1973), “Image reconstruction from projections”, Optik, Vol. 38, pp. 550-65. Zwick, M., Shu, H. and Koch, R. (1996), “Information-theoretic reconstructability of rainfall time-series data”, Advances in Systems Science and Application, Special Issue I, pp. 154-9.


The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

Modified reconstructability analysis for many-valued functions and relations

Anas N. Al-Rabadi and Martin Zwick
Portland State University, Portland, USA

Keywords Data analysis, Cybernetics, Boolean functions

Abstract A novel many-valued decomposition within the framework of lossless reconstructability analysis (RA) is presented. In previous work, modified reconstructability analysis (MRA) was applied to Boolean functions, where it was shown that most Boolean functions not decomposable using conventional reconstructability analysis (CRA) are decomposable using MRA. Also, it was previously shown that whenever decomposition exists in both MRA and CRA, MRA yields simpler or equal complexity decompositions. In this paper, MRA is extended to many-valued logic functions, and logic structures that correspond to such decomposition are developed. It is shown that many-valued MRA can decompose many-valued functions when CRA fails to do so. Since real-life data are often many-valued, this new decomposition can be useful for machine learning and data mining. Many-valued MRA can also be applied for the decomposition of relations.

1. Introduction
One general method to understand complex systems is to decompose the system into less complex subsystems (Klir, 1985; Krippendorff, 1986). Full decomposition, as opposed to partial decomposition, consists of determining the minimal subsets of relations that describe the system acceptably. The quality of the decomposition is evaluated by calculating:
(1) the amount of information (or, conversely, the loss of information, or error) which exists in the decomposed system; and
(2) the complexity of the decomposed system.
The objective is to decompose the complex system (data) into the least complex and most informative (least error) model.
This paper is organized as follows: Section 2 presents background on reconstructability analysis (RA), both conventional (CRA) and modified (MRA). Many-valued MRA decomposition is presented in Section 3. Conclusions and future work are included in Section 4.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 906-920 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410533967

2. Reconstructability analysis
RA is a technique developed in the systems community to decompose relations or distributions involving qualitative variables (Conant, 1981; Klir, 1985; Krippendorff, 1986; Zwick, 1996). We are here concerned with lossless decomposition of completely specified set-theoretic (crisp possibilistic) functions and relations. (We do not address information-theoretic, i.e. probabilistic, distributions.) In lossless RA decomposition, the aim is to obtain the simplest model of the data which has zero error. The models representing possible decompositions define a graph-based lattice of structures. A “model” is a structure applied to some data (here a set-theoretic relation). Each model is a set of sub-relations projected from the original relation and represented by look-up tables.

A new lossless RA-based decomposition, called MRA decomposition, was introduced by Al-Rabadi (2001) and Al-Rabadi et al. (2002). While CRA decomposes using all values of the function, MRA decomposes using:
(1) the minimum set of values from which the function can be reconstructed without error; and
(2) the simplest model (at the lowest level in the lattice of structures) for each value in the minimal set.
The first principle is illustrated for Boolean functions as follows: for every structure in the lattice of structures, decompose the Boolean function for one value only, e.g. for the value “1”, into the simplest error-free decomposed structure. One thus obtains the 1-MRA decomposition. This model consists of a set of projections which, when intersected, yield the original Boolean function. This is illustrated in the following example.

Example 1. For the Boolean function
$F = x_1 x_2 + x_1 x_3$
Figure 1 shows the simplest models obtained using CRA and MRA decompositions. While CRA decomposes for both “0” and “1” values of the Boolean function, MRA decomposes only for value “1”, since $F(x_1, x_2, x_3)$ can be completely retrieved if one knows the $(x_1, x_2, x_3)$ values for which $F = 1$.

For Boolean functions, there are two advantages of MRA over CRA:
(1) MRA decomposition is simpler than CRA decomposition, so the MRA algorithm needs less time and space for its computation; and
(2) MRA directly implements the intersection operation with an AND gate in binary logic; consequently MRA decomposition leads directly to a binary circuit and thus can be applied to both machine learning and binary circuit design.
On the other hand, the intersection operation in CRA requires ternary logic to accommodate “don’t cares”, which are represented in the top middle of Figure 1 by “ – ”. Therefore, CRA has no simple application in binary circuit design.
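The 1-MRA idea of Example 1 can be sketched in a few lines of Python: take only the tuples for which F = 1, project them onto the components of a candidate model, and check whether intersecting the projections gives back exactly those tuples. This is a minimal sketch, not the authors' implementation. The helper names `project` and `join` and the candidate models are assumptions made for illustration, and since Figure 1 is not reproduced here, the model tested may differ from the one the figure shows.

```python
from itertools import product

def project(relation, idx):
    """Project a set of tuples onto the variable positions listed in idx."""
    return {tuple(t[i] for i in idx) for t in relation}

def join(projections, n_vars, domain=(0, 1)):
    """Intersect the cylindrical extensions of all projections."""
    return {
        t for t in product(domain, repeat=n_vars)
        if all(tuple(t[i] for i in idx) in proj for idx, proj in projections.items())
    }

def is_lossless(relation, model, n_vars):
    """True when the projections named by `model` reconstruct `relation` exactly."""
    projections = {idx: project(relation, idx) for idx in model}
    return join(projections, n_vars) == relation

# Tuples with F = 1 for F = x1 x2 + x1 x3 (positions 0, 1, 2 stand for x1, x2, x3).
on_set = {t for t in product((0, 1), repeat=3) if (t[0] and t[1]) or (t[0] and t[2])}

# Hypothetical candidate models, chosen for illustration only.
model_a = [(0,), (1, 2)]       # x1 : x2 x3
model_b = [(0,), (1,), (2,)]   # x1 : x2 : x3

lossless_a = is_lossless(on_set, model_a, 3)
lossless_b = is_lossless(on_set, model_b, 3)
print(sorted(on_set), lossless_a, lossless_b)
```

Only the three tuples with F = 1 are stored; every other input is implicitly F = 0, which is the reason a single value suffices in the Boolean case.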

3. Many-valued modified reconstructability analysis
This section presents MRA for many-valued functions and relations.


Figure 1. CRA versus MRA decompositions for the Boolean function $F = x_1 x_2 + x_1 x_3$

3.1 General approach
Real-life data are in general many-valued. Consequently, if MRA can decompose relations between many-valued variables, it can have practical applications in machine learning and data mining. Many-valued MRA is made up of two main steps which are common to two equivalent (intersection-based and union-based) algorithms:
(1) partition the many-valued truth table into subtables, each containing only a single function value; and
(2) perform CRA on all subtables.
Figure 2 shows the general pre-processing procedure for the two many-valued MRA algorithms, which will be explained in more detail below. For an $n$-valued completely specified function, one needs $(n-1)$ values to define the function. We thus do all $n$ decompositions and use for our MRA model the $(n-1)$ simplest of these. For example, using the lattice of structures, decompose the three-valued function for each individual value. One then obtains the simplest lossless MRA decomposition for value “0” of the function (denoted as the 0-MRA decomposition), for value “1” (1-MRA decomposition), and for value “2” (2-MRA decomposition). By selecting the simplest two models from these 0-MRA, 1-MRA, and 2-MRA decompositions, one can generate the complete function. In the intersection method, first the CRA decompositions are expanded to include the full set of variable and function values, and these “expanded” decompositions are then intersected to yield the original table.
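The partition step, and the remark that $(n-1)$ values suffice for a completely specified function, can be illustrated with a short Python sketch. The truth table below is a hypothetical two-variable ternary function invented for the illustration, and `partition_by_value` is an assumed helper name rather than anything defined in the paper.

```python
from itertools import product

def partition_by_value(table):
    """Split a truth table {input tuple: value} into one set of inputs per value."""
    subtables = {}
    for inputs, value in table.items():
        subtables.setdefault(value, set()).add(inputs)
    return subtables

# Hypothetical ternary function of two ternary variables (illustration only).
domain = (0, 1, 2)
all_inputs = set(product(domain, repeat=2))
table = {(a, b): (a + b) % 3 for a, b in all_inputs}

subtables = partition_by_value(table)

# Keep only two of the three subtables; for a completely specified function
# the third one is whatever part of the input space is left over.
kept = {v: subtables[v] for v in (1, 2)}
recovered_zero = all_inputs - kept[1] - kept[2]
print(sorted(recovered_zero))
```

In the full method each kept subtable would then be decomposed by CRA; the sketch stops at the partition because that is the part that does not depend on the choice of model.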


Figure 2. Steps for many-valued MRA

Equivalently, one can use a union operation to generate the corresponding many-valued MRA as follows.
(1) Decompose the original table (function or relation) into subtables for each output value: e.g. $T = T_0 \cup T_1 \cup T_2$ for the corresponding output values $O_0$, $O_1$, and $O_2$, respectively.
(2) Do the three-valued CRA decomposition on each subtable. Let $M_j$ be the decomposition of $T_j$.
(3) The reconstructed function or relation ($T^*$) is the union of all the subtable decompositions,
$$T^* = \bigcup_{j=0}^{n-1} M_j \times O_j,$$
where $\times$ is the set-theoretic Cartesian product. The union procedure can also be done with $(n-1)$ decompositions.

3.2 Complete examples
Following are two examples which illustrate many-valued MRA of three-valued functions. In the first example, MRA can decompose the function for only two values, and one has no choice but to use both in the MRA model. In the second example, the function is decomposable for all three of its values, and the two simplest decompositions are chosen to define the model.


In discussing the second example, we show that this approach is generalizable to set-theoretic relations, in addition to mappings. Example 2. We will generate the MRA decomposition for the ternary function specified by the ternary Marquand chart.


The following is the intersection algorithm for many-valued MRA for the ternary function in Example 2. Step 1. Decompose the ternary chart of the function into three separate tables each for a single function value. This will produce the following three sub-tables.

Step 2. Perform CRA for each sub-table.

Step 2a. The simplest error-free 0-MRA decomposition is the original “0”-sub-table itself since it is not decomposable. Step 2b. 1-MRA decomposition of D1 is as follows.


Step 2c. The 2-MRA decomposition of D2 is as follows:

3.2.1 The intersection algorithm.
Step 3.1. Select the $(3 - 1 = 2)$ simplest error-free decomposed models. In this example, these are the 1-MRA and 2-MRA decompositions. MRA thus gives the decomposition model $D_{11} : D_{12} : D_{21} : D_{22}$, from which the original function can be reconstructed as follows.
Step 3.2. Note that, for Tables 1 and 2, the MRA decomposition is for the value “1” of the logic function. Therefore, the existence of the tuples in the decomposed model implies that the function has value “1” for those tuples, and the non-existence of the tuples in the decomposed model implies that the function does not have value “1” but “0” or “2” for the non-appearing tuples. This is shown in Tables 1′ and 2′, respectively. Similarly note that, for Tables 3 and 4, the MRA decomposition is for the value “2” of the logic function. Therefore, the existence of the tuples in the decomposed model implies that the function has value “2” for those tuples, and the non-existence of the tuples in the decomposed model implies that the function does not have value “2” but “0” or “1” for the non-appearing tuples. This is shown in Tables 3′ and 4′, respectively.


In Tables 1′ and 2′ (i.e. the decomposition for value “1” of the function), the existence of value “1” (of sub-relations $F_1$ and $F_2$) means that the value “1” appeared in the original non-decomposed function for the corresponding tuples that appear in each table, but does not imply that the values “0” or “2” (of sub-relations $F_1$ and $F_2$) did not exist in the original non-decomposed function for the same tuples. Therefore, “0” and “2” are added to “1” as allowed values. In the remaining tuples, however, only “0” and “2” are allowed since the value “1” did not occur. Similarly, in Tables 3′ and 4′, the existence of the value “2” (of sub-relations $F_3$ and $F_4$) means that the value “2” appeared in the original non-decomposed function for the corresponding tuples that appear in each table, but does not imply that values “0” or “1” did not exist in the original non-decomposed function for the same tuples. Therefore, “0” and “1” are added to “2” as allowed values. In the remaining tuples, however, only “0” and “1” are allowed since the value “2” did not occur.
Set-theoretically, obtaining Tables 1′-4′ from Tables 1-4 is described as follows:
Table 1′: $(D_{11} \times (0, 1, 2)) \cup (D'_{11} \times (0, 2))$
Table 2′: $(D_{12} \times (0, 1, 2)) \cup (D'_{12} \times (0, 2))$
Table 3′: $(D_{21} \times (0, 1, 2)) \cup (D'_{21} \times (0, 1))$
Table 4′: $(D_{22} \times (0, 1, 2)) \cup (D'_{22} \times (0, 1))$
where ′ here means complement.
Step 3.3. Tables 1′-4′ are used to obtain the block diagram in Figure 3, where the following set-theoretic equations govern the outputs of the levels in the circuit shown in the figure:
$F = F_5 \cap F_6$
$F_5 = F_1 \cap F_2$
$F_6 = F_3 \cap F_4$
where $F_1$ is given by Table 1′, $F_2$ by Table 2′, $F_3$ by Table 3′, and $F_4$ by Table 4′, respectively.
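The expansion of Step 3.2 and the intersections of Step 3.3 can be traced with a small Python sketch. Because the Marquand chart and Tables 1-4 are not reproduced here, the function `f` below is a hypothetical stand-in, chosen so that its value-1 and value-2 subtables decompose under the same structures as in the text ($x_1x_2 : x_2x_3$ and $x_1x_3 : x_2x_3$). The final readout, which takes the largest value left in each intersected set, is an assumption of this sketch and not a rule stated in the paper.

```python
from itertools import product

DOMAIN = (0, 1, 2)
INPUTS = list(product(DOMAIN, repeat=3))

def f(x):
    """Hypothetical ternary function; the paper's chart is not reproduced."""
    x1, x2, x3 = x
    if x1 == 1 and x2 == 1:
        return 1
    if x1 == 2 and x3 == 2:
        return 2
    return 0

def project(tuples, idx):
    """Project a set of input tuples onto the positions in idx."""
    return {tuple(t[i] for i in idx) for t in tuples}

def expand(proj, idx, value):
    """Primed table: tuples in the projection allow 0, 1, 2; all others exclude `value`."""
    everything = set(DOMAIN)
    return {
        x: everything if tuple(x[i] for i in idx) in proj else everything - {value}
        for x in INPUTS
    }

def intersect(a, b):
    """Tuple-wise intersection of two tables of allowed values."""
    return {x: a[x] & b[x] for x in INPUTS}

D1 = {x for x in INPUTS if f(x) == 1}
D2 = {x for x in INPUTS if f(x) == 2}

F1 = expand(project(D1, (0, 1)), (0, 1), 1)   # analogue of Table 1'
F2 = expand(project(D1, (1, 2)), (1, 2), 1)   # analogue of Table 2'
F3 = expand(project(D2, (0, 2)), (0, 2), 2)   # analogue of Table 3'
F4 = expand(project(D2, (1, 2)), (1, 2), 2)   # analogue of Table 4'

F5 = intersect(F1, F2)
F6 = intersect(F3, F4)
F = intersect(F5, F6)

# Assumed readout: the largest value that survives the intersections.
decoded = {x: max(F[x]) for x in INPUTS}
```

In this sketch the value 1 survives in $F_5$ exactly on the tuples of $D_1$, and the value 2 survives in $F_6$ exactly on the tuples of $D_2$, which is what lets the two non-zero subtables determine the whole function.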

Figure 3. The decomposed structure resulting from the many-valued MRA decomposition

The intermediate subfunctions, F5 and F6 are shown in the following maps, respectively.

Note that in Figure 3, the intersection blocks in the second level and the intersection block at the third (output) level are general and do not depend on the function being decomposed. Only the tables at the first level depend upon this function.

3.2.2 The union algorithm.
Steps 1 and 2 are the same as in the intersection algorithm.
Step 3.1. Using the decomposition model $D_{11} : D_{12} : D_{21} : D_{22}$, obtain $D_1$ and $D_2$ by standard methods as follows:
$D_1 = (D_{11} \times x_3) \cap (D_{12} \times x_1)$
$D_2 = (D_{21} \times x_2) \cap (D_{22} \times x_1)$
$D_0 = (D_1 \cup D_2)'$
where $D_1$ is the decomposition for function value “1”, $D_2$ for function value “2”, and $x_1, x_2, x_3 \in \{0, 1, 2\}$.
Step 3.2. Perform the set-theoretic operations to obtain the total function from the decomposed sub-functions.
$x_1 x_2 x_3 F = (D_1 \times 1) \cup (D_2 \times 2) \cup ((D_1 \cup D_2)' \times (1 \cup 2)') = (D_1 \times 1) \cup (D_2 \times 2) \cup ((D_1 \cup D_2)' \times 0)$
Alternatively, one can use all three decompositions:
$x_1 x_2 x_3 F = (D_0 \times 0) \cup (D_1 \times 1) \cup (D_2 \times 2)$
The function value of $(x_1, x_2, x_3)$ is determined by the block diagram of Figure 4, where G performs the following operation:
$F = 0$ if $(x_1 x_2 x_3) \in D_0$
$F = 1$ if $(x_1 x_2 x_3) \in D_1$
$F = 2$ if $(x_1 x_2 x_3) \in D_2$
Note that the logic function in Example 2 is non-decomposable using CRA. Consequently, as can be seen from this example and analogous to the binary case, the new many-valued MRA is superior to CRA for this function, in that it decomposes a function which CRA cannot. We now consider an example where CRA does decompose, and also where MRA decomposes for all three values.

Example 3. Let us generate the MRA decomposition for the ternary function specified by the following ternary Marquand chart:

Figure 4. Block diagram for the union algorithm of MRA of Example 2
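The union algorithm and the selector block G of Figure 4 can be sketched in Python as follows. The sub-relations `D11`, `D12`, `D21` and `D22` are hypothetical stand-ins for the paper's tables, which are not reproduced here, and the helper `cylinder` (which extends a projection to all three variables) is an assumed name. The structure follows Step 3.1 of the union algorithm for Example 2: each $D_j$ is the intersection of two extended projections, and $D_0$ is the complement of $D_1 \cup D_2$.

```python
from itertools import product

DOMAIN = (0, 1, 2)
INPUTS = set(product(DOMAIN, repeat=3))

def cylinder(proj, idx):
    """Extend a projection over positions idx to all three variables."""
    return {x for x in INPUTS if tuple(x[i] for i in idx) in proj}

# Hypothetical sub-relations (illustration only).
D11 = {(1, 1)}                    # over (x1, x2)
D12 = {(1, 0), (1, 1), (1, 2)}    # over (x2, x3)
D21 = {(2, 2)}                    # over (x1, x3)
D22 = {(0, 2), (1, 2), (2, 2)}    # over (x2, x3)

D1 = cylinder(D11, (0, 1)) & cylinder(D12, (1, 2))
D2 = cylinder(D21, (0, 2)) & cylinder(D22, (1, 2))
D0 = INPUTS - (D1 | D2)           # complement of D1 U D2

# T = (D1 x {1}) U (D2 x {2}) U (D0 x {0}), stored as (x1, x2, x3, F) rows.
T = {x + (1,) for x in D1} | {x + (2,) for x in D2} | {x + (0,) for x in D0}

def G(x):
    """Selector block: return the value whose decomposition contains x."""
    if x in D1:
        return 1
    if x in D2:
        return 2
    return 0
```

Only two decompositions are stored explicitly; the value 0 is assigned by exclusion, mirroring the $(n-1)$ argument of Section 3.1.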


Using the intersection-based algorithm, one obtains the following results for MRA for the ternary function in Example 3. Step 1. Decompose the ternary chart of the function into three separate tables each for a single function value. This will produce the following three sub-tables.

Step 2. Perform CRA for each sub-table.


Step 2a. The 0-MRA decomposition of D0 is as follows:


Step 2b. The 1-MRA decomposition of D1 is as follows:

Step 2c. The 2-MRA decomposition of D2 is as follows:

3.2.3 The intersection algorithm. Step 3.1. Select the two simplest decomposed models, namely the 1-MRA and 2-MRA decompositions. These are at a lower level in the lattice of structures than 0-MRA. Step 3.2. Analogous to Example 2, one obtains the following expanded tables:


Set-theoretically, obtaining Tables 4′, 5′, 6′, and 7′ from Tables 4, 5, 6, and 7 is described as follows:
Table 4′: $(D_{11} \times (0, 1, 2)) \cup (D'_{11} \times (0, 2))$
Table 5′: $(D_{12} \times (0, 1, 2)) \cup (D'_{12} \times (0, 2))$
Table 6′: $(D_{21} \times (0, 1, 2)) \cup (D'_{21} \times (0, 1))$
Table 7′: $(D_{22} \times (0, 1, 2)) \cup (D'_{22} \times (0, 1))$
Step 3.3. Tables 4′, 5′, 6′, and 7′ are used to obtain the block diagram in Figure 5, where the following set-theoretic equations govern the outputs of the levels in the circuit shown in the figure:
$F = F_5 \cap F_6$
$F_5 = F_1 \cap F_2$
$F_6 = F_3 \cap F_4$
where $F_1$ is given by Table 4′, $F_2$ by Table 5′, $F_3$ by Table 6′, and $F_4$ by Table 7′, respectively. The intermediate sub-functions, $F_5$ and $F_6$, are shown in the following maps, respectively.

Figure 5. The decomposed structure resulting from the many-valued MRA decomposition


3.2.4 The union algorithm.
Steps 1 and 2 are the same as in the intersection algorithm.
Step 3.1. Using the decomposition model $D_{01} : D_{02} : D_{11} : D_{12} : D_{21} : D_{22}$, obtain $D_0$, $D_1$, and $D_2$ by standard methods as follows:
$D_0 = (D_{01} \times x_3) \cap (D_{02} \times x_1) \cap (D_{03} \times x_2)$
$D_1 = (D_{11} \times x_3) \cap (D_{12} \times x_1 x_2)$
$D_2 = (D_{21} \times x_2) \cap (D_{22} \times x_1)$
where $D_0$ is the decomposition for function value “0”, $D_1$ is for function value “1”, $D_2$ for function value “2”, and $x_1, x_2, x_3 \in \{0, 1, 2\}$.
Step 3.2. Perform the set-theoretic operations to obtain the total function from the decomposed subfunctions. This can be done using only two of the three decompositions as in Step 3.2 of the union algorithm in Example 2, or alternatively, one can use all three decompositions as follows:
$x_1 x_2 x_3 F = (D_0 \times 0) \cup (D_1 \times 1) \cup (D_2 \times 2)$
The function value of $(x_1, x_2, x_3)$ is determined by the block diagram of Figure 6, where G performs the following operation:
$F = 0$ if $(x_1 x_2 x_3) \in D_0$
$F = 1$ if $(x_1 x_2 x_3) \in D_1$
$F = 2$ if $(x_1 x_2 x_3) \in D_2$
The logic function in Example 3 is decomposable using CRA with the lossless CRA model $x_1 x_2 : x_2 x_3 : x_1 x_3$. Consequently, unlike the previous example, both many-valued MRA and CRA decompose losslessly. Since both CRA and MRA decompose this function, it would be possible to compare the complexities of the two decompositions. The complexity measure reported by Al-Rabadi et al. (2002) could be used, but needs to be extended to many-valued functions.
From the previous discussion, it follows that the extension of many-valued MRA from functions to relations is straightforward. One just performs the union algorithm using all $n$ decompositions, e.g. for three values $(D_0 \times 0) \cup (D_1 \times 1) \cup (D_2 \times 2)$.
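The extension to relations can be sketched in Python by running the union algorithm over all $n$ output values. The relation `T` below is hypothetical: one input is related to two outputs and most inputs are related to none, so the subtables neither partition the input space nor stay disjoint. In that situation no subtable can be recovered as the complement of the others, which is consistent with the text's use of all $n$ decompositions. The model $x_1 : x_2$ and the helper names are assumptions made for the illustration.

```python
from itertools import product

DOMAIN = (0, 1, 2)

def project(tuples, idx):
    """Project a set of input tuples onto the positions in idx."""
    return {tuple(t[i] for i in idx) for t in tuples}

def join(projections, n_vars):
    """Tuples whose restriction to every idx lies in the matching projection."""
    return {
        x for x in product(DOMAIN, repeat=n_vars)
        if all(tuple(x[i] for i in idx) in p for idx, p in projections)
    }

# Hypothetical relation over (x1, x2, F); input (0, 0) is related to both 0 and 1.
T = {(0, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1), (2, 2, 2)}

# Step 1: one subtable of input tuples per output value (subtables may overlap).
subtables = {v: {row[:2] for row in T if row[2] == v} for v in DOMAIN}

# Step 2: decompose each subtable; the model x1 : x2 is an illustrative choice.
model = [(0,), (1,)]
M = {v: join([(idx, project(D, idx)) for idx in model], 2)
     for v, D in subtables.items()}

# Step 3: union of M_j x {j} over all n output values.
T_star = {x + (v,) for v, Mv in M.items() for x in Mv}
lossless = T_star == T
print(lossless)
```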

4. Conclusion
A novel many-valued decomposition within the framework of RA is presented. In previous work (Al-Rabadi, 2001; Al-Rabadi et al., 2002), MRA was applied to Boolean functions. In this paper, MRA is extended to many-valued logic functions and relations. It has been shown that MRA can decompose many-valued functions when CRA fails to do so. Since real-life data are naturally many-valued, future work will apply many-valued MRA to real-life data for machine learning, data mining, and data analysis.

Figure 6. Block diagram for the union algorithm of MRA of Example 3


References
Al-Rabadi, A.N. (2001), “A novel reconstructability analysis for the decomposition of Boolean functions”, Technical Report #2001/005, 1 July 2001, Electrical and Computer Engineering Department, Portland State University, Portland, OR. Al-Rabadi, A.N., Zwick, M. and Perkowski, M. (2002), “A comparison of enhanced reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions”, Accepted to WOSC/IGSS 2002 Conference. Conant, R. (1981), “Set-theoretic structural modeling”, International Journal of General Systems, Vol. 7, pp. 93-107. Klir, G. (1985), Architecture of Systems Problem Solving, Plenum Press, New York, NY. Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Sage Publications, New York, NY. Zwick, M. (1996), “Control uniqueness in reconstructability analysis”, International Journal of General Systems, Vol. 24 Nos 1/2, pp. 151-62.

Further reading
Hurst, S.L. (1978), Logical Processing of Digital Signals, Crane Russak and Edward Arnold, London and Basel. Klir, G. and Wierman, M.J. (1998), Uncertainty-Based Information: Variables of Generalized Information Theory, Physica-Verlag, New York, NY. Zwick, M. (2001), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY.


Reversible modified reconstructability analysis of Boolean circuits and its quantum computation


Anas N. Al-Rabadi
ECE Department, Portland State University, Portland, Oregon, USA

Martin Zwick
Portland State University, Portland, Oregon, USA

Keywords Cybernetics, Boolean functions, Logic

Abstract Modified reconstructability analysis (MRA) can be realized reversibly by utilizing Boolean reversible (3,3) logic gates that are universal in two arguments. The quantum computation of the reversible MRA circuits is also introduced. The reversible MRA transformations are given a quantum form by using the normal matrix representation of such gates. The MRA-based quantum decomposition may play an important role in the synthesis of logic structures using future technologies that consume less power and occupy less space.

1. Introduction
Decomposition is one methodology to analyze data and identify “hidden” relationships between variables. One major decomposition technique for discrete static or dynamic systems is reconstructability analysis (RA), which was developed in the systems community to analyze qualitative data (Klir, 1985; Krippendorff, 1986). A recent short review of RA is given by Zwick (2001). Logic circuits that realize RA have also been shown (Zwick, 1995). This paper develops a methodology for reversible and quantum implementation of RA. Owing to the anticipated failure of Moore’s law around the year 2020, quantum computing may play an important role in building more compact and less power-consuming computers (Nielsen and Chuang, 2000). Because all quantum computer gates must be reversible (Bennett, 1973; Fredkin and Toffoli, 1982; Landauer, 1961; Nielsen and Chuang, 2000), reversible computing will also be increasingly important in the future design of regular, minimal-size, and universal systems.
The remainder of this paper is organized as follows: a review of our new approach to RA decomposition of logic functions is presented in Section 2. Background on reversible logic and the reversible realization of RA-based Boolean circuits is presented in Section 3. The implementation of reversible Boolean RA-based circuits using quantum logic is introduced in Section 4. A more complete discussion of quantum computing is given in Section 5. Conclusions and future work are included in Section 6.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 921-932 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410533976


2. Reconstructability analysis: conventional versus modified We are concerned here with “set-theoretic” RA, i.e. the analysis of crisp possibilistic systems (Klir and Wierman, 1998). Enhancement of lossless set-theoretic conventional reconstructability analysis (CRA) has been presented by Al-Rabadi (2001) and Al-Rabadi et al. (2004). This new enhanced RA is called “modified reconstructability analysis” (MRA). The procedure for the lossless MRA decomposition is as follows: for every structure in the lattice of structures, decompose the Boolean function for one functional value only (e.g. for value of “1”) into the simplest error-free decomposed structure. One thus obtains the 1-MRA decomposition. This model consists of a set of projections which when intersected yield the original Boolean function. It has been shown by Al-Rabadi et al. (2004) that lossless MRA yields much simpler logic circuits than the corresponding lossless CRA, while retaining all information about the decomposed logic function. Figure 1 from Al-Rabadi et al. (2004) shows the decomposition of all non-degenerate NPN-classes (Hurst, 1978) of three-variable Boolean functions. 3. Reversible MRA A (k, k) reversible circuit is a circuit that has the same number of inputs (k), and outputs (k), and is a one-to-one mapping between a vector of inputs and a vector of outputs. Thus, the vector of input states can always be uniquely reconstructed from the vector of output states (Bennett, 1973; Fredkin and Toffoli, 1982; Kerntopf, 2000; Landauer, 1961). As it was proven (Landauer, 1961) it is a necessary, but not sufficient condition for not dissipating power in a physical circuit that all sub-circuits must be built using reversible logical components. Many reversible gates have been proposed as building blocks for reversible computing (Kerntopf, 2000; Nielsen and Chuang, 2000). Figure 2 shows some of the gates that are commonly used in the synthesis of reversible Boolean logic circuits. 
It has been shown by Fredkin and Toffoli (1982) that for a (k, k) reversible gate to be universal, the gate must have at least three inputs (i.e. it must be at least a (3, 3) gate). (A gate is universal if it can implement all functions for a given number of arguments.) Note that not all (3, 3) reversible gates are universal, but every universal reversible gate must be at least a (3, 3) gate. Boolean reversible (3, 3) gates which are universal in two arguments have been shown by Kerntopf (2000). Reversible (3, 3) gates that are universal in two arguments can be used for the construction of reversible MRA circuits. Figure 3 shows one example of a binary (3, 3) reversible gate which is universal in two arguments. The following example illustrates the use of the reversible gate in Figure 3 for the synthesis of the 1-MRA circuit for class 5 from Figure 1. The 1-MRA decomposed Boolean circuit of class 5 in Figure 1 can be realized using the binary (3, 3) reversible circuit in Figure 3(b). This is done with the reversible circuit shown in Figure 4, where blocks B1 and B2 are the reversible (3, 3) gate


Figure 1. CRA versus MRA for the decomposition of all non-degenerate NPN-classes of three-variable Boolean functions

Figure 2. Binary reversible gates: (a) (2, 2) Feynman gate which uses XOR; (b) (3, 3) Toffoli gate which uses AND and XOR; and (c) (2, 2) swap gate which is two permuted wires


Figure 3. (a) Diagram of the reversible (3, 3) Boolean logic circuit; (b) truth table of this gate; and (c) proof of universality of the gate in two arguments

Figure 4. Reversible (7,7) Boolean circuit that implements the 1-MRA circuit from class 5 in Figure 1

from Figure 3(b), and block B3 is the reversible (3, 3) gate from Figure 2(b). For B3, $c = 0$, and thus B3 is a reversible logic AND gate. Using Figure 3(c), the Boolean reversible circuit in Figure 4 implements the 1-MRA circuit of class 5 (in Figure 1) using the following input settings:

\[
\begin{aligned}
a = 0 &\Rightarrow Q_1 = f_1 = (x_1 \oplus x_2)' \\
a = 0 &\Rightarrow Q_2 = f_2 = (x_1 \oplus x_3)' \\
F &= Q_1 \wedge Q_2 = f_1 \wedge f_2 = (x_1 \oplus x_2)' \wedge (x_1 \oplus x_3)' = x_1 x_2 x_3 + x_1' x_2' x_3'
\end{aligned}
\]
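The identity in the last line can be confirmed over all eight input vectors. This is a minimal sketch, not the circuit of Figure 4 itself: the gate of Figure 3 is not modelled, only the two XNOR functions $f_1$ and $f_2$ it is set to produce, and the Toffoli gate is assumed to have its target on the third wire.

```python
from itertools import product

def toffoli(a, b, c):
    """Assumed Toffoli convention: (a, b, c) -> (a, b, c XOR (a AND b))."""
    return (a, b, c ^ (a & b))

def class5(x1, x2, x3):
    """Representative function of NPN class 5: x1 x2 x3 + x1' x2' x3'."""
    return (x1 & x2 & x3) | ((1 - x1) & (1 - x2) & (1 - x3))

def mra_class5(x1, x2, x3):
    """1-MRA realization: two XNOR projections combined by a Toffoli gate with c = 0."""
    f1 = 1 - (x1 ^ x2)   # (x1 XOR x2)'
    f2 = 1 - (x1 ^ x3)   # (x1 XOR x3)'
    _, _, F = toffoli(f1, f2, 0)  # with c = 0 the third output is f1 AND f2
    return F

matches = all(class5(*x) == mra_class5(*x) for x in product((0, 1), repeat=3))
print(matches)
```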

For block B3 in Figure 4, one could alternatively use the gate described in Figure 3(b): for $c = 0$, output R is the logical AND; in this case, the reversible circuit is fully regular (i.e. made up of only one kind of gate). However, using the Toffoli gate (Figure 2(b)) for B3 is less complex; in this case, the circuit is semi-regular (i.e. all the gates in the first level are the same, but the AND of the second level is done by a different gate). Using similar substitutions with appropriate input values according to Figure 3(b), the reversible circuit in Figure 4 can realize all 1-MRA circuits from classes 8 and 10 in Figure 1, respectively. The remaining classes from Figure 1 can be realized using analogous techniques, by adding one more block from Figure 3(b) to the first level of Figure 4 in the case of class 1, and removing one block from the first level of Figure 4 in the case of classes 4 and 7, respectively.

4. Quantum MRA

Quantum computing is a recent trend in logic computation that utilizes atomic structures to perform the logic computation processes (Nielsen and Chuang, 2000). Although the underlying principles for quantum computing are the theorems and principles of quantum mechanics (Dirac, 1930), it has been shown (Nielsen and Chuang, 2000) that the physical quantum evolution processes can be reduced to algebraic matrix equations. Such a matrix representation is a purely mathematical representation that can be realized physically using the corresponding quantum devices. Figure 5 shows this matrix formalism, where each evolution matrix is unitary (Nielsen and Chuang, 2000). Each matrix representation shown in Figure 5 is obtained through the solution of a set of linearly independent equations that correspond to the mapping of an input vector to an output vector. In Figure 5, the matrix representation is equivalent to the input-output (I/O) mapping representation of quantum gates, as follows.
If one considers each row in the input side of the I/O map in Figure 5 as an input vector represented by the natural binary code of $2^{\text{index}}$, with the row index starting from 0, and similarly for the output row of the I/O map, then the matrix transforms the input vector to the corresponding output vector by transforming the code for the input to the code for the output. For example, the following matrix equation is the I/O mapping using the Feynman matrix from Figure 5(a):

\[
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{[\text{Feynman matrix}]}
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{[\text{input code}]}
=
\underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}}_{[\text{output code}]}
\]
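The correspondence between an I/O map and its matrix can be sketched in a few lines. The example below assumes the Feynman gate is (a, b) -> (a, a XOR b) and that basis vectors are indexed by the natural binary code of the bit string; the function names are hypothetical.

```python
from itertools import product

def feynman(a, b):
    """Assumed Feynman gate: (a, b) -> (a, a XOR b)."""
    return (a, a ^ b)

def to_index(bits):
    """Natural binary code of a tuple of bits, e.g. (1, 0) -> 2."""
    return int("".join(str(b) for b in bits), 2)

def gate_matrix(gate, k):
    """Matrix with a 1 at [output index][input index] for every row of the I/O map."""
    n = 2 ** k
    matrix = [[0] * n for _ in range(n)]
    for bits in product((0, 1), repeat=k):
        matrix[to_index(gate(*bits))][to_index(bits)] = 1
    return matrix

def is_permutation_matrix(matrix):
    """Exactly one 1 in every row and every column."""
    return (all(sum(row) == 1 for row in matrix)
            and all(sum(col) == 1 for col in zip(*matrix)))

FEYNMAN = gate_matrix(feynman, 2)
for row in FEYNMAN:
    print(row)
print(is_permutation_matrix(FEYNMAN))
```

Because the input code in the equation above is the identity matrix, the output code is simply the gate matrix itself, which is what the sketch prints.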


Figure 5. I/O mapping and matrix representations of quantum gates: (a) (2, 2) Feynman gate; (b) (2, 2) Swap gate; and (c) (3, 3) Toffoli gate

One notes from this example that the Feynman gate, and similarly all quantum gates shown in Figure 5, are merely permuters, i.e. they produce output vectors which are permutations of the input vectors. Figure 6 shows the quantum evolution matrices for blocks B1 (also B2) and B3 in Figure 4, respectively.

5. Quantum computing

Although the gates in Figure 5 are merely permuters, not all quantum gates perform simple permutations (Nielsen and Chuang, 2000). The mapping of a set of inputs into any set of outputs in Figure 4 can be obtained in general using quantum computing. The following discussion explains the general principles of quantum computing, and we follow the standard notation that is used in quantum mechanics from Dirac (1930).

Definition 1. A binary quantum bit, or qubit, is a binary quantum system, defined over the Hilbert space $H_2$ with a fixed basis $\{|0\rangle, |1\rangle\}$.

Definition 2. In a binary quantum logic system, qubit-0 and qubit-1 are defined as follows:

\[
\text{qubit-0} = |0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad
\text{qubit-1} = |1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]

Figure 6. Quantum transformations for the reversible (7,7) circuit in Figure 4: (a) input mapping block B1 (also B2); and (b) output mapping block B3


Figure 7 shows the process of evolving the input binary qubits using the corresponding quantum circuits. Let us evolve the input binary qubit

\[
|11\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix},
\]

where the tensor product $\otimes$ gives the corresponding binary natural code, using the serially interconnected quantum circuit in Figure 7(a), which is composed of a serial interconnection of two Feynman gates (Figure 2(a)) connected by a swap gate (Figure 2(c)). The evolution of the input qubit can be viewed from two equivalent perspectives. One perspective is to evolve the input qubit step-by-step using the serially interconnected gates. The second perspective is to evolve the input qubit using the total quantum circuit at once, since the total evolution transformation $[M_{net}]$ is equal to the product of the individual evolution matrices $[M_q]$ that correspond to the individual quantum primitives:

\[
\therefore\ [M_{net}]_{serial} = \prod_q [M_q].
\]

Figure 7. Quantum logic circuits


Perspective #1: 2 1 0 6 60 1 6 6 60 0 4 0 0

2 32 3 2 3 1 0 0 6 76 7 6 7 60 6 7 6 7 0 07 6 7607 607 76 7 ¼ 6 7 ) 6 60 0 17607 617 4 54 5 4 5 0 0 1 0 1 2

1 0

6 60 1 6 )6 60 0 4 0 0 Perspective #2: 02 1 0 B6 B6 0 1 B6 B6 B6 0 0 @4 0 0

0

0 0

0 0 0 1

0

32

1 0

76 6 07 760 0 76 1760 1 54 0 0 0

0 1 0 0

0

0 0 0 1

32

1

76 6 07 760 76 0760 54 0 1

0 1 0

32 3 2 3 0 0 76 7 6 7 6 7 6 7 1 07 7607 617 76 7 ¼ 6 7 0 07617 607 54 5 4 5 0 0 0 1 0 0

32 3 2 3 0 0 76 7 6 7 6 7 6 7 07 7617 617 76 7 ¼ 6 7 17607 607 54 5 4 5 0 0 0 0

0

0

1

0

0

0

0

1

31 2 3 2 3 0 0 7C 6 7 6 7 C6 7 6 7 07 7C 6 0 7 6 1 7 7C 6 7 ¼ 6 7 1 7C 6 0 7 6 0 7 5A 4 5 4 5 0 0 1 0
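Both perspectives for the serial circuit of Figure 7(a) can be replayed numerically. The following is a small sketch using plain nested lists; the matrices are the Feynman and swap matrices used in the text, while the helper names (`matmul`, `evolve`, `ket`) are hypothetical.

```python
FEYNMAN = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def evolve(M, v):
    """Apply matrix M to column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def ket(bits):
    """One-hot column vector for a bit string such as '11' (natural binary code)."""
    v = [0] * (2 ** len(bits))
    v[int(bits, 2)] = 1
    return v

# Perspective 1: evolve gate by gate (Feynman, then swap, then Feynman).
state = ket("11")
for gate in (FEYNMAN, SWAP, FEYNMAN):
    state = evolve(gate, state)

# Perspective 2: multiply the gate matrices first, then evolve once.
M_net = matmul(FEYNMAN, matmul(SWAP, FEYNMAN))
at_once = evolve(M_net, ket("11"))

print(state, at_once, ket("01"))
```

Both routes end in the one-hot vector for $|01\rangle$.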

Thus, the quantum circuit shown in Figure 7(a) evolves the qubit $|11\rangle$ into the qubit $|01\rangle$. The quantum circuit in Figure 7(b) is composed of a serial interconnection of two parallel circuits as follows: dashed boxes ((1), (2)) and ((3), (4)) are parallel-interconnected, and dotted boxes (5) and (6) are serially interconnected. The total evolution transformation $[M_{net}]$ of the total parallel-interconnected quantum circuit is equal to the tensor (Kronecker) product of the individual evolution matrices $[M_q]$ that correspond to the individual quantum primitives:

\[
\therefore\ [M_{net}]_{parallel} = \bigotimes_q [M_q].
\]

Thus, analogous to the operations of the circuit in Figure 7(a), the evolution of the input qubit in Figure 7(b) can be viewed from two equivalent perspectives. One perspective is to evolve the input qubit stage-by-stage. The second perspective is to evolve the input qubit using the total quantum circuit at once. Let us evolve the input binary qubit $|111\rangle$ using the quantum circuit in Figure 7(b). The evolution matrices of the parallel-interconnected dashed boxes in (5) and (6) are as follows (where the symbol $\|$ denotes parallel connection):

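\[
\text{input} = |1\rangle \otimes |1\rangle \otimes |1\rangle
= \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix} \otimes \begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}^{T}
\]

The evolution matrix for $(5) = (1)\,\|\,(2)$ is:

\[
\text{Feynman} \otimes \text{wire}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0
\end{bmatrix}
\]

The evolution matrix for $(6) = (3)\,\|\,(4)$ is:

\[
\text{Wire} \otimes \text{swap}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\otimes \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]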


Perspective #1: input ) ð5Þ ) output1 ; input2 ð¼ output1 Þ ) ð6Þ ) output2  3 2 2 3 2 3 1 0 0 0 1 7 0 6   7607 607 6 1 0 7607 607 6 76 7 6 7 6 0 1 7607 607 6   6 7 6 7 6 1 0 7 7 6 0 7 ¼ 6 0 7; 6 7 6 7 7 6 0 1 76 607 617 6   5 405 4 7 6 1 0 5 0 4 0 1 0 1 22

1 660 640 6 6 0 6 6 6 6 6 4

0 0 1 0

0 1 0 0

3 0 07 05 1

2

1 60 40 0

0 0 1 0

32 3 2 3 0 0 7607 607 76 7 6 7 7607 607 7 07 607 376 6 7 ¼ 6 7 ¼ j110l 0 7 07 607 76 6 7 0776 17 7 607 6 7 5 415 5 4 0 5 0 1 0 0

0 1 0 0

Perspective #2: input ) ðð6Þð5ÞÞ ) output2 02 2

1 B6 6 0 B6 6 B6 6 0 B6 4 B6 B6 0 B6 B6 B6 B6 B6 B6 B6 @4

0 0 1 0

0 1 0 0

3 0 07 7 07 5 1

32

2

1 60 6 60 4 0

2 3 0 607 6 7 607 6 7 607 6 7 ¼ 6 7 ¼ j110l 607 6 7 607 6 7 415 0

0 0 1 0

0 1 0 0

1 0 76 0 1 76 76 76 76 76 6 37 6 0 7 76 76 07 6 77 76 7 0576 54 1



31

2 3 0 7C 7C 6 0 7   7C 6 7 1 0 7C 6 0 7 7C 6 7 7C 6 0 7 0 1 C6 7  7 C6 7 1 0 7 7C 6 0 7 C6 7 0 1 7 7C 6 0 7 7C 6 7   7C 4 0 5 1 0 5A 0 1 1
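The stage matrices for Figure 7(b) follow mechanically from the Kronecker product, so the calculation can be reproduced with a short sketch. The helper names (`kron`, `matmul`, `evolve`, `ket`) are hypothetical, and the wire is modelled as the 2 x 2 identity.

```python
FEYNMAN = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
SWAP = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
WIRE = [[1, 0], [0, 1]]

def kron(A, B):
    """Kronecker (tensor) product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def evolve(M, v):
    """Apply matrix M to column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def ket(bits):
    """One-hot column vector for a bit string such as '111'."""
    v = [0] * (2 ** len(bits))
    v[int(bits, 2)] = 1
    return v

stage5 = kron(FEYNMAN, WIRE)   # (5) = (1) || (2)
stage6 = kron(WIRE, SWAP)      # (6) = (3) || (4)

# Perspective 1: stage by stage.
output1 = evolve(stage5, ket("111"))
output2 = evolve(stage6, output1)

# Perspective 2: the whole circuit at once, (6)(5).
at_once = evolve(matmul(stage6, stage5), ket("111"))

print(output1 == ket("101"), output2 == ket("110"), at_once == ket("110"))
```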

Thus, the quantum circuit shown in Figure 7(b) evolves the qubit $|111\rangle$ into the qubit $|110\rangle$. By applying this formalism to the quantum matrices from Figure 6, the reversible MRA circuit of Figure 4 is represented compactly by the following transformations:

\[
[B_1]\,|a\,x_2\,x_1\rangle = |R_1\,P_1\,f_1\rangle \tag{1}
\]

\[
[B_2]\,|x_1\,x_3\,a\rangle = |f_2\,P_2\,R_2\rangle \tag{2}
\]

\[
[B_3]\,|f_1\,0\,f_2\rangle = |G_1\,F\,G_2\rangle \tag{3}
\]

where in equation (3), the qubit $|0\rangle$ is used to generate the AND operation in block B3 (Toffoli gate from Figure 2(b)) in Figure 4, and $|\alpha\beta\gamma\rangle = |\alpha\rangle \otimes |\beta\rangle \otimes |\gamma\rangle$, where $\alpha$, $\beta$, and $\gamma$ are single binary qubits.

6. Conclusions and future work

Reversible realization of MRA decomposition and its quantum computation are presented. A comprehensive treatment of reversible MRA and its quantum computing with supplementary materials is provided by Al-Rabadi (2002). Future work will involve the investigation of other possible reversible realizations of binary and multiple-valued MRA decompositions of logic circuits and their corresponding quantum computations. The use of the natural parallelism of quantum entanglement for the realization of MRA-based circuits will also be investigated.

References

Al-Rabadi, A.N. (2001), “A novel reconstructability analysis for the decomposition of Boolean functions”, Technical Report #2001/005, 1 July 2001, Electrical and Computer Engineering Department, Portland State University, Portland, Oregon.

Al-Rabadi, A.N. (2002), “Novel methods for reversible logic synthesis and their application to quantum computing”, PhD dissertation, Portland State University, Portland, Oregon.

Al-Rabadi, A.N., Zwick, M. and Perkowski, M. (2004), “A comparison of enhanced reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions”, Kybernetes, Vol. 33 Nos. 5-6, pp. 933-947.

Bennett, C. (1973), “Logical reversibility of computation”, IBM Journal of Research and Development, Vol. 17, pp. 525-32.

Dirac, P. (1930), The Principles of Quantum Mechanics, 1st ed., Oxford University Press, Oxford.

Fredkin, E. and Toffoli, T. (1982), “Conservative logic”, International Journal of Theoretical Physics, Vol. 21, pp. 219-53.

Hurst, S.L. (1978), Logical Processing of Digital Signals, Crane Russak and Edward Arnold, London and Basel.

Kerntopf, P. (2000), “A comparison of logical efficiency of reversible and conventional gates”, Proc. on 3rd Symposium on Logic, Design and Learning, Portland, Oregon.

Klir, G. (1985), Architecture of Systems Problem Solving, Plenum Press, New York, NY.

Klir, G. and Wierman, M.J. (1998), Uncertainty-Based Information: Variables of Generalized Information Theory, Physica-Verlag, New York, NY.

Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Sage Publications, New York, NY.

Landauer, R. (1961), “Irreversibility and heat generation in the computational process”, IBM Journal of Research and Development, Vol. 5, pp. 183-91.

Nielsen, M. and Chuang, I. (2000), Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, MA.

Zwick, M. (1995), “Control uniqueness in reconstructability analysis”, International Journal of General Systems, Vol. 23 No. 2.

Zwick, M. (2001), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY.

The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister

The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

A comparison of modified reconstructability analysis and Ashenhurst-Curtis decomposition of Boolean functions


Anas N. Al-Rabadi and Marek Perkowski ECE Department, Portland State University, Portland, Oregon, USA

Martin Zwick Systems Science Department, Portland State University, Portland, Oregon, USA

Keywords Cybernetics, Boolean functions, Complexity theory

Abstract Modified reconstructability analysis (MRA), a novel decomposition technique within the framework of set-theoretic (crisp possibilistic) reconstructability analysis, is applied to three-variable NPN-classified Boolean functions. MRA is superior to conventional reconstructability analysis, i.e. it decomposes more NPN functions. MRA is compared to Ashenhurst-Curtis (AC) decomposition using two different complexity measures: log-functionality, a measure suitable for machine learning, and the count of the total number of two-input gates, a measure suitable for circuit design. MRA is superior to AC using the first of these measures, and is comparable to, but different from AC, using the second.

1. Introduction

One general methodology for understanding a complex system is to decompose it into less complex sub-systems. Decomposition is used in many situations; for example, in logic synthesis (Ashenhurst, 1953, 1956, 1959; Curtis, 1962; 1963a, b; Files, 2000; Grygiel, 2000; Jozwiak, 1995; Muroga, 1979), where the number of inputs to the gates is high and cannot be mapped to a standard library, and in machine learning, where data are noisy or incomplete (Files, 2000; Grygiel, 2000). The primary criteria for evaluating the quality of the decomposition process are the amount of information (or loss of information, i.e. error) existing in the decomposed system and the complexity of this decomposed system. The objective is to decompose the complex system (data) into the least-complex, most-informative (least-error) model. Simplicity is desired since, according to the principle of Occam's razor, the simpler the model is, the more powerful it is for generalization. Least error is desired since one wants to retain as much information as possible in the decomposed system, when compared to the original data. The decomposition processes can be generally dichotomized into lossless (no error) versus lossy decomposition. In this paper, a comparison of

Kybernetes Vol. 33 No. 5/6, 2004 pp. 933-947 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410533985


three types of lossless decomposition is presented: the disjoint Ashenhurst-Curtis (AC) decomposition and set-theoretic conventional and modified reconstructability analysis (CRA and MRA, respectively). The remainder of this paper is organized as follows. Section 2 presents background and related work on this subject. CRA, MRA, and AC complexity results are presented in Section 3. Conclusions and future work are discussed in Section 4.

2. Logic functions classification, complexity measures, and decompositions

This section introduces the basic background of the NPN-classification of three-variable, two-valued logic functions, the AC and reconstructability analysis (RA) decomposition methods that are used in this work, and the complexity measures that are utilized to compare the efficiency of such decompositions.

2.1 NPN-classification of logic functions

There exist many classification methods to cluster logic functions into families of functions (Muroga, 1979). Two important operations that produce equivalence classes of logic functions are negation and permutation (Muroga, 1979). Accordingly, the following classification types result.

(1) P-equivalence class. A family of identical functions obtained by the operation of permutation of variables.

(2) NP-equivalence class. A family of identical functions obtained by the operations of negation or permutation of one or more variables.

(3) NPN-equivalence class. A family of identical functions obtained by the operations of negation or permutation of one or more variables, and also negation of the function.

The NPN-equivalence classification will be used in this work. Table I lists representative three-variable Boolean functions for the non-degenerate classes (i.e. the classes depending on all three variables).

Table I. NPN-equivalence classes for non-degenerate Boolean functions of three binary variables (Muroga, 1979). These classes contain 218 out of the possible 256 functions

Class   Representative function                            Number of functions
1       $F = x_1 x_2 + x_2 x_3 + x_1 x_3$                  8
2       $F = x_1 \oplus x_2 \oplus x_3$                    2
3       $F = x_1 + x_2 + x_3$                              16
4       $F = x_1 (x_2 + x_3)$                              48
5       $F = x_1 x_2 x_3 + x_1' x_2' x_3'$                 8
6       $F = x_1' x_2 x_3 + x_1 x_2' + x_1 x_3'$           24
7       $F = x_1 (x_2 x_3 + x_2' x_3')$                    24
8       $F = x_1 x_2 + x_2 x_3 + x_1' x_3$                 24
9       $F = x_1' x_2 x_3 + x_1 x_2' x_3 + x_1 x_2 x_3'$   16
10      $F = x_1 x_2' x_3' + x_2 x_3$                      48
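The class sizes in Table I can be recovered by exhaustive enumeration. The sketch below is one possible way to do this, not the procedure used by Muroga (1979): each of the 256 truth tables is mapped to the smallest table reachable by permuting inputs, negating inputs and negating the output, and only functions that depend on all three variables are kept. All names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product

N = 3
INPUTS = list(product((0, 1), repeat=N))        # (x1, x2, x3) in natural binary order
INDEX = {x: i for i, x in enumerate(INPUTS)}

def transformed(f, perm, neg, out_neg):
    """Truth table of f after permuting and negating inputs and optionally negating the output."""
    return tuple(f[INDEX[tuple(x[perm[i]] ^ neg[i] for i in range(N))]] ^ out_neg
                 for x in INPUTS)

def canonical(f):
    """Smallest truth table in the NPN orbit of f, used as the class label."""
    return min(transformed(f, p, n, o)
               for p in permutations(range(N))
               for n in product((0, 1), repeat=N)
               for o in (0, 1))

def depends_on_all(f):
    """True if flipping each variable changes the output for at least one input."""
    for i in range(N):
        if not any(f[INDEX[x]] != f[INDEX[x[:i] + (1 - x[i],) + x[i + 1:]]] for x in INPUTS):
            return False
    return True

classes = Counter(canonical(f)
                  for f in product((0, 1), repeat=2 ** N)
                  if depends_on_all(f))

print(len(classes), sum(classes.values()), sorted(classes.values()))
```

Under these assumptions the enumeration yields 10 non-degenerate classes covering 218 functions, with the same multiset of class sizes as the last column of Table I.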

2.2 Complexity measures

Decomposability means complexity reduction. Many complexity measures exist for the purpose of evaluating the efficiency of the decomposition of complex systems into simpler sub-systems. Such complexity measures include the cardinality complexity measure (DFC) (Abu-Mostafa, 1988), the log-functionality (LF) complexity measure (Grygiel, 2000), and the sigma complexity measure (Zwick, 1995). In the first two measures, complexity is a count of the total number of possible functions realizable by all of the sub-blocks; the third just indicates the level of decomposition in the lattice of possible structures. The complexity of the decomposed structure is always less than or equal to the complexity of the original look-up-table (LUT) that represents the mapping of the non-decomposed structure. That is, if a “decomposed” structure has higher complexity than the original structure, then the original structure is said to be non-decomposable. Although the DFC measure is easier and more familiar, LF is a better measure because it deals with non-disjoint systems (Grygiel, 2000). Also, DFC does not correct for function repetition (redundancy). Consequently, the LF measure will be used in this paper. The DFC and LF complexity measures are shown using Figure 1, which exemplifies AC decomposition. In Figure 1, for the first block, the total number of possible functions for three two-valued input variables is $2^{2^3} = 256$. Also, for the second block, the total number of possible functions is similarly 256. The total possible number of functions for the whole structure is equal to $256 \times 256 = 65{,}536$. The DFC measure is defined as:

\[
DFC = O \cdot 2^{I} \tag{1}
\]

\[
C_{DFC} = \sum_{n} DFC_n \tag{2}
\]

where O is the number of outputs to a block, I is the number of inputs to the same block, equation (1) is the complexity for every block, and equation (2) is the complexity for the total decomposed structure. For instance, the DFC for Figure 1 is $C_{DFC} = 1 \times 2^3 + 1 \times 2^3 = \log_2(65{,}536) = 16$, which is the same as the cardinality of the LUT. It was shown in the work of Grygiel (2000) that, for Figure 1, the LF complexity measure ($C_{LF}$) for Boolean functions can be expressed as follows:

Figure 1. Generic non-disjoint decomposition


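\[
C_{LF} = \log_2 (C_F) \tag{3}
\]

where

\[
C_F = \left( C'_F \right)^{p_{X_3}}
\]

\[
C'_F = \sum_{i=0}^{p_{Y_1}-1} P\!\left( p_{Y_2}^{\,p_{X_2}/p_{X_3}},\; p_{Y_1} - i \right) \, S\!\left( \frac{p_{X_1}}{p_{X_3}},\; p_{Y_1} - i \right)
\]

\[
P(n, k) = \frac{n!}{(n-k)!}, \qquad
S(n, k) = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{i} \binom{k}{i} (k-i)^{n}, \qquad
\binom{k}{i} = \frac{k!}{i!\,(k-i)!}
\]

\[
X_1 = \{x_1, x_2, x_3\}, \qquad X_2 = \{x_1, x_4\}, \qquad X_3 = X_1 \cap X_2 = \{x_1\}
\]

\[
p_{X_1} = \prod_{x_i \in X_1} |x_i|, \quad
p_{X_2} = \prod_{x_i \in X_2} |x_i|, \quad
p_{X_3} = \prod_{x_i \in X_3} |x_i|, \quad
p_{Y_1} = \prod_{y_i \in Y_1} |y_i|, \quad
p_{Y_2} = \prod_{y_i \in Y_2} |y_i|
\]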

where $X_1$ is the set of input variables to the first block, $X_2$ is the set of input variables to the second block, $X_3$ is the set of overlapping variables between sets $X_1$ and $X_2$, $p_{X_i}$ is the product of cardinalities of the input variables in set $X_i$, and $p_{Y_i}$ is the product of cardinalities of output variables in set $Y_i$. For example, the LF for Figure 1 is:

\[
X_1 = \{x_1, x_2, x_3\}, \qquad X_2 = \{x_1, x_4\}, \qquad X_3 = X_1 \cap X_2 = \{x_1\}
\]

\[
\therefore\ p_{X_1} = 2 \times 2 \times 2 = 8, \quad p_{X_2} = 2 \times 2 = 4, \quad p_{X_3} = 2, \quad p_{Y_1} = 2, \quad p_{Y_2} = 2,
\]

\[
C'_F = \sum_{i=0}^{1} P(2^2,\, 2 - i)\, S(4,\, 2 - i) = 88
\]

\[
\therefore\ C_F = 7{,}744 \;\Rightarrow\; C_{LF} = \log_2 (7{,}744) = 12.92.
\]

Note that using the DFC measure (16) we would not consider Figure 1 to achieve any complexity reduction (i.e. successful decomposition), but using the LF (12.92), Figure 1 does achieve complexity reduction.
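The two measures can be compared directly for Figure 1 with a short calculation. The sketch below transcribes equations (1) to (3) as reconstructed from the worked example; the function names (`dfc`, `falling`, `stirling2`, `lf`) are hypothetical, and integer division is used on the assumption that $p_{X_3}$ divides $p_{X_1}$ and $p_{X_2}$.

```python
from math import comb, factorial, log2

def dfc(blocks):
    """C_DFC: sum of O * 2**I over (outputs, inputs) pairs, one pair per block."""
    return sum(o * 2 ** i for o, i in blocks)

def falling(n, k):
    """P(n, k) = n! / (n - k)!."""
    return factorial(n) // factorial(n - k)

def stirling2(n, k):
    """S(n, k) via the alternating sum given in the text."""
    return sum((-1) ** i * comb(k, i) * (k - i) ** n for i in range(k + 1)) // factorial(k)

def lf(pX1, pX2, pX3, pY1, pY2):
    """Return (C'_F, C_F, C_LF) for a two-block structure."""
    c_prime = sum(falling(pY2 ** (pX2 // pX3), pY1 - i) * stirling2(pX1 // pX3, pY1 - i)
                  for i in range(pY1))
    c_f = c_prime ** pX3
    return c_prime, c_f, log2(c_f)

# Figure 1: two blocks, each with one output and three two-valued inputs.
c_dfc = dfc([(1, 3), (1, 3)])
c_prime, c_f, c_lf = lf(pX1=8, pX2=4, pX3=2, pY1=2, pY2=2)
print(c_dfc, c_prime, c_f, round(c_lf, 2))
```

For these parameter values the sketch gives a DFC of 16 and an LF of about 12.92, the same figures quoted above.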

Figure 1 shows a four-input function, where the variable sets for the first and second blocks are not disjoint. In this paper, we are concerned with three-input functions, and in this case an AC decomposition, which is successful using the LF measure, results in a structure shown in Figure 2. Note that the variable sets for the two blocks with outputs g and F are necessarily disjoint, because if the two blocks shared one input variable, F would have three inputs and the decomposed structure would be more complex than the original non-decomposed three-input function.

Example 1. The LF complexity measure of the structure in Figure 2 is obtained as follows. Each sub-block in Figure 2 has a total of $2^{2^2} = 16$ possible Boolean functions. Figure 3 shows all of the possible 16 two-variable Boolean functions per sub-block in Figure 2. By allowing g and F in Figure 2 to take on all possible maps from Figure 3, one obtains the following count of total non-repeated (irredundant) three-variable functions: $C_F = 88 \Rightarrow C_{LF} = 6.5$. This answer agrees with the result of equation (3) (Grygiel, 2000).

Example 2. For three-variable functions, RA produces four different types of decomposition structures, two of which are shown in Figure 4. (See also Table III under “Simplest Modified RA Circuit”.)


Figure 2. A decomposed structure

Figure 3. Maps of all 16 possible Boolean functions of two variables (the single quote means negation)
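Returning to Example 1, the count of irredundant functions can be reproduced by letting both blocks range over the 16 maps of Figure 3. The wiring used here, F(x1, g(x2, x3)), is an assumption about Figure 2 made for illustration; by symmetry the count does not depend on which variable is the free input.

```python
from itertools import product
from math import log2

PAIRS = list(product((0, 1), repeat=2))
# All 16 two-variable Boolean functions, each stored as a map from an input pair to 0 or 1.
TWO_VAR = [dict(zip(PAIRS, outs)) for outs in product((0, 1), repeat=4)]

def compose(F, g):
    """Truth table of F(x1, g(x2, x3)) over (x1, x2, x3) in natural binary order."""
    return tuple(F[(x1, g[(x2, x3)])] for x1, x2, x3 in product((0, 1), repeat=3))

# 16 * 16 = 256 (g, F) pairs, many of which realize the same three-variable function.
distinct = {compose(F, g) for F in TWO_VAR for g in TWO_VAR}

print(len(distinct), round(log2(len(distinct)), 1))
```

Of the 256 block pairings only 88 give different three-variable functions, which is the redundancy that LF corrects for and DFC does not.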


The LF complexity measure for the structures shown in Figure 4 is obtained as follows. Figure 5 shows a tree that generates all possible functions for the structures in Figure 4(a) and (b), respectively (superscripts of functions denote the specific edge between two nodes in the tree). Utilizing this methodology of removing redundant functions, one obtains the following results for LF: for Figure 4(a), the total number of irredundant sub-functions at level 2 is $C_F = 100$, $\therefore C_{LF} = \log_2(100) = 6.6$; and for Figure 4(b), the total number of irredundant sub-functions at level 3 is $C_F = 152$, $\therefore C_{LF} = \log_2(152) = 7.2$.

2.3 AC decomposition

AC decomposition (Ashenhurst, 1953, 1956, 1959; Curtis, 1962, 1963a, b; Files, 2000; Grygiel, 2000) is one of the major techniques for the decomposition of functions commonly used in the field of logic synthesis. The main idea of AC decomposition is to decompose logic functions into simpler logic blocks using the compression of the number of cofactors in the corresponding representation. This compression is achieved through exploiting the logical compatibility (i.e. redundancy) of cofactors (i.e. column multiplicity). As a result of AC decomposition, intermediate constructs (latent variables) are created. A general algorithm of the AC decomposition utilizing, for instance, the Karnaugh map (K-map) representation (Muroga, 1979) is as follows.

(1) Partition the input set of variables into free and bound sets, and label all the different columns.

(2) Decompose the bound set and create a new K-map for the decomposed bound set (utilizing minimum graph coloring, maximum clique, or some other algorithm to combine similar columns into a single column). Each cell in the new K-map represents a labeled column in the K-map.

(3) Encode the labels in the cells of the new K-map using the minimum number of intermediate binary variables. These intermediate variables are shown as g and h in Example 3 (Figure 6). Express the intermediate variables as functions of the bound set variables.

Figure 4. Some RA decomposed structures

Figure 5. All possible combinations of sub-functions $f_1^{(i)}$, $f_2^{(j)}$, and $f_3^{(k)}$ in Figure 4(a) and (b), respectively

Figure 6. AC decomposition

(4) Produce the decomposed structure, i.e. a K-map specifying the function (F) in terms of the intermediate variables and the free set variables.

In general, steps (1) and (3) determine the optimality of the AC decomposition (i.e. whether the resulting decomposed blocks are of minimal complexity or not).

Example 3. For the logic function $F = x_2 x_3 + x_1 x_3 + x_1 x_2$, let the subset of variables $\{x_2, x_3\}$ be the bound set, and the subset of variables $\{x_1\}$ be the free set. Figure 6 shows the disjoint AC decomposition of F (where $\{-\}$ means do not care). In Example 3, the first block of the decomposed structure has two outputs (intermediate variables g and h). The DFC measure of the decomposed structure is $2 \times 2^2 + 1 \times 2^3 = 16$, while the DFC of the original LUT is $1 \times 2^3 = 8$. (This again shows the inadequacy of DFC as a measure of complexity because the decomposition produces a more complex structure than the non-decomposed LUT.) LF for the decomposed structure in Figure 6 is 8, which does not exceed the complexity of the LUT. However, since the decomposition does not reduce the complexity, for the purposes of this paper, the decomposition is not successful and is thus rejected. This will be true whenever the first block of the decomposed function has two outputs. For other NPN functions, AC decomposition produces only one output in the first block. These decompositions are not rejected, and are shown in Figure 9.

2.4 RA: conventional RA versus modified RA

RA is a decomposition technique for qualitative data (Conant, 1981; Klir, 1985, 1996; Krippendorff, 1986). A review with additional references is provided by Zwick (2001). RA data are typically either a set-theoretic relation or mapping, or a probability or frequency distribution. The former case is the domain of “set-theoretic” (or, more precisely, crisp possibilistic) RA. The latter is the domain of “information-theoretic” (or, more precisely, probabilistic) RA.
The RA framework can apply to other types of data (e.g. fuzzy data) via generalized information theory (Klir and Wierman, 1998). RA decomposition can also be lossless or lossy. In this paper, we are concerned only with lossless decomposition, i.e. with decomposition which

produces no error. This paper introduces an innovation in set-theoretic RA, which we call “modified” RA (or MRA) (Al-Rabadi, 2001), as opposed to the conventional set-theoretic RA (or CRA). While CRA decomposes for all values of Boolean functions, MRA decomposes for an arbitrarily chosen value of the Boolean functions (e.g. for value “1”). The completely specified Boolean function can be retrieved if one knows the MRA decomposition for the Boolean function being equal either to “1” or to “0”. MRA and CRA are illustrated and compared in Example 4.

Example 4. For the logic function $F = x_1 x_2 + x_1 x_3$, Figure 7 shows the simplest model using both CRA and MRA decompositions. CRA decomposition (Conant, 1981; Zwick, 1995; Zwick and Shu, 1995) is illustrated in the upper half of the figure, while MRA decomposition (Al-Rabadi, 2001) is illustrated in the lower half of the figure. MRA decomposition yields a much simpler logic circuit than the corresponding CRA decomposition, while retaining complete information about the decomposed function. For CRA, as shown in the top middle part of the figure, the calculated function for the model $x_1 x_2 f_1 : x_1 x_3 f_2 : x_2 x_3 f_3$ (i.e. $\alpha : \beta : \gamma$) is defined as follows:

\[
x_1 x_2 x_3 F_{x_1 x_2 f_1 : x_1 x_3 f_2 : x_2 x_3 f_3} \equiv (x_1 x_2 f_1 \otimes x_3) \cap (x_1 x_3 f_2 \otimes x_2) \cap (x_2 x_3 f_3 \otimes x_1).
\]

(For lossless CRA decomposition, this equals the original function $x_1 x_2 x_3 F$ that is shown at the top left of the figure; for lossy CRA, $x_1 x_2 x_3 F_{x_1 x_2 f_1 : x_1 x_3 f_2 : x_2 x_3 f_3}$ would not be equivalent to $x_1 x_2 x_3 F$.) The CRA model can be interpreted by the circuit shown at the top right of the figure.


Figure 7. Conventional versus modified RA decompositions for the Boolean function: $F = x_1 x_2 + x_1 x_3$


Table II. Steps in procedure to obtain MRA in Figure 7

MRA simplifies the decomposition problem by focusing, in the original function F, on the tuples for which $F = 1$. (One could alternatively have selected the tuples for which $F = 0$.) The procedure used to obtain the MRA in Figure 7 is as follows (Al-Rabadi, 2001).

(1) Select the relation defined by $(x_1, x_2, x_3)$ tuples with value “1” (shaded in top left of Figure 7).

(2) Obtain the simplest lossless CRA decomposition.

(3) Assign value “1” to tuples in the resulting projections. Add all tuples that are missing in the projections and give them function value “0”.

(4) Perform the intersection in the output block to obtain the total functionality (Table II).
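The steps above can be sketched for the function of Example 4. The model tested here, $x_1 : x_2 x_3$, is an assumption chosen for illustration, since Figure 7 is not reproduced in the text; the helper names (`project`, `join`) are hypothetical. The sketch selects the tuples with $F = 1$, projects them, and checks whether intersecting the extended projections returns exactly the original relation.

```python
from itertools import product

VARS = ("x1", "x2", "x3")

def F(x1, x2, x3):
    """Function of Example 4: F = x1 x2 + x1 x3."""
    return (x1 & x2) | (x1 & x3)

# Step (1): the relation formed by the tuples with value 1.
ones = {x for x in product((0, 1), repeat=3) if F(*x)}

def project(relation, subset):
    """Projection of a relation onto a subset of the variables."""
    idx = [VARS.index(v) for v in subset]
    return {tuple(t[i] for i in idx) for t in relation}

def join(model, relation):
    """Intersection of the cylindrical extensions of the projections named in the model."""
    projections = [(s, project(relation, s)) for s in model]
    return {x for x in product((0, 1), repeat=3)
            if all(tuple(x[VARS.index(v)] for v in s) in p for s, p in projections)}

# Steps (2) to (4): a candidate model is lossless if the join reproduces the relation.
candidate = [("x1",), ("x2", "x3")]            # assumed model x1 : x2x3
independent = [("x1",), ("x2",), ("x3",)]      # fully decomposed model x1 : x2 : x3

print(join(candidate, ones) == ones, join(independent, ones) == ones)
```

For this function the assumed model is lossless, while the fully independent model is not, because it also admits the tuple (1, 0, 0).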

Figure 8 shows the complexities of the decomposition of all NPN-classes of three-variable Boolean functions (Table I) using CRA decomposition and MRA decomposition, respectively. Figure 8 shows that in six NPN classes (classes 1, 2, 3, 6, 8, 9) MRA and CRA give equivalent complexity decompositions, but in the remaining four classes (classes 4, 5, 7, 10) MRA is superior in complexity reduction.


3. Complexity of MRA versus AC decomposition

Utilizing the methods described above, one obtains the results shown in Figure 9 for the decomposition of three-variable NPN-classified Boolean functions (Table I) using MRA and AC decomposition. Figure 9 shows that in three NPN classes (4, 7, 9) MRA and AC give equivalent complexity decompositions. In three other classes (2, 3, 6), which

Figure 8. CRA versus MRA for the decomposition of all NPN-classes of three-variable Boolean functions


Figure 9. AC decomposition versus MRA decomposition for the decomposition of all NPN-classes of three-variable Boolean functions

encompass 42 functions, AC is superior, but in four classes (1, 5, 8, 10), which encompass 88 functions, MRA is superior. We can summarize these results by comparing decomposability (D) versus non-decomposability (ND) for the various approaches. Figure 10 shows the number of classes and functions decomposable by one method, but not by another (upper right and lower left cells). One concludes that for NPN-classified three-variable Boolean functions, MRA decomposition is superior to AC decomposition (88 versus 42), AC decomposition is superior to CRA decomposition (66 versus 32), and MRA decomposition is superior to CRA decomposition (80 versus 0). While the LF complexity measure that is used in Figure 9 is a good cost measure for machine learning, it is not a good measure for circuit design. An alternative cost measure for circuit design is the count of the total number of two-input gates (from Figure 3) in the final circuit (C#). Utilizing the resulting decompositions from Figure 9, Table III presents a comparison between MRA and AC for three-variable NPN classes of Boolean functions using the C# complexity measure. Table III shows that, using the C# cost measure, in five NPN classes (1, 2, 3, 6, 9), which encompass 66 logic functions, AC is superior to MRA both with and without the cost of the inverters. For two NPN classes (4, 8), which encompass 72 logic functions, AC is equivalent to MRA both with and without the cost of the inverters. For two NPN classes

Figure 10. Comparison of decomposability (D) versus non-decomposability (ND) for AC versus MRA (a), CRA versus AC (b), and CRA versus MRA (c), respectively


Table III. Comparison of AC versus MRA using the C# cost measure. The number in parenthesis at the bottom of each table column is the sum of the number of functions in the shaded cells of that column


(5, 10), which encompass 56 logic functions, MRA is superior to AC both including and not including the cost of the inverters. For one NPN class (7), which encompasses 24 logic functions, MRA is superior to AC when including the cost of the inverters, but the same as AC when inverters are not included. Thus, counting inverters, MRA is superior to AC (80 versus 66), while not counting inverters AC is superior to MRA (66 versus 56). The results of Table III are technology independent, that is, every logic function in Figure 3 is given the same cost. However, from a technology-dependent point of view, the costs of the different logic functions of Figure 3 may not be the same, and the comparisons of Table III would have to be modified accordingly.

4. Conclusion

A novel RA-based decomposition, MRA, is introduced. MRA is compared to CRA and disjoint AC decomposition using the LF complexity measure, which is a suitable measure for machine learning. It is shown that in three out of seven NPN classes, three-variable NPN-classified Boolean functions are not decomposable using CRA but are decomposable using MRA. It is also shown that whenever a decomposition of three-variable NPN-classified Boolean functions exists in both MRA and CRA, MRA yields a decomposition of simpler or equal complexity. While both the disjoint AC decomposition and MRA decompose some but not all NPN classes, MRA decomposes more classes, and consequently more Boolean functions, than AC. For the purpose of circuit design, complexity can be defined by counting the total number of two-input gates. Using this measure, MRA is superior to AC when including the cost of the inverters, and AC is superior to MRA when not including the cost of the inverters. Extensions of this MRA approach to reversible logic and quantum computing are presented by Al-Rabadi and Zwick (2002a); extensions to many-valued logic are presented by Al-Rabadi and Zwick (2002a, b).
A comprehensive treatment of MRA with supplementary material is provided by Al-Rabadi (2002). Future work will include the investigation of the MRA decomposition of logic relations as opposed to functions, and of multi-valued and fuzzy functions. The use of gates other than the logical AND gate (e.g. OR, XOR, NAND) at the final stage of RA-based decompositions to reduce the complexities of the decomposed structures will also be investigated.

References

Abu-Mostafa, Y. (1988), Complexity in Information Theory, Springer-Verlag, New York, NY.
Al-Rabadi, A.N. (2001), “A novel reconstructability analysis for the decomposition of Boolean functions”, Technical Report #2001/005, Electrical and Computer Engineering Department, Portland State University, Portland, OR, 1 July 2001.
Al-Rabadi, A.N. (2002), “Novel methods for reversible logic synthesis and their application to quantum computing”, PhD dissertation, Portland State University, Portland, OR.
Al-Rabadi, A.N. and Zwick, M. (2002a), “Reversible modified reconstructability analysis of Boolean circuits and its quantum computation”, Book of Abstracts of the WOSC-IIGSS 2002, Pittsburgh, PA, p. 90.
Al-Rabadi, A.N. and Zwick, M. (2002b), “Modified reconstructability analysis for many-valued logic functions”, Book of Abstracts of the WOSC-IIGSS 2002, Pittsburgh, PA, p. 90.
Ashenhurst, R.L. (1953), “The decomposition of switching functions”, Bell Laboratories’ Report, Vol. 1, pp. II-1-II-37.
Ashenhurst, R.L. (1956), “The decomposition of switching functions”, Bell Laboratories’ Report, Vol. 16, pp. III-1-III-72.
Ashenhurst, R.L. (1959), “The decomposition of switching function”, International Symposium on the Theory of Switching Functions, pp. 74-116.
Conant, R. (1981), “Set-theoretic structural modeling”, Int. J. General Systems, Vol. 7, pp. 93-107.
Curtis, H.A. (1962), A New Approach to the Design of Switching Circuits, Van Nostrand, Princeton, NJ.
Curtis, H. (1963a), “Generalized tree circuit”, ACM, pp. 484-96.
Curtis, H. (1963b), “Generalized tree circuit – the basic building block of an extended decomposition theory”, ACM, Vol. 10, pp. 562-81.
Files, C.M. (2000), “A new functional decomposition method as applied to machine learning and VLSI layout”, PhD dissertation, Portland State University, Portland, OR.
Grygiel, S. (2000), “Decomposition of relations as a new approach to constructive induction in machine learning and data mining”, PhD dissertation, Portland State University, Portland, OR.
Jozwiak, L. (1995), “General decomposition and its use in digital circuit synthesis”, VLSI Design: An International Journal of Custom Chip Design Simulation.
Klir, G. (1985), Architecture of Systems Problem Solving, Plenum Press, New York, NY.
Klir, G. (1996), “Reconstructability analysis bibliography”, Int. J. General Systems, Vol. 24, pp. 225-9.
Klir, G. and Wierman, M.J. (1998), Uncertainty Based Information: Variables of Generalized Information Theory, Physica-Verlag, New York, NY.
Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Sage Publications, Thousand Oaks, CA.
Muroga, S. (1979), Logic Design and Switching Theory, Wiley, New York, NY.
Zwick, M. (1995), “Control uniqueness in reconstructability analysis”, Int. J. General Systems, Vol. 23 No. 2.
Zwick, M. (2001), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY.
Zwick, M. and Shu, H. (1995), “Set-theoretic reconstructability of elementary cellular automata”, Advances in System Science and Application, Special Issue, pp. 31-6.

Further reading

Shannon, C.E. and Weaver, W. (1949), A Mathematical Theory of Communication, University of Illinois Press, IL.


The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

Multi-level decomposition of probabilistic relations

Stanislaw Grygiel
Intel Corporation

Martin Zwick and Marek Perkowski
Portland State University, Portland, Oregon, USA

Keywords Cybernetics, Complexity theory, Probability calculations

Abstract Two methods of decomposition of probabilistic relations are presented in this paper. They consist of splitting relations (blocks) into pairs of smaller blocks related to each other by new variables generated in such a way so as to minimize a cost function which depends on the size and structure of the result. The decomposition is repeated iteratively until a stopping criterion is met. Topology and contents of the resulting structure develop dynamically in the decomposition process and reflect relationships hidden in the data.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 948-961 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410533994

1. Introduction

There exist two main approaches to the analysis of complex systems: probabilistic and non-probabilistic. The probabilistic approach assumes knowledge of the probability distribution over the variables of the system, and the decomposition consists of determining a set of simplest possible marginal probabilities. The non-probabilistic approach requires specification of the global relation over the variables of the system, and the decomposition consists of determining a set of simplest possible projected relations describing the system. Here a system is described by a contingency table. Each cell of the table contains the frequency observed for a particular combination of variable values. These frequencies can be normalized to the total number of observations and used to approximate the true probability distribution over the variables of the system. The system is then referred to as a probabilistic system. In many situations it may be impossible or unreasonable to collect frequency information which is statistically reliable, but it is relatively easy to collect meaningful information on the set-theoretic relation which exists between variables of the system. This corresponds to the situation where the cell frequency is either 0 or 1. This approach is also justified if cells of the contingency table contain only two distinct values of frequency (or values that are close to two distinct values) which may be assigned to two classes, 0 or 1. In such situations the system is referred to as non-probabilistic. Systems are also characterized as being either directed or neutral. In directed systems, variables are distinguished as being either independent variables (inputs) or dependent variables (outputs); in neutral systems no such distinction is made.

The decomposition of a complex system into an organized set of subsystems is motivated by the belief that a simpler decomposed structure will better describe unobserved data (Occam's razor principle) and will make it easier to understand relationships hidden in the data. Each subsystem can be viewed as defining a certain concept and the whole structure can be viewed as a higher level relation expressed in terms of these concepts (variables). In this paper, both probabilistic and non-probabilistic approaches will be considered and new methods for the decomposition of such systems will be presented. This paper is organized as follows. Section 2 presents the related work, Section 3 presents the decomposition algorithms, Section 4 presents the cost function used in this paper, Section 5 discusses results and Section 6 concludes the paper.

2. Related work

The decomposition of complex systems has been analyzed by many researchers in the past. In the terminology of systems science both decomposition and composition are known under the name of reconstructability analysis (RA) (Klir, 1985). The approach presented by Ashby (1965), Conant (1972), Klir (1976) and Krippendorff (1979) consists of generating a lattice of possible decomposition structures and evaluating them in terms of both complexity and accuracy using either a set-theoretic (non-probabilistic) or an information-theoretic (probabilistic) approach. Both approaches are based on uncertainty measures, the first on Hartley's (1928) entropy and the second on Shannon's entropy (Shannon and Weaver, 1975). A structure that results in the smallest complexity and yet describes the data with a high accuracy is selected to be the best solution. An overview of decomposition approaches developed within the framework of general systems methodology (RA) is presented by Zwick (2001) and an extended bibliography of RA as a whole is given by Klir (1996). RA of directed systems was further clarified by Zwick (1995a), and some additional details on set-theoretic RA are presented by Conant (1981) and Zwick (1995b). In standard (both set-theoretic and information-theoretic) approaches to RA, the number of system variables remains unchanged in the process of decomposition. By contrast, the methods presented in this paper, which are based on ideas used in the decomposition of binary functions, introduce new variables in the decomposition process to reduce complexity. These methods, while inherently non-probabilistic in nature (Grygiel, 2000), can also be applied as approximate techniques for probabilistic systems.

3. Decomposition

Decomposition of a relation consists, in general, of splitting a larger relational block into a number of smaller, possibly interrelated, blocks (Figure 1).


We will focus in this paper on decomposing one block into two smaller blocks in such a way as to reduce a certain cost measure. This process can be iteratively repeated until a termination criterion is satisfied. The cost measure will be discussed in more detail in Section 4. In Figure 1, X denotes a set of independent and Y a set of dependent variables of the relation. In the decomposed structure, if $X_1 \cap X_2 \neq \emptyset$ then the decomposition will be called non-disjoint; otherwise we will call it disjoint. In the most general case, both R1 and R2 have both dependent and independent variables. It is also possible that dependent variables of one block are independent in another block; for instance, it may be that $Y_1 \cap X_2 \neq \emptyset$. The presentation of the decomposition algorithms in this paper will be based on the tabular representation of relations (contingency tables). The software implementation of the algorithm, however, uses the lr-partition representation, which is more memory efficient for manipulation of large multiple-valued relations (Grygiel and Perkowski, 1998; Grygiel et al., 1997). Other notations used in this paper are as follows. Upper case characters will denote sets and lower case will denote variables. $|X|$ is the cardinality of the set X and $|x|$ is the cardinality of the variable x (the number of values the variable x can take). A relational/functional block with a set X of independent variables and a set Y of dependent variables will be denoted by (X, Y).

3.1 Relations

The following definition of relation will be used in this paper:

Definition 1 (relation). Let $S = \{S_i\}$ be a set of sets $S_i$. A subset R of the Cartesian product $S_1 \times S_2 \times \dots \times S_k$ will be called a k-ary relation. □

A relation defined in this way can always be represented by a two-dimensional contingency table: because the Cartesian product operation is associative and a Cartesian product is itself a set, we can reduce a k-ary relation to a binary relation $R \subseteq S_a \times S_b$, where $S_a$ and $S_b$ are sets of n-ary and m-ary tuples, respectively, and $n + m = k$. Cells of the contingency table representing a relation can contain either 0s and 1s or arbitrary numbers. The first case corresponds to non-probabilistic relations; 1s and 0s denote tuples which are and are not contained in a given

Figure 1. Decomposition

relation. The second case corresponds to probabilistic relations, and numbers represent probabilities or frequencies associated with the corresponding tuples.

3.2 Decomposition type I

This type of decomposition is always non-disjoint, i.e. the sets of independent variables of the decomposed blocks are non-disjoint. Let $X = \{x_i\}$, $i = 1, \dots, n$, be a set of variables, let $X_1$, $X_2$ be a partition of X, and let $Q_{x_i}$ be the set of values the variable $x_i$ can take. If R is a relation based on the set of variables X then $R \subseteq Q_{x_1} \times \dots \times Q_{x_n} = Q_{X_1} \times Q_{X_2}$, where $Q_{X_j} = \{q^k_{X_j}\}$, $q^k_{X_j}$ is a tuple (combination of values) for the set $X_j$, and $Q_{X_j}$ is the set of all tuples for the set $X_j$. A relation R defined in this way can be represented by a contingency table of $|Q_{X_1}|$ columns and $|Q_{X_2}|$ rows, each column corresponding to a different tuple $q^k_{X_1} \in Q_{X_1}$ and each row to a different tuple $q^k_{X_2} \in Q_{X_2}$. Each cell of the contingency table contains 1 if the corresponding combination of tuples $q^i_{X_1}, q^j_{X_2}$ belongs to the relation and 0 if it does not.

Definition 2 (column multiplicity). Column multiplicity m is the number of distinct columns in the contingency table. □

Definition 3 (row multiplicity). Row multiplicity m is the number of distinct rows in the contingency table. □

Column multiplicity m is greater than or equal to 1 (it is equal to 1 if all the columns are identical) and less than or equal to $|Q_{X_1}|$ (all the columns are different). Our goal is to decompose the original relation R into two sub-relations R1 and R2. Let us create a new variable a such that $|a| = m$ and label each of the m sets of identical columns with a different value $a_j$ of variable a. Let $R_1 \subseteq Q_{X_1} \times Q_{\{a\}}$ be a relation created by extending every tuple $q^i_{X_1} \in Q_{X_1}$ with the value $a_j$ of variable a assigned to the column $q^i_{X_1}$, so that $R_1 = \{q^i_{X_1} a_j\}$. To achieve our goal, R2 has to be created in such a way that the composition of R1 and R2 results in R.
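A minimal Python sketch of this construction is given here, including the column-merging step that Theorem 1 uses to obtain R2. The function names and the 4 x 4 table are hypothetical: the table is not the data of Figure 2, but it was chosen to have two groups of identical columns so that the relation has 10 tuples before decomposition and 9 after it.

```python
def decompose_type1(table):
    """Type I sketch. table[r][c] is 0 or 1; rows index tuples of X2,
    columns index tuples of X1 (hypothetical encoding by position)."""
    n_rows, n_cols = len(table), len(table[0])
    pattern_to_a = {}   # distinct column pattern -> value of the new variable a
    col_to_a = []       # label assigned to each original column
    for c in range(n_cols):
        pattern = tuple(table[r][c] for r in range(n_rows))
        col_to_a.append(pattern_to_a.setdefault(pattern, len(pattern_to_a)))
    multiplicity = len(pattern_to_a)
    # R1: every X1 tuple (column) extended with its label.
    r1 = {(c, a) for c, a in enumerate(col_to_a)}
    # R2: identical columns combined; one column per value of a.
    r2 = {(r, a) for pattern, a in pattern_to_a.items()
          for r in range(n_rows) if pattern[r] == 1}
    return multiplicity, r1, r2

def compose(r1, r2):
    """Join R1 and R2 on the shared variable a, giving (row, column) pairs."""
    return {(r, c) for (c, a1) in r1 for (r, a2) in r2 if a1 == a2}

# Hypothetical relation: columns 0 and 1 are identical, as are 2 and 3.
table = [[1, 1, 0, 0],
         [1, 1, 1, 1],
         [0, 0, 1, 1],
         [1, 1, 0, 0]]
original = {(r, c) for r, row in enumerate(table)
            for c, v in enumerate(row) if v == 1}
multiplicity, r1, r2 = decompose_type1(table)
print(multiplicity, len(original), len(r1) + len(r2))
```

Composing `r1` and `r2` on the shared label returns exactly the tuples of the original table, which is the property the theorem below establishes in general.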
The process of creation of R2 will be defined by the following theorem.

Theorem 1 (decomposition). A relation $R_2 \subseteq Q_{X_2} \times Q_{\{a\}}$ meeting the above conditions can be represented by a contingency table created from the original table for relation R by combining the identical columns of the table. The new columns will correspond to the tuples $q^j_A \in Q_A$.

Proof. It is enough to show that for every pair of tuples $q^i_{X_1} a_k \in R_1$ and $q^j_{X_2} a_k \in R_2$, the pair of tuples $q^i_{X_1} q^j_{X_2}$ is part of the relation R ($q^i_{X_1} q^j_{X_2} \in R$). Let us assume that there exists a pair of tuples $q^i_{X_1} a_k \in R_1$ and $q^j_{X_2} a_k \in R_2$ such that $q^i_{X_1} q^j_{X_2} \notin R$. The condition $q^i_{X_1} q^j_{X_2} \notin R$ means that the intersection of column $q^i_{X_1}$ and row $q^j_{X_2}$ in the original contingency table contains 0. The condition $q^j_{X_2} a_k \in R_2$ means that the intersection of row $q^j_{X_2}$ and column $a_k$ in the contingency table corresponding to R2 contains 1. By the construction of R1,


Figure 2. Decomposition type I

column $a_k$ corresponds to the set of identical columns containing column $q^i_{X_1}$. Hence, by the condition $q^j_{X_2} a_k \in R_2$, the intersection of the row $q^j_{X_2}$ and column $q^i_{X_1}$ contains 1, which is in contradiction with the assumption $q^i_{X_1} q^j_{X_2} \notin R$. This completes the proof. □

Figure 2 shows the process of decomposition of a relation. Relation R is represented by tables in Figure 2(a) and (b). A cell in the table in Figure 2(b) contains 1 if the corresponding tuple belongs to the relation and 0 otherwise. The column multiplicity index of the table in Figure 2(b) is equal to 2 and so is the cardinality of the new variable a. The table in Figure 2(c) corresponds to the block R1 in Figure 2(e); a cell of the table contains 1 if the corresponding combination of variable values exists in the table in Figure 2(b). For instance, the columns $x_3 x_4 = 00$ and $x_3 x_4 = 01$ in Figure 2(b) are labeled with a = 0, and $x_3 x_4 = 10$ and $x_3 x_4 = 11$ with a = 1, so the cells corresponding to these combinations of values will contain 1 in table R1 in Figure 2(c). Other combinations of values of $x_3 x_4$ and a will yield 0 in the table R1.

The table in Figure 2(d), which corresponds to the block R2 in Figure 2(e), is created from the table in Figure 2(b) by combining identical columns and replacing variables $x_3 x_4$ with a new variable a. The same decomposition method can be used to decompose probabilistic relations, i.e. relations with a probability or frequency associated with each tuple. For relations of this kind, however, the probabilities have to be discretized before decomposition can be performed. The most often used discretization method, uniform binning, divides the space of each variable value into a number of equally sized bins. Another class of discretization methods is based on the entropy measure (Catlett, 1991; Fayyad and Irani, 1993); these methods use a minimum entropy criterion to assign values to different bins and often yield better results. Figure 3 shows the decomposition process of such a relation.
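Uniform binning as described above can be sketched as follows. The function name `uniform_bin` is hypothetical, and the choice of the observed minimum and maximum as the ends of the binned range is an assumption; the text does not state how the range is fixed.

```python
def uniform_bin(values, n_bins):
    """Map each value to a bin index 0..n_bins-1 using equally wide bins
    spanning [min(values), max(values)] (range choice is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical cell frequencies binned into three classes.
frequencies = [0, 1, 2, 3, 4, 5]
print(uniform_bin(frequencies, 3))
```

After binning, each cell of the contingency table holds a small integer class label, so the column comparison used for non-probabilistic relations can be applied unchanged.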


3.3 Decomposition type II

The decomposition described in Section 3.2 always resulted in a non-disjoint solution. In this section, we will describe a decomposition which may result in either disjoint or non-disjoint solutions. The main distinction of the decomposition described in this section is that it is a functional decomposition. We decompose not the relation itself, but the probability

Figure 3. Decomposition type I, probabilistic relation


Figure 4. Decomposition type II

density function defined by the frequencies or probabilities in the contingency table describing the relation. The result of the decomposition can again be viewed as a neutral relation. The type II decomposition procedure is shown in Figure 4. The relation used in this example is the same as the one shown in Figure 3. The relation to be decomposed is shown in Figure 4(a) and (b). The result of uniform binning to five values is shown in Figure 4(c). The decomposition itself is performed in a manner similar to the decomposition described in Section 3.2. The difference between the two is the way the block R1 is created. For decomposition of type I, the new variable a is always an independent variable in R1. For decomposition of type II described in this section, the new variable a is always a dependent variable in R1. In block R2, variable a is independent in both type I and type II decompositions. The decomposition in Figure 4 is a disjoint decomposition because the sets of independent variables of blocks R1 and R2 are disjoint. In Figure 5, we show the type II non-disjoint decomposition procedure. Since for the type II decomposition the extra variable a cannot be shared between R1 and R2, the only way to achieve a non-disjoint decomposition is to share some of the independent variables from the set X.
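The disjoint type II case can be sketched as a pair of lookup tables: block R1 becomes a function g from X1 tuples to the new variable a (so a is dependent in R1), and block R2 becomes a function h from (a, X2 tuple) to the binned value. The function name `decompose_type2` and the binned table are hypothetical and are not the data of Figure 4.

```python
def decompose_type2(table):
    """Type II sketch on a binned table. table[r][c] is a bin index;
    rows index tuples of X2, columns index tuples of X1."""
    n_rows, n_cols = len(table), len(table[0])
    pattern_to_a = {}
    g = {}  # block R1: X1 tuple (column index) -> value of a
    for c in range(n_cols):
        pattern = tuple(table[r][c] for r in range(n_rows))
        g[c] = pattern_to_a.setdefault(pattern, len(pattern_to_a))
    # Block R2: (value of a, X2 tuple) -> binned value of the function.
    h = {(a, r): pattern[r]
         for pattern, a in pattern_to_a.items() for r in range(n_rows)}
    return g, h

# Hypothetical table binned to five values (0..4); columns 0 and 2 are
# identical, as are columns 1 and 3, so the new variable takes two values.
binned = [[0, 2, 0, 2],
          [3, 2, 3, 2],
          [1, 0, 1, 0],
          [4, 1, 4, 1]]
g, h = decompose_type2(binned)
rebuilt = [[h[(g[c], r)] for c in range(4)] for r in range(4)]
print(len(set(g.values())), rebuilt == binned)
```

Evaluating h on the output of g reproduces the binned table exactly, which is the sense in which the decomposition is lossless with respect to the binned data.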


Figure 5. Non-disjoint decomposition type II

The disjoint decomposition with $X_1 = \{x_1, x_2\}$ and $X_2 = \{x_3, x_4\}$ in Figure 5 leads to $|a| = 4$ and does not simplify the original structure. Selecting non-disjoint sets $X_1 = \{x_1, x_2\}$, $X_2 = \{x_2, x_3, x_4\}$ leads to the table in Figure 5(c). Some of the cells in this table correspond to impossible combinations of variable values, for instance a variable taking values 0 and 1 at the same time. These cells are denoted by “-” and correspond to structural zeroes as defined by Krippendorff (1986). Since structural zeroes correspond to impossible observations, we can replace them with any values for the sake of column multiplicity computations. Selecting the values as in Figure 5(d) results in column multiplicity equal to 2. This value is smaller than the column multiplicity of the table in Figure 5(b), which corresponds to the disjoint case. Relations R1 and R2 can be determined in the same way as for the disjoint case. The results of the decomposition are shown in Figure 5(e)-(g).


The same procedure can also be used in type I decomposition to increase the number of shared variables if needed.
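Filling structural zeroes so as to lower the column multiplicity can be sketched with a simple greedy merge, in which `None` stands for a “-” cell. The function names and example columns are hypothetical, and the greedy first-fit strategy is an assumption made for brevity: it yields a valid assignment but is not guaranteed to find the minimum multiplicity, and it is not claimed to be the method used by the authors' software.

```python
def compatible(col_a, col_b):
    """Two columns are compatible if they agree wherever both are specified."""
    return all(x is None or y is None or x == y for x, y in zip(col_a, col_b))

def merge(col_a, col_b):
    """Combine two compatible columns, keeping every specified cell."""
    return tuple(y if x is None else x for x, y in zip(col_a, col_b))

def merge_compatible_columns(columns):
    """Greedy first-fit grouping (an assumption, not necessarily optimal)."""
    groups = []
    for col in columns:
        for i, representative in enumerate(groups):
            if compatible(representative, col):
                groups[i] = merge(representative, col)
                break
        else:
            # No existing group fits, so this column starts a new one.
            groups.append(tuple(col))
    return groups

# Hypothetical columns with structural zeroes written as None.
columns = [(1, None, 0, None),
           (None, 2, None, 1),
           (1, 2, None, None),
           (0, None, 0, None)]
groups = merge_compatible_columns(columns)
print(len(groups), groups)
```

Each resulting group corresponds to one value of the new variable, and any cell that is still `None` after merging can be filled arbitrarily.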


4. Cost measure: cardinality

The cost measure used in this paper is based on the measure proposed by Abu-Mostafa (1988). He defined the complexity of a binary function (functional block) as the number of tuples describing it:

$$C = 2^{|X|}\,|Y| \qquad (1)$$

where X and Y are the sets of independent and dependent variables, respectively. The cost of a combination of functional blocks was defined as the sum of the costs of the particular blocks. According to his definition the number of tuples, which is determined by the set of independent variables X, is multiplied by the number of dependent variables. This is due to the fact that each dependent variable corresponds to a separate function defined on the same set of independent variables. We will extend Abu-Mostafa’s definition to the multiple-valued case and call it cardinality (Grygiel, 2000):

$$C = \prod_{x_i \in X} |x_i| \cdot \log_2 \prod_{y_i \in Y} |y_i| \qquad (2)$$

where $\prod_{x_i \in X} |x_i|$ is the number of tuples and $\log_2 \prod_{y_i \in Y} |y_i|$ is a normalized number of binary variables, i.e. an equivalent number of binary variables corresponding to the set Y of multiple-valued variables. If there are no dependent variables we will assume:

$$C = \prod_{x_i \in X} |x_i| \qquad (3)$$

This is justified by the fact that every neutral relation (only independent variables present) can always be transformed to a function with one binary dependent variable which takes value 1 if the corresponding combination of values of independent variables belongs to the relation, and takes value 0 if it does not. For one binary variable y, the expression $\log_2 \prod_{y_i \in Y} |y_i|$ in equation (2) is equal to 1 and equation (3) follows. Let us consider the decomposition of the block $(X, Y)$ into blocks $(X_1, Y_1)$ and $(Y_1 \cup X_2, Y)$. The complexity of the decomposed structure is equal to:

$$C = p_{X_1} \log_2 p_{Y_1} + p_{Y_1}\, p_{X_2} \log_2 p_Y \qquad (4)$$

where

$$X_1 \cup X_2 = X, \quad p_{X_1} = \prod_{x_i \in X_1} |x_i|, \quad p_{X_2} = \prod_{x_i \in X_2} |x_i|, \quad p_{Y_1} = \prod_{y_i \in Y_1} |y_i|, \quad p_Y = \prod_{y_i \in Y} |y_i|$$

Comparing equation (4) to the complexity of the original structure we can easily show that in order to achieve complexity reduction the following necessary condition must be true:

$$p_{Y_1} < p_{X_1} \qquad (5)$$

In fact, if $p_{Y_1} \geq p_{X_1}$ then:

$$p_{X_1} \log_2 p_{Y_1} + p_{Y_1}\, p_{X_2} \log_2 p_Y \geq p_{X_1} \log_2 p_{X_1} + p_{X_1}\, p_{X_2} \log_2 p_Y \geq p_{X_1}\, p_{X_2} \log_2 p_Y = p_X \log_2 p_Y$$

and the decomposition increases, instead of decreasing, the complexity of the structure.

5. Results

In this section, we will present type II decomposition of a small real life example (Ries-Smith data) (Ries and Smith, 1963) and compare the complexities of the few simple decomposition examples presented in the previous sections. Figure 6(a) shows the contingency table of the Ries-Smith data. We have four independent variables here; each combination of variable values is associated with a frequency in the table in Figure 6(a). In Figure 6(b), the result of uniform binning into three equally sized bins is shown. We performed uniform binning for the number of bins ranging from 2 to 10, but only for the three-bin case was our program able to find a decomposition. The decomposed structure is shown in Figure 6(c). In the decomposition process (disjoint decomposition of type II) three blocks were extracted from the original data and two new variables, $a'$ and $a''$, were added. The tables in Figure 7 describe relations between variables in these three blocks. The complexity of the original structure in Figure 6(b) is equal to $C = |x_1||x_2||x_3||x_4| \log_2 |f'| = 3 \times 2 \times 2 \times 2 \times \log_2 3 = 38.04$; the decomposed


Figure 6. Ries-Smith data

Figure 7. Ries-Smith data: decomposed blocks

structure complexity is smaller and equal to $C_{II} = |x_2||x_4| \log_2 |a'| + |x_1||a'| \log_2 |a''| + |x_3||a''| \log_2 |f'| = 2 \times 2 \times \log_2 3 + 3 \times 3 \times \log_2 3 + 2 \times 3 \times \log_2 3 = 30.11$, which makes for a 21 percent complexity reduction. Table I summarizes complexity gains for different structures discussed in this paper. In Table I, d and nd denote disjoint and non-disjoint decompositions, p and np denote probabilistic and non-probabilistic relations, $C'$ is the complexity of the initial structure, $C''$ is the complexity of the structure after decomposition

and equations (2)-(4) were used to calculate complexity. Remember also that Figures 3-5 show decomposition of the same relation but using different decomposition methods. As we can see in Table I, for the decomposition in Figure 2 the complexity of the decomposed structure is the same as that of the initial structure. However, if we count the number of tuples in the original and decomposed structures we will obtain 10 and 9, respectively (a 10 percent reduction). This means that the complexities calculated using equations (2)-(4) are equal to the maximum number of tuples that can be used to describe a given structure. The real complexity (number of tuples) can in fact be smaller. In other words, the values of $C'$ and $C''$ in Table I compare the size of the state space of the original table (the binned table, if the data are probabilistic) and the sum of the sizes of the state spaces of the tables which the decomposition gives. The second observation we can make is that non-disjoint decompositions usually result in higher complexity structures than the disjoint ones (Figures 3 and 5 for non-disjoint and Figure 4 for disjoint decompositions). The disjoint decomposition, however, is harder to find and the non-disjoint one may be the best we can obtain. The non-disjoint decomposition may not exist either, but the chance of finding it is higher than for the disjoint one.
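The cardinality measure of equations (2) and (3) is easy to evaluate numerically. The sketch below uses a hypothetical helper, `cardinality`, and reproduces the Ries-Smith figures quoted above (38.04, 30.11 and the 21 percent drop). The second check assumes, following the text, that all four variables of Figure 2 are binary and that the new variable takes two values.

```python
from math import log2, prod

def cardinality(input_sizes, output_sizes=()):
    """Equation (2); falls back to equation (3) for a neutral block
    (no dependent variables). Sizes are variable cardinalities."""
    tuples = prod(input_sizes)
    if not output_sizes:
        return float(tuples)
    return tuples * log2(prod(output_sizes))

# Ries-Smith data: |x1| = 3, |x2| = |x3| = |x4| = 2, output binned to 3 values.
original = cardinality([3, 2, 2, 2], [3])
# Three blocks: (x2, x4) -> a', (x1, a') -> a'', (x3, a'') -> f',
# with |a'| = |a''| = 3 as in the expression for the decomposed structure.
decomposed = (cardinality([2, 2], [3])
              + cardinality([3, 3], [3])
              + cardinality([2, 3], [3]))
drop_percent = 100 * (original - decomposed) / original
print(round(original, 2), round(decomposed, 2), round(drop_percent))

# Neutral relation of Figure 2 (assumed: four binary variables, |a| = 2).
figure2_before = cardinality([2, 2, 2, 2])
figure2_after = cardinality([2, 2, 2]) + cardinality([2, 2, 2])
print(figure2_before, figure2_after)
```

Both blocks of the Figure 2 decomposition have eight possible tuples, which is why the measure reports no reduction there even though the actual tuple count falls from 10 to 9.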


6. Summary

In this paper, we presented two types of decomposition that can be used for decomposing both probabilistic and non-probabilistic relations. The decomposition of type I can be applied to relations directly and leads to non-disjoint decomposed structures. The type II decomposition is a functional decomposition, so we apply it to the probability density function or frequency distribution specified for a given relation. Both decompositions act on discrete values only; if they are applied to probabilistic relations, the continuous values of probabilities or frequencies have to be discretized before decomposition. All decompositions are “lossless” in the sense that they yield the binned table exactly (but they are not lossless relative to original tables with unbinned frequencies). No analysis has yet been done on the loss of information which occurs when frequencies are binned. The decomposition process is driven by a cost function which assures that the decomposed structure is of lower complexity than the original one.

Table I. Complexity drop

Figure     Type   d/nd   p/np   C'      C''     Drop (percent)
Figure 2   I      d      np     16.00   16.00    0
Figure 3   I      d      p      37.15   26.58   28
Figure 4   II     d      p      37.15   22.58   39
Figure 5   II     nd     p      37.15   26.58   28
Figure 6   II     nd     p      38.04   30.11   21


The cost function used in this paper (cardinality) defines a relation’s complexity as the number of its tuples. Other cost functions could be used as well (Grygiel, 2000), but a more detailed discussion of this subject is beyond the scope of this paper. We also presented a few simple decomposition examples to illustrate the algorithms used. In one of the examples we decomposed a small real life data set (Ries-Smith data) by running our software implementation of the type II decomposition method. The data values were discretized using the uniform binning method and decomposed iteratively into three blocks with two independent variables each. In summary, we think that the methods of decomposition presented in this paper can serve as a useful alternative to the uncertainty-based methods most often used for the decomposition of probabilistic relations.

References

Abu-Mostafa, Y. (1988), Complexity in Information Theory, Springer-Verlag, New York, NY.
Ashby, W. (1965), “Measuring the internal informational exchange in a system”, Cybernetica, Vol. 1 No. 8, pp. 5-22.
Catlett, J. (1991), “On changing continuous attributes into ordered discrete attributes”, in Kodratoff, Y. (Ed.), Proceedings of the European Working Session on Learning, Springer-Verlag, Berlin, Germany, pp. 164-78.
Conant, R. (1972), “Detecting subsystems of a complex system”, IEEE Transactions on Systems, Man, and Cybernetics, pp. 374-7.
Conant, R. (1981), “Set-theoretic structure modeling”, Int. J. General Systems, Vol. 7 No. 38, pp. 93-107.
Fayyad, U. and Irani, K. (1993), “Multi-interval discretization of continuous-valued attributes for classification learning”, Proceedings of the 13th International Joint Conference on Artificial Intelligence, Morgan-Kaufmann, Los Altos, CA, pp. 1022-7.
Grygiel, S. (2000), “Decomposition of relations as a new approach to constructive induction in machine learning and data mining”, PhD dissertation, Portland State University, Portland, OR.
Grygiel, S. and Perkowski, M. (1998), “New compact representation of multiple-valued functions, relations, and non-deterministic state machines”, ICCD-98, Austin, TX.
Grygiel, S., Perkowski, M., Marek-Sadowska, M., Luba, T. and Jozwiak, L. (1997), “Cube diagram bundles: a new representation of strongly unspecified multiple-valued functions and relations”, Proceedings of ISMVL’97, Halifax, Nova Scotia, Canada, pp. 287-92.
Hartley, R. (1928), “Transmission of information”, The Bell Systems Technical Journal, Vol. 7 No. 3.
Klir, G. (1976), “Identification of generative structures in empirical data”, Int. J. General Systems, No. 3, pp. 89-104.
Klir, G. (1985), Architecture of Systems Problem Solving, Plenum Press, New York, NY.
Klir, G. (1996), “Reconstructability analysis bibliography”, Int. J. General Systems, Vol. 24, pp. 225-9.
Krippendorff, K. (1979), “On the identification of structures in multivariate data by the spectral analysis of relations”, Proceedings of 23rd Annual Meeting of SGSiL, Houston, TX.
Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Sage Publications, Thousand Oaks, CA.
Ries, P. and Smith, H. (1963), “The use of chi-square for preference testing in multidimensional problems”, Chem. Eng. Progress, Vol. 59, pp. 39-43.
Shannon, C. and Weaver, W. (1975), The Mathematical Theory of Communication, University of Illinois Press, IL (first published in 1949).
Zwick, M. (1995a), “Control uniqueness in reconstructability analysis”, Int. J. General Systems, Vol. 23 No. 2.
Zwick, M. (1995b), “Set-theoretic reconstructability of elementary cellular automata”, Advances in System Science and Application, No. 1, pp. 31-6.
Zwick, M. (2001), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, London.


The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

The k-systems glitch: granulation of predictor variables
Susanne S. Hoeppner and Gary P. Shaffer
Department of Biological Sciences, Southeastern Louisiana University, Hammond, Louisiana, USA

Keywords Cybernetics, Cluster analysis, Data reduction, Dynamics

Abstract Ecosystem behavior is complex and may be controlled by many factors that change in space and time. Consequently, when exploring system functions such as ecosystem “health”, scientists often measure dozens of variables and attempt to model the behavior of interest using combinations of variables and their potential interactions. This methodology, using parametric or nonparametric models, is often flawed because ecosystems are controlled by events, not variables, and events are comprised of (often tiny) pieces of variable combinations (states and substates). Most events are controlled by relatively few variables (≤4) that may be modulated by several others, thereby creating event distributions rather than point estimates. These event distributions may be thought of as comprising a set of fuzzy rules that could be used to drive simulation models. The problem with traditional approaches to modeling is that predictor variables are dealt with in total, except for interactions, which themselves must be static. In reality, the “low” piece of one variable may influence a particular event differently than another, depending on how pieces of other variables are shaping the event, as demonstrated by the k-systems state model of algal productivity. A swamp restoration example is used to demonstrate the changing faces of predictor variables with respect to influence on the system function, depending on particular states. The k-systems analysis can be useful in finding potent events, even when region size is very small. However, small region sizes are the result of using many variables and/or many states and substates, which creates a high probability of extracting falsely-potent events by chance alone. Furthermore, current methods of granulating predictor variables are inappropriate because the information in the predictor variables rather than that of the system function is used to form clusters. What is needed is an iterative algorithm that granulates the predictor variables based on the information in the system function. In most ecological scenarios, few predictor variables could be granulated to two or three categories with little loss of predictive potential.

Introduction
The pathway to useful knowledge is not through the accumulation of information. Rather, useful knowledge is gained through extraction of meaningful and structured pieces of information from an overwhelming pool of entropy (Norretranders, 1998). The Earth, with all of the ecosystems it contains, is shaped by such a vast complexity of interactions that only the Earth, in its entirety, can describe them in sufficient detail. It is useless, however, to use the territory for the map.

Kybernetes Vol. 33 No. 5/6, 2004, pp. 962-972, © Emerald Group Publishing Limited, 0368-492X, DOI 10.1108/03684920410534001

The authors thank Rick Miller, Bettina Hoeppner, Frank Campo, and Heath Benard for their thoughtful comments on previous versions of the manuscript, as well as Martin Zwick for insightful discussions and for encouraging this effort.

Consider something as simple as attempting to predict the growth of a single cypress tree inhabiting a Louisiana swamp. Armed with a photosystem (for measuring photosynthesis) and other sophisticated gadgetry, one might measure variables such as salinity, nutrient concentrations in the soil, flooding depth and duration, light intensity, herbivory, and competition. In addition, other variables such as gravity, sound disturbance, microorganism density in the soil, wind intensity, and toxins in every medium that the tree encounters (air, water, soil) could influence photosynthesis (Figure 1). Out of the dozens of variables that our technology enables us to measure, three major categories can be distinguished. The main variables of concern are the controlling variables, which are factors that directly and strongly influence system behavior. The effect of these controlling variables may be shaped substantially by modulating variables. At any given point in the space-time continuum, most variables could be described as ineffectual, that is, variables that have a negligible effect on the system function. Often the most difficult and time-consuming aspect of data analysis involves isolating these ineffectual variables and discarding them from the analysis. This kind of data reduction can be accomplished through the use of factor analysis to combine collinear variables and multiple regression to isolate complex, but static, interactions. The remaining variables and variates, generally fewer than ten in even the most complex systems, can then be subjected to k-systems analysis to further isolate the factors that have large effect sizes. Matters are further complicated by variables that can change from controlling to modulating, or even ineffectual, and back, depending on contextual conditions. Also, a controlling or modulating variable may switch from being positively to negatively related to the system function, or even appear unrelated, as shown in Figure 2.


Figure 1. The suite of variables in air, water, soil and other constituents of the environment that potentially impact the growth of a typical cypress tree

Figure 2. Hourly productivity of benthic algae (solid line) and phytoplankton (dashed line) in a Louisiana estuary

In this example of algal growth, phytoplankton productivity (mg C/m² h) defines the system function and benthic algal productivity is the primary predictor variable. Owing to the changing nature of this relationship, the Pearson product-moment correlation is effectively zero because this parametric method computes pairwise comparisons across the entire data set. Recognizing this dilemma, Shaffer and Sullivan (1988) applied k-systems analysis and extracted a meaningful interpretation of the nature of the relationship between these two variables, which are shaped individually by as many as six variables, but whose interaction with one another is defined by the modulating effect of water velocity. Even in a relatively simple system such as algal productivity, it becomes apparent that the dynamic effects of the major variables cannot be captured crisply across an entire data set, but rather through the analysis and isolation of events within the data set. An event is defined as the minimum combination of variables at any point in space or time that accounts for a measurable or substantial response in the behavior of the system function. It is important to note that events are not independent. As each variable gradually becomes more or less influential, the dynamically interacting combinations of these variables that define events also have gradual effects on the system behavior. This phenomenon causes events to merge into one another and makes it difficult or even impossible to define strict boundaries of a particular event. Therefore, we believe that event effects most likely have Gaussian distributions with a most probable effect and tapering tails determined by genetic and environmental variability (Figure 3). The variances of these distributions can be changed by the number of variables and the number of variable states that are included in the analysis. Adding variables and/or increasing variable states can decrease the dispersion of event distributions.
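The point about the Pearson correlation vanishing when a relationship changes sign can be illustrated with a small sketch. The numbers below are hypothetical and are not the estuary data of Shaffer and Sullivan (1988); the variable names and the two-level "velocity" state are assumptions made only for illustration.

```python
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical productivities in arbitrary units (NOT the published data).
benthic = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
phyto = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
# Hypothetical modulating state, standing in for a water-velocity class.
velocity = ["slow"] * 5 + ["fast"] * 5

def by_state(state):
    """Correlation computed only within the cases that share one state."""
    idx = [i for i, v in enumerate(velocity) if v == state]
    return pearson([benthic[i] for i in idx], [phyto[i] for i in idx])

r_all = pearson(benthic, phyto)   # relationship cancels across the data set
r_slow = by_state("slow")         # positive within the first state
r_fast = by_state("fast")         # negative within the second state
print(r_all, r_slow, r_fast)
```

In this constructed case the correlation over all ten observations is zero, while it is +1 and -1 within the two states, which is the kind of structure a state-based analysis is meant to expose.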
Figure 3. Effect of adding more variables or finer granulation within variables on Gaussian event distributions

Even though narrower event distributions seem desirable, one must also consider the implications of specifying and potentially over-specifying the number of variables and states. Highly kurtotic events require the use of many conditional statements, defeating the purpose of attempts to summarize. In this case, only the complete description of the observation is its own explanation. So, when is an event optimally meaningful? The answer lies in the effect size that a variable has on system behavior, the degree of granulation of that variable, and knowing when to stop.

The purpose of this paper is (1) to suggest that only a few variables can control system behavior at a particular point in time and space; (2) to show that only a few (three or fewer) levels are necessary to capture information of the metric predictor variables; and (3) to demonstrate that granulation of metric variables using the response of the system function carries more information than traditional clustering procedures.

An application scenario
It is useful to set up an example scenario to demonstrate how this process of optimization might work. We base our example on a data set collected to determine the feasibility of re-plumbing a portion of the Mississippi River back into 160 km² of degrading swamps surrounding Lake Maurepas, located in


southeastern Louisiana, USA. Photosynthetic response of a sample of cypress trees is used as an indicator of ecosystem health, and predictor variables include salinity, frequency and duration of flooding, soil strength, herbivory, 23 soil and water chemicals, pH, and redox potential. As in many ecological data sets, the bulk of these variables tested out as collinear surrogates or as non-influential. In addition, some of the variables are positively related, negatively related, or completely unrelated to the system function (as in Figure 2), depending on the influences of other variable combinations (states). To reduce the level of complexity in system behavior, we use the photosynthetic response of ten individual trees in specific environmental settings. In this constructed data set, only four (Table I) of the myriad of possible and measured variables, which include the 30 variables mentioned above, are necessary to demonstrate the effectiveness of the proposed granulation methodology. Factor analysis was useful in grouping the highly collinear chemical variables into a few variates, only two of which, nitrogen and phosphorus, carried information with regard to photosynthesis. Other variables, such as redox potential and pH, proved to be inapplicable, one due to near chaotic variation and the other due to virtually no effective variation. Of the four selected variables in Table I, salinity and nitrogen were the major controlling variables, phosphorus was mostly ineffectual except when limiting (case 5), and flooding was a modulating variable.

Table I. The effects of salinity, nitrogen (N), phosphorus (P), and flood duration (Flood) on the photosynthetic rates (PS) of baldcypress seedlings

| Case | Salinity (ppt) | TC | 2SG | 3SG | N (ml/l) | N 2SG | P (ml/l) | P 2SG |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | L | L | L | 13 | L | 3 | H |
| 2 | 4 | L | H | M | 32 | H | 5 | H |
| 3 | 15 | H | H | H | 10 | L | 3 | H |
| 4 | 14 | H | H | H | 44 | H | 6 | H |
| 5 | 2 | L | L | L | 38 | H | 1 | L |
| 6 | 9 | H | H | H | 45 | H | 6 | H |
| 7 | 3 | L | L | L | 20 | L | 5 | H |
| 8 | 5 | L | H | M | 50 | H | 7 | H |
| 9 | 11 | H | H | H | 40 | H | 4 | H |
| 10 | 0 | L | L | L | 33 | H | 8 | H |

| Case | Flood (h/year) | Flood 2SG | PS | KS | Flat | R1 | R2 | R3 |
|---|---|---|---|---|---|---|---|---|
| 1 | 700 | L | 80 | 0.2273 | 0.1 | 0.151 | 0.188 | 0.188 |
| 2 | 2,200 | H | 28 | 0.0796 | 0.1 | 0.151 | 0.041 | 0.077 |
| 3 | 460 | L | 0 | 0.0028 | 0.1 | 0.023 | 0.041 | 0.023 |
| 4 | 110 | L | 6 | 0.0171 | 0.1 | 0.023 | 0.041 | 0.023 |
| 5 | 340 | L | 20 | 0.0568 | 0.1 | 0.151 | 0.188 | 0.188 |
| 6 | 610 | L | 4 | 0.0114 | 0.1 | 0.023 | 0.041 | 0.023 |
| 7 | 1,930 | H | 50 | 0.2017 | 0.1 | 0.151 | 0.188 | 0.188 |
| 8 | 770 | L | 26 | 0.0739 | 0.1 | 0.151 | 0.041 | 0.077 |
| 9 | 320 | L | 12 | 0.0625 | 0.1 | 0.023 | 0.041 | 0.023 |
| 10 | 260 | L | 94 | 0.2671 | 0.1 | 0.151 | 0.188 | 0.188 |
| SA | | | | | | 28.2 | 50.2 | 59.7 |

Notes: Clustering using traditional methods (TC), a 2-cluster (2SG) and a 3-cluster (3SG) system-based granulation results in three different reconstructions (R1, R2, R3, respectively). Reconstructions can be compared to the k-system (KS) and a flat system (Flat). System accuracy (SA) is computed for the three reconstructions

In this simplified system, photosynthetic rates were generally high when salinity was less than 4 ppt, nitrogen was greater than 32 ml/l, phosphorus was greater than 2 ml/l, and flooding was less than 1,500 h/year. Detrimental conditions included salinities greater than 8 ppt and flooding greater than 2,800 h/year, while nitrogen and phosphorus were proportionally limiting. If salinity was greater than 4 ppt and either of the nutrients was very high, then they produced a synergistically negative effect on photosynthesis, because the tree root system experiences osmotic stress due to high ion content. For example, in case 7 (Table I) salinity was at acceptable levels, nitrogen was slightly limiting, phosphorus was abundant, and flooding was stressful, overall causing photosynthetic rates to decrease to roughly 50 percent of the optimum.

After sifting the variables down to a meaningful subset, the next step involves clustering the metric predictor variables (Jones, 1985, 1986). The gaps between the observations are sufficiently large (Table I) to lead any traditional clustering methods, such as Ward’s method or average linkage clustering (Hair et al., 1998; Tabachnick and Fidell, 2001), to form identical cut points. What is needed, however, is a clustering method that bases its cut points on the system function (Shaffer, 1997). In Table I, we compare a k-systems granulation method with traditional clustering and demonstrate that this type of binning necessarily improves the reconstruction. For simplicity’s sake, all reconstruction solutions presented are based on cluster alterations of salinity only.
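The R1, R2, and R3 columns of Table I can be checked with a short script. The sketch below is an illustration only, not the authors' software. It assumes that each reconstructed value is the mean of the k-system proportions (KS) of the cases sharing a salinity cluster label, and that the flat system is 1/n. The function names are hypothetical. One further assumption is flagged in the code: the tabulated system accuracies (28.2, 50.2, 59.7) are reproduced to within about 0.1 only when every log-ratio term is weighted by the k-system proportion of its case. That weighting is inferred from the table values and does not appear in the system accuracy formula as printed in the text.

```python
from math import log2

# k-system proportions (KS) and salinity cluster labels, cases 1-10 of Table I.
KS = [0.2273, 0.0796, 0.0028, 0.0171, 0.0568,
      0.0114, 0.2017, 0.0739, 0.0625, 0.2671]
TC = "LLHHLHLLHL"   # traditional clustering
SG2 = "LHHHLHLHHL"  # two-cluster system-based granulation
SG3 = "LMHHLHLMHL"  # three-cluster system-based granulation

def reconstruct(ks, labels):
    """Replace each proportion by the mean proportion of its cluster."""
    groups = {}
    for f, lab in zip(ks, labels):
        groups.setdefault(lab, []).append(f)
    means = {lab: sum(v) / len(v) for lab, v in groups.items()}
    return [means[lab] for lab in labels]

def system_accuracy(ks, recon, weighted=True):
    """100 * (1 - sum|log2(f/f')| / sum|log2(f/f'')|), f'' being the flat system.

    ASSUMPTION: with weighted=True each term is multiplied by f. This is
    inferred from the Table I values, not stated in the printed formula.
    """
    flat = 1.0 / len(ks)
    w = ks if weighted else [1.0] * len(ks)
    num = sum(wi * abs(log2(f / r)) for wi, f, r in zip(w, ks, recon))
    den = sum(wi * abs(log2(f / flat)) for wi, f in zip(w, ks))
    return 100.0 * (1.0 - num / den)

R1, R2, R3 = (reconstruct(KS, lab) for lab in (TC, SG2, SG3))
SA1, SA2, SA3 = (system_accuracy(KS, r) for r in (R1, R2, R3))
print(round(SA1, 1), round(SA2, 1), round(SA3, 1))
```

Calling `system_accuracy(KS, R1, weighted=False)` evaluates the unweighted form, which gives a different value from the one tabulated; this is why the weighting is treated here as an assumption rather than as part of the published method.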
The first reconstruction solution (R1) is based on a traditional clustering, while the subsequent reconstructions (R2 and R3) are based on our suggested clustering technique with two and three granulation levels, respectively. Following granulation, k-systems analysis proceeds by converting the raw data (the g-system; Jones, 1986) to k-system proportions and then reconstructing the k-system from the flat system (Table I) using the minimal number of states and substates (Jones, 1986). Reconstructions are computed as simple averages of the within-cluster proportions. The resolution of the reconstructions was evaluated using least-squares computations as well as the system accuracy formula described by Jones (1986), so that:

$$\text{System accuracy} = 100 \times \left[ 1 - \frac{\sum_i \left| \log_2 \left( f_i / f'_i \right) \right|}{\sum_i \left| \log_2 \left( f_i / f''_i \right) \right|} \right]$$

where $f_i$ denotes the proportionalized actual distribution (k-system), $f'_i$ describes the reconstruction, and $f''_i$ is the flat system. The least-squares solutions showed that the error sums of squares were reduced by 29.12 percent between the traditional solution and the k-system granulation with two clusters. An additional 12.40 percent error reduction occurred for the three-cluster k-system granulation. Evaluating these same reconstructions using system accuracy, the traditional clustering method resulted in a reconstruction of 28.2 percent of the total information contained in the system function. The accuracy improved considerably to account for 50.2 percent of the total variation in system function when the proposed granulation method with two clusters was used and 59.7 percent for the proposed granulation with three clusters. Thus, reconstruction accuracy is improved both by switching to a system-function-based granulation method and by finer granulation of predictor variables, where appropriate.

The question arises how, exactly, the two clustering methods used differ. In this scenario, any traditional clustering algorithm will cluster salinity cases 2 and 8 as “low” because no gap exists between them and the other cases classified as “low”. Ecologically, however, these cases should be clustered as “high” because these salinities are known to decrease the growth rates of cypress (Conner et al., 1997). Therefore, it is not surprising that a reconstruction based on system-function granulation better matches the ecology and explains an additional 22 percent of system behavior. A potential problem with this granulation method that does not exist with traditional methods, however, is that the particular responses of the system function may not coincide with other cases in the same cluster. An ecologically real outlier, such as case 9 with a relatively “high” photosynthetic rate despite the detrimental salinity it is exposed to in our example scenario, may negatively impact the granulation of the predictor variable.

An argument for parsimony
Considering that system reconstruction accuracy increases when the predictor variable is more finely granulated, the question arises if this process can be continued indefinitely.
The danger with increasing the number of clusters within a particular variable is that it propagates exponentially across all predictor variables, resulting in decreased region sizes of particular events, and may lead to many missing cells. Each additional level in one variable creates a myriad of new, often impossible (or not naturally occurring) combinations with all other states and substates of other variables. The resulting missing cells have customarily been replaced with the value from the flat system (zero information), thus rendering the reconstruction accuracy mushy. So, how many granulation levels are sensible? This depends on the nature of the relationship between the predictor variable and the system response measured, interactions of the predictor variable with other variables, and the observed or possible range of the predictor variable in the system of interest. Most variable effects used in biology can be characterized as shown in Figure 4. Even though the changes in each relationship are gradual, it becomes apparent (Figure 4) that rarely, if ever, more than three cut levels are needed for any predictor variable to describe the majority of its influence on the system response, and mostly two cut levels are sufficient.
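The growth in missing cells can be made concrete with the cluster labels of Table I. The following is a minimal sketch with hypothetical names: it counts the cells (states) that the chosen granulation makes possible and how many of them the ten cases actually occupy.

```python
from math import prod

# Cluster labels per case (cases 1-10), copied from Table I.
LABELS_2 = {
    "salinity": "LHHHLHLHHL",    # 2SG: two salinity levels
    "nitrogen": "LHLHHHLHHH",
    "phosphorus": "HHHHLHHHHH",
    "flood": "LHLLLLHLLL",
}
# Same variables, but salinity granulated to three levels (3SG).
LABELS_3 = dict(LABELS_2, salinity="LMHHLHLMHL")

def cell_counts(labels):
    """Return (possible cells, occupied cells, missing cells)."""
    possible = prod(len(set(v)) for v in labels.values())
    occupied = len(set(zip(*labels.values())))
    return possible, occupied, possible - occupied

print(cell_counts(LABELS_2))  # two levels for every variable
print(cell_counts(LABELS_3))  # one extra level for salinity only
```

With two levels per variable the ten cases occupy 7 of 16 possible cells; adding a single third level to salinity raises the number of possible cells to 24 while only 8 are occupied, so the count of missing cells grows from 9 to 16.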


Figure 4. Common system function responses (Yi) to different predictor variables (Xi) found in biology

A similar application of parsimony is useful in the selection of the number of predictor variables entered into the analysis. An increase in predictor variables increases and stabilizes the system reconstruction for some events (Figure 3), while at the same time many missing cells are created at an exponential rate for states and substates that are unlikely or impossible to occur in nature. The mining of potent variables involves a combination of parametric and reconstruction procedures, as discussed above. Even if all possible variables were measured and entered into the model, some residual variation due to genetic variability, phenotypic plasticity and measurement error would always remain. Therefore, the assembly rules defining events must remain fuzzy. That is, they must be described by distributions rather than point estimates. The nature of the dispersion in any particular event can be established through bootstrapping, testing of holdout samples (Hair et al., 1998; Tabachnick and Fidell, 2001), or gathering new data. The overriding danger of adding too many cut levels per variable and/or too many variables is that this results in so many distinct events that increases in system accuracy are inevitably created by chance alone. This is synonymous with the problem of an increased Type I error rate in parametric procedures such as stepwise regression or multiple comparison procedures (Tabachnick and Fidell, 2001).
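The claim that fit improves by chance alone as the number of distinct events grows can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the paper: the response is pure Gaussian noise, cluster labels are assigned at random, and fit is measured by the fraction of the total sum of squares removed by within-cluster averaging (a least-squares stand-in for system accuracy). All names are hypothetical.

```python
import random

def explained_fraction(y, labels):
    """Fraction of the total sum of squares removed by cluster means."""
    grand = sum(y) / len(y)
    sst = sum((v - grand) ** 2 for v in y)
    groups = {}
    for v, lab in zip(y, labels):
        groups.setdefault(lab, []).append(v)
    sse = sum(sum((v - sum(g) / len(g)) ** 2 for v in g)
              for g in groups.values())
    return 1.0 - sse / sst

def mean_chance_fit(n_cases, n_clusters, trials=2000, seed=0):
    """Average fit when the response is noise and the clusters are random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
        labels = [rng.randrange(n_clusters) for _ in range(n_cases)]
        total += explained_fraction(y, labels)
    return total / trials

for k in (2, 5, 10):
    print(k, round(mean_chance_fit(10, k), 3))
```

Even though the labels carry no information about the response, the average fit rises with the number of available clusters, and it reaches 1 when every case sits in its own cell. This is the same mechanism as the inflated Type I error rate mentioned above.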


Problems and solutions
A fundamental problem that exists in any clustering procedure used in reconstructability analysis is that outliers may greatly obscure the information contained in all the states and substates they occupy. For example, if the photosynthetic response in case 4 (Table I) were changed to 120 due to a genetic tolerance to salt stress (which some rare cypress trees are known to have), the resulting reconstructions R1, R2, and R3 lose an average of 30.73 percent system accuracy, rendering the reconstruction resulting from traditional clustering devoid of information (0.14 percent system accuracy). One possible solution involves computing separate reconstructions by omitting one observation at a time. Using either system accuracy or least-squares error, the resulting reconstructions, with and without each observation, can be compared. This procedure is similar to calculations of Studentized-t values, Cook’s distances, and Leverage values in standard regression procedures (Hair et al., 1998; Tabachnick and Fidell, 2001). This would enable the user to scrutinize particular observations that have a disproportional impact on the model and to omit them if warranted.

As demonstrated, agglomerative clustering algorithms that do not use the information in the system function will commit mistakes in selecting optimal cut levels. We propose the use of a divisive granulation approach that uses the cluster information of all predictor variables as well as the system function to find those cut levels in each variable that maximize system accuracy. To use this approach efficiently, it is essential to limit both the number of variables and the number of clusters within variables. Once the minimum number of variables or variates has been selected, the number of likely cluster cut levels should be based on previous knowledge of that variable’s impact on the system function.
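The omit-one-observation check described above can be sketched as follows, using the KS column and the traditional salinity clustering (TC) of Table I. Several choices here are assumptions rather than details given in the text: the remaining proportions are renormalised to sum to one after a case is dropped, the flat system becomes 1/(n-1), and the system accuracy terms are weighted by the k-system proportions (the weighting that reproduces the SA row of Table I). Function names are hypothetical.

```python
from math import log2

# k-system proportions and traditional salinity clusters, Table I, cases 1-10.
KS = [0.2273, 0.0796, 0.0028, 0.0171, 0.0568,
      0.0114, 0.2017, 0.0739, 0.0625, 0.2671]
TC = "LLHHLHLLHL"

def reconstruct(ks, labels):
    """Replace each proportion by the mean proportion of its cluster."""
    groups = {}
    for f, lab in zip(ks, labels):
        groups.setdefault(lab, []).append(f)
    means = {lab: sum(v) / len(v) for lab, v in groups.items()}
    return [means[lab] for lab in labels]

def system_accuracy(ks, recon):
    """System accuracy with each log-ratio term weighted by f (ASSUMPTION)."""
    flat = 1.0 / len(ks)
    num = sum(f * abs(log2(f / r)) for f, r in zip(ks, recon))
    den = sum(f * abs(log2(f / flat)) for f in ks)
    return 100.0 * (1.0 - num / den)

def leave_one_out(ks, labels):
    """System accuracy of the reconstruction with each case omitted in turn."""
    results = []
    for i in range(len(ks)):
        kept = [f for j, f in enumerate(ks) if j != i]
        total = sum(kept)
        kept = [f / total for f in kept]      # renormalise (ASSUMPTION)
        kept_labels = labels[:i] + labels[i + 1:]
        results.append(system_accuracy(kept, reconstruct(kept, kept_labels)))
    return results

full = system_accuracy(KS, reconstruct(KS, TC))
for case, sa in enumerate(leave_one_out(KS, TC), start=1):
    print(f"without case {case}: SA = {sa:6.1f} (change {sa - full:+.1f})")
```

Cases whose omission changes the accuracy far more than the others are the ones to scrutinize, in the same spirit as influence diagnostics in regression.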
In our example scenario (Table I), our main controlling variable (salinity) was divided into three clusters because it is known to have no effect on photosynthesis in low concentrations, has a stressful effect at intermediate concentrations, and is lethal at high concentrations (as shown in Figure 4(B)). The remaining variables had much smaller effect sizes and their relationships with the system function are more similar to those shown in Figure 4(A) and (D), warranting only two clusters each.

It is paramount to establish a hierarchy of the predictor variables prior to any granulation. In addition, the granulation procedure is conducted exclusively at the level of states (i.e. stipulations are required on all variables in the model). Beginning with the most important variable, all observations are grouped into a single cluster “low”. The largest value then defines the first cut for “high”. Before any other observation is added to this high cluster, all possible combinations with other variables are evaluated for maximum system accuracy in the “low” category. Sequentially, additional observations are added to the small cluster of the first predictor variable and it is also checked with all possible combinations of the subordinate variable clusters. This process continues until the reconstructability based on the granulation of the first variable starts to decompose and the best reconstruction in this hierarchy is found. Cut points in the second most important variable proceed in the same fashion, while between-cluster inconsistencies in “high” and “low” cluster resolution are resolved by determining which of the inconsistencies (either the “low” value of the second variable in the “high” cluster of the first is a “high” value in the “low” cluster of the first variable or vice versa) produces a better reconstruction.

After the predictor variables have been granulated, third cut levels may be produced to determine if further gains in system accuracy are possible. Beginning with the first variable in the hierarchy, the lowest value in the “low” cluster is deemed “low” and all other values in the “low” cluster become “medium”. Observations are added to the new “low” cluster until system accuracy peaks. Then, the lowest values in the “high” cluster are converted, one by one, to “medium” until information content in the reconstruction peaks again. Following this second sweep through the data, the granulation should be optimized.

Sometimes the variable hierarchies established at the beginning of the granulation may not be transparent. In this case, one may wish to try an alternate, equally likely hierarchy and rerun the granulation to see if the reconstruction can be improved. Comparisons among the best reconstructions from each variable hierarchy may be interpretable to improve our understanding of the system of interest. Again it is advisable only to build new hierarchies based on theoretical a priori knowledge rather than using all randomly combined variable hierarchies, as this process, too, capitalizes on chance.

Conclusions
At present, our technology enables us to measure hundreds of variables precisely in any given system. It is thus tempting to use all of these in any analysis designed to explain system behavior.
However, such an approach has three major disadvantages. First, due to the permutations possible and the complexity of the model, it becomes highly probable to find seemingly potent events by chance that do not reflect true relationships. Second, in k-systems analysis a large number of variables will introduce a huge amount of non-information by way of generating missing cells. Third, even if meaningful events are found containing many variables, the purpose of summarizing the observed system is defeated by the sheer number of stipulations on every variable necessary to describe the event: the map has become the territory. These arguments also apply to the choice of the number of cut levels within each variable.

Many systems on planet Earth are event driven rather than variable driven and are therefore particularly amenable to the k-systems framework. However, even though the k-systems reconstruction is event-based, the clustering of metric variables in the pre-analysis phase remains variable-oriented and subjective. Different clustering procedures may yield different results, none of which are theoretically grounded. An objective approach requires the use of the information in the system function to establish a basis for granulating the predictor variables. As demonstrated above, this approach also improves system accuracy. To date, clustering metric variables for k-systems analysis has been a means to an end. We propose herein that a granulation procedure based on the system function may result in the discovery of fuzzy assembly rules. These rules can be used in simulation models to gain a deeper understanding of the systems we study.

References
Conner, W.H., McLeod, K.W. and McCarron, J.K. (1997), “Flooding and salinity effects on growth and survival of four common forested wetland species”, Wetlands Ecology and Management, Vol. 5, pp. 99-109.
Hair, J.F. Jr, Anderson, R.E., Tatham, R.L. and Black, W.C. (1998), Multivariate Data Analysis, 5th ed., Prentice-Hall, Upper Saddle River, NJ.
Jones, B. (1985), “Reconstructability analysis of general functions”, International Journal of General Systems, Vol. 11, pp. 133-42.
Jones, B. (1986), “K-systems analysis versus classical multivariate analysis”, International Journal of General Systems, Vol. 12, pp. 1-6.
Norretranders, T. (1998), The User Illusion, Penguin Books, New York, NY.
Shaffer, G.P. (1997), “K-systems analysis: an anti silver-bullet approach to ecosystems modeling”, Advances in Systems Science and Applications, pp. 32-9.
Shaffer, G.P. and Sullivan, M.J. (1988), “Water column productivity attributable to displaced benthic diatoms in well-mixed shallow estuaries”, Journal of Phycology, Vol. 24, pp. 132-40.
Tabachnick, B.G. and Fidell, L.S. (2001), Using Multivariate Statistics, 4th ed., Allyn and Bacon, Boston, MA.

Further reading
Allen, J.A., Chambers, L. and Pezeshki, S.R. (1997), “Effects of salinity on baldcypress seedlings: physiological responses in relation to salinity tolerance”, Wetlands, Vol. 17, pp. 310-20.
Shaffer, G.P. (1988), “K-systems analysis for determining the factors influencing benthic microfloral productivity in a Louisiana estuary, USA”, Marine Ecology Progress Series, Vol. 43, pp. 43-54.


Directed extended dependency analysis for data mining
Thaddeus T. Shannon and Martin Zwick
Portland State University, Portland, Oregon, USA

Keywords Cybernetics, Programming and algorithm theory, Searching

Abstract Extended dependency analysis (EDA) is a heuristic search technique for finding significant relationships between nominal variables in large data sets. The directed version of EDA searches for maximally predictive sets of independent variables with respect to a target dependent variable. The original implementation of EDA was an extension of reconstructability analysis. Our new implementation adds a variety of statistical significance tests at each decision point that allow the user to tailor the algorithm to a particular objective. It also utilizes data structures appropriate for the sparse data sets customary in contemporary data mining problems. Two examples that illustrate different approaches to assessing model quality tests are given in this paper.

1. Introduction

Extended dependency analysis (EDA) is a heuristic search strategy, proposed by Conant (1987a, b), that directs the information-theoretic exploration of large sets of nominal variables. It is an extension of reconstructability analysis (RA) (Klir, 1985, 1996; Krippendorff, 1986; Zwick, 2001). The number of computations involved in a complete RA scales doubly exponentially with the number of variables considered, while the more limited directed dependency analysis (DA) scales exponentially with the number of variables (RA allows an arbitrary number of relations in a model, while DA considers models with only one predicting relation). Conant devised EDA to scale polynomially with respect to the number of variables considered, while still being capable of discovering significant high-order interactions between variables.

In this paper, we illustrate some possible uses for EDA in the data mining context. The analysis in the examples we present was performed with a new software implementation developed for use on large data sets (Software System, 2003). The examples are intended to illustrate some of the goals and issues that arise in data mining. The problem data sets themselves are still relatively small (26 and 195 independent variables) and so are not indicative of EDA’s true potential, since in both cases the analysis can be performed in less than a minute on a personal computer.

In Section 2, we begin by reviewing Conant’s original work in our own terms. Section 3 is a discussion of implementation issues that we believe are germane in a data mining context. We introduce several variations on Conant’s original proposal to address these issues. In Sections 4 and 5, we present two examples of data mining tasks and the results we obtained using EDA: developing a forecasting model from a rainfall time series, and designing a pattern classifier for satellite imagery. Section 6 extends our earlier discussion of implementation options and suggests further applications for EDA.

This work was partially supported by the National Science Foundation under grant ECS-9904378. The authors would like to thank Roy Koch for access to the rainfall data, George Lendaris for the use of the satellite imagery, and Roger Conant for sharing his original EDA routines.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 973-983 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534010

2. The origins of EDA

RA and related methods are techniques that explain some set of observed variables in terms of a set of one or more relations among the variables (Klir, 1985, 1996; Krippendorff, 1986). The variables are assumed to be nominal, and the sense in which the relations explain the observations differs depending on the nature of the data. The nominal variables may be crisp possibilistic, in which case the analysis is essentially set theoretic. Alternatively, the variables may be probabilistic, i.e. random variables with some joint distribution. In both the crisp possibilistic and probabilistic cases, RA seeks to find projections of the joint distribution that can reconstruct the original distribution when joined using the maximum uncertainty principle. EDA applies to probabilistic systems, and that is the focus of this paper.

The variables that necessarily participate in a projection together are dependent on each other and form a relation. In what Conant termed a “static” analysis, the goal is to find all significant dependencies in a set of variables; this corresponds to RA for a neutral (undirected) system. Alternatively, what Conant called a “dynamic” analysis seeks to explain one particular variable using all the other variables; this corresponds to RA for a directed system. In this case, one wants to find all the relations (projected distributions) in which the variable of interest participates.
The set of variables in the identified relations together explain the dependent variable in the sense that they maximally reduce the uncertainty of the dependent variable. We consider this directed case in our data mining context.

RA is a method for finding the significant relations among a set of variables. The set of all possible relations that could be searched through increases doubly exponentially with the number of variables considered (Klir, 1981). Thus, direct application of RA quickly becomes impractical as the number of variables increases beyond seven or eight. Conant’s EDA escapes this curse of dimensionality by limiting the search to a subset of “saturated” (single relation) models that increases only polynomially with the number of variables. Therefore, in general, EDA will not find the best possible model, i.e. the full set of significant relations.

EDA has two distinct phases, a directed modeling phase and an undirected modeling phase. If EDA is applied to a directed system, only the directed analysis needs to be performed. For an undirected analysis, the directed phase is carried out repeatedly with each variable in turn considered as the dependent variable, and the results are then aggregated in the undirected phase. The directed analysis is composed of three operations: an initial search named initialize (Conant’s H2), followed by repeated applications of a routine named reduce (based on DA) and a second search heuristic named expand (Conant’s H3). The general relationship of the search phases is shown in Figure 1.

The first heuristic, initialize, begins by calculating the three-way transmissions (mutual information)

$$T(D; C_i C_j) = u(D) - u(D \mid C_i C_j) = u(D) + u(C_i C_j) - u(D C_i C_j),$$

where $D$ is the dependent variable of interest, $C_i$ is the ith candidate variable (Klir and Wierman, 1998), and $u$ is the information-theoretic uncertainty (Shannon entropy). All possible pairs of candidate variables are tried. Each transmission value measures the strength of the relationship between the dependent variable and a pair of independent variables. If the dependent variable is completely determined when the values of the independent variables are known, this transmission value will equal the initial uncertainty of the dependent variable. If, on the other hand, the dependent variable is independent of the independent variables, this transmission will be zero. Given $n$ independent variables there are $n(n-1)/2$ such pairs to check. For each independent variable $C_i$, the transmission value for all possible pairings with other independent variables is calculated, and the highest transmission value found is stored if that interaction passes a chi-square significance test. This results in a list of $n$ variables with transmission values, which can be sorted based on the transmission values. The $d$ variables with the highest transmission values then form an initial model $M_I = C_1 C_2 \ldots C_d$, the search set of candidate variables. The size of the search set is the user-selected parameter that controls both model complexity and run time for the analysis. In essence, this heuristic performs an exhaustive search through the triadic relationships involving the dependent variable.

Reduce tests the significance of the uncertainty reduction of the dependent variable provided by a candidate model $M$. The significance of the participation of each individual variable in the model is evaluated in the context of the higher order relation posed by the model. A significance criterion, either statistical or information theoretic, must be provided for this evaluation. Some possible evaluation criteria and the problem contexts in which they could be appropriate are discussed in Section 3 on implementation issues.
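The transmission calculation and the ranking step of initialize can be illustrated with a minimal Python sketch. It is not the authors’ implementation: the function names (`entropy`, `transmission`, `initialize`) and the representation of observations as a list of dictionaries are assumptions made for illustration, and the chi-square filter described in the text is deliberately omitted so that only the information-theoretic part is shown.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Shannon entropy (bits) of the joint distribution of the given columns.

    Only observed state combinations are counted, so empty cells of the
    contingency table are never stored.
    """
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def transmission(rows, dep, preds):
    """T(D; C...) = u(D) + u(C...) - u(D C...)."""
    preds = list(preds)
    return (entropy(rows, [dep]) + entropy(rows, preds)
            - entropy(rows, [dep] + preds))


def initialize(rows, dep, candidates, d):
    """Rank candidates by their best pairwise transmission and keep d of them.

    Hypothetical simplification: no significance test is applied here.
    """
    best = {c: 0.0 for c in candidates}
    for ci, cj in combinations(candidates, 2):  # n(n-1)/2 pairs
        t = transmission(rows, dep, (ci, cj))
        best[ci] = max(best[ci], t)
        best[cj] = max(best[cj], t)
    ranked = sorted(candidates, key=lambda c: best[c], reverse=True)
    return ranked[:d], best
```

On a toy data set in which D is the exclusive-or of two binary variables a and b, and a third variable c is unrelated, neither a nor b alone carries information about D, but the pair does: `transmission(rows, "D", ("a", "b"))` is 1 bit, and `initialize` ranks a and b ahead of c. This is the kind of higher-order interaction the pairwise search is meant to catch.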

Figure 1. EDA directed analysis


Variables that are found to be not significant are eliminated from the candidate model, resulting in a simplified model $M_S = M_I - C_{NS}$.

The second heuristic, expand, takes the candidate set of variables passed from reduce and attempts to expand it by checking for higher order relations involving the dependent variable, all the current candidate variables, and every individual non-candidate variable. Specifically, the transmissions $T(D; M_S C_{NT})$ are calculated for each variable $C_{NT}$ not tried earlier in the candidate set. This search allows variables into the explanatory set that are found to take part in higher order relationships to a significant degree (as determined by a chi-square test), even though they were not found to take part in significant relationships in initialize. The search alternates between expand, in which the candidate model is expanded to contain $d$ variables, and reduce, in which insignificant variables are discarded. The search ends when a set of $d$ significant variables has been found, or when expand can find no more candidate variables to add to the model. It is almost always the case that the bulk of the computation in any analysis takes place in reduce.

3. Implementation issues for data mining

As the primary reason for adopting a set of search heuristics such as EDA is to avoid exponential scaling of the computational load with respect to the number of variables considered, there are a number of implementation issues that require attention. Consider, for example, the analysis of 100 binary-valued variables. In principle, the complete contingency table representing this data set contains $2^{100}$ entries. In practice, we will find this table sparsely populated, if for no other reason than that we have a limited amount of data (in this example one observation requires 12.5 bytes of storage, so $2^{40}$ observations, roughly one trillion, would require 12.5 terabytes of storage but would still leave at least $2^{60}$ empty entries in the table).
Analysis of a large number of variables necessarily implies comparative sparseness of data. The first consequence of this observation is that sparse matrix methods should be used for data representation and manipulation. The size of the overall contingency table for a data set scales exponentially with the number of variables; therefore, to maintain the desired polynomial scaling of EDA we must omit all the empty cells from our representation. The second consequence of data paucity is the limitation of our ability to test large models or relations for statistical significance. As a rule of thumb, one needs five times as many observations as degrees of freedom in a model to justify a chi-square significance test of the model. The degrees of freedom for the models we consider are

$$\mathrm{df}(D; C_1 \ldots C_n) = (\mathrm{df}(D) - 1)\left[\left(\prod_{i=1}^{n} \mathrm{df}(C_i)\right) - 1\right].$$
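The two consequences of sparseness can be made concrete with a short Python sketch. It assumes that the degrees-of-freedom formula reads (df(D) − 1)[(∏ df(C_i)) − 1], where df of a single variable is its number of states, and it uses the five-observations-per-degree-of-freedom rule of thumb quoted above. The function names and the dictionary-per-observation data layout are hypothetical.

```python
from collections import Counter
from math import prod


def degrees_of_freedom(dep_states, predictor_states):
    """df(D; C1...Cn) = (|D| - 1) * (prod |Ci| - 1), under the stated assumption."""
    return (dep_states - 1) * (prod(predictor_states) - 1)


def min_observations(dep_states, predictor_states, factor=5):
    """Observations needed under the 'five per degree of freedom' rule of thumb."""
    return factor * degrees_of_freedom(dep_states, predictor_states)


def sparse_table(rows, cols):
    """Contingency table that stores only the occupied cells.

    Memory grows with the number of distinct observed states,
    not with the size of the full state space.
    """
    return Counter(tuple(row[c] for c in cols) for row in rows)
```

For a binary dependent variable and ten binary predictors, `degrees_of_freedom(2, [2] * 10)` is 1023 and `min_observations` is 5115; each additional binary predictor roughly doubles both figures, whereas `sparse_table` never holds more cells than there are observations.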

Thus, the number of observations we need to justify fully analyzing a data set (i.e. testing all possible models) scales exponentially with the number of variables. As the above example illustrates, one needs an astronomical number of observations to justify completely analyzing even a medium-size collection of variables. Thus, the domain of EDA is a model search space that scales doubly exponentially with the number of variables over an observation state space that scales exponentially with the number of variables, wherein the required number of observations for a complete statistical analysis also scales exponentially. While the scaling of the observation space can be completely handled for any practical implementation by omitting all the empty cells from the contingency table, choosing which subsets of models to check and how to check them requires factoring together the goals of the analysis with the actual data constraints encountered.

Three alternative analyses can be performed: a first in which models are compared to each other incrementally, considering both complexity and uncertainty reduction; a second in which models are compared to a reference model (usually the independence model) to maximize uncertainty reduction subject to a significance constraint; and a third in which models are compared to a reference model to minimize complexity subject to an uncertainty reduction constraint. The first case uses an incremental significance test between models; the second a cumulative significance test with respect to the reference model; the third imposes no significance test. Different problem contexts suggest these different kinds of analyses. The first case might involve searching for significant relationships between variables that provide meaningful scientific explanations of observed phenomena. The second case might aim at developing a model that is maximally predictive of another variable.
The third case might involve selection of features for use in compression or pattern recognition. These objectives suggest the use of different significance tests at the various decision points within EDA.

In the first case, the objective is to draw only the conclusions that are significantly supported by the data. If there is not enough data to justify a relation, then the relation is not considered in the analysis. In this case, one would use the usual chi-square significance tests with an appropriately chosen $\alpha$ at each decision point. Usually this test would be an aggregate test of the model $H_1$ versus the independence model $H_0$,

$$L^2(M) = n(1.3863)\,T(D; M) \sim \chi^2(\mathrm{df}(D; M)),$$

in initialize and an inter-model (incremental) test,

$$L^2(M_n \to M_{n-1}) = n(1.3863)\,[T(D; M_n) - T(D; M_{n-1})] \sim \chi^2(\mathrm{df}(D; M_n) - \mathrm{df}(D; M_{n-1})),$$

in reduce and expand (here the leading $n$ is the number of observations and $1.3863 = 2 \ln 2$). In addition, one sets the search set size to limit the number of models tested that contain too many degrees of freedom to justify testing, i.e. models that are a priori likely to fail the significance test due to sample size. One would like to avoid testing such models, and if the cardinality of all the independent variables is the same, this can be done exactly, i.e. no models that are a priori likely to fail need to be tested. Otherwise, the search set size must be large enough to include all the potentially justifiable models, and some unjustified models will be included.

In the second case, the objective is to obtain the maximum uncertainty reduction possible. With a comparatively small number of observations available, the probability of committing a type II error with an incremental significance test will tend to be high. A more appropriate significance test in reduce would be the cumulative significance of the proposed model versus the independence model. One can drop the significance test from expand altogether at the cost of lengthening the analysis somewhat. Changing to a cumulative test in reduce tends to abbreviate the search, since a depth-first search will tend to stop at a larger model, which in turn tends to limit the search depth. As reduce is less computationally costly with the cumulative test, it can be applied to check a greater number of candidate variables.

In the third case, the objective is data reduction without serious information loss. There may be significant but minor information that can be discarded to obtain better compression, or there may be insufficient data to establish significance using standard statistical tests. In this latter situation one is often faced with design challenges in which a large data set is thought to contain sufficient information to achieve an objective. A wide variety of machine learning algorithms have been developed to extract significant rules from such data sets, e.g.
artificial neural networks, tree classifiers, etc. Implementation of any such algorithm is greatly benefited by a reduction in the dimensionality of the data set. Obviously, any dimensionality reduction strategy employed needs to minimize the inherent loss of information. EDA implemented in information mode only (with no statistical significance test) provides a method for identifying those variables that can be discarded without serious information loss. In this context, the significance of the final model can be treated explicitly using the tools of structural risk minimization (Haykin, 1999).

4. Application: mask analysis of time series for prediction

This example application uses EDA to perform mask analysis on a multivariate time series, with the goal of forecasting future states. The data consist of daily rainfall measurements collected at four sites over the period from 1982 to 1990. The original data series was quantitative (inches of rainfall); for our analysis we abstracted the data into two values: measurable precipitation and no measurable precipitation. This example is an extension and refinement of the analysis reported in the work of Zwick et al. (1995). In this example, we use cumulative tests for model significance, i.e. we seek the most explanatory model that is significant compared to the independence model. We form five variables out of the time series for each site by taking the current and the first through fourth lagged values of the time series as the variables’ current values. This variable assignment is summarized in Table I. We also introduce a set of seasonal variables to include in the analysis. Since we do not know a priori how many seasons to include, nor when to define the start of each season, we include a range of possible season variables as defined in Table II.

Table I. Mask analysis framework

Site    Day t-4    t-3    t-2    t-1    t
1       Q          M      I      E      A
2       R          N      J      F      B
3       S          O      K      G      C
4       T          P      L      H      D

Table II. Season variables considered

W1     12 seasons
W2     6 seasons (Jan/Feb, ...)
W3     4 seasons (Jan-Mar, Apr-Jun, ...)
W4     2 seasons (Oct-Mar, Apr-Sep)
W5     2 seasons (Nov-Apr, May-Oct)
W6     2 seasons (Dec-May, Jun-Nov)
W7     2 seasons (Jan-Jun, Jul-Dec)
W8     2 seasons (Feb-Jul, Aug-Jan)
W9     2 seasons (Mar-Aug, Sep-Feb)
W10    2 seasons (Nov-May, Jun-Oct)

In the work of Zwick et al. (1995), it was noted that A, B, C, and D are not independent, i.e. if we form the aggregate variable $Z = ABCD$, then

$$u(Z) < u(A) + u(B) + u(C) + u(D) = u(A : B : C : D).$$

In this case $u(Z) = 3.25$ bits, which is significantly less than $u(A : B : C : D) = 3.79$ bits. The aggregate variable was then used as the dependent variable for the rest of the analysis. The analysis was further simplified by including only first and second lags of observed rainfall. Finally, a seasonal variable was selected in a preliminary analysis that compared the predictive power (uncertainty reduction divided by the degrees of freedom) of each of the candidate variables in isolation. The preliminary analysis chose W10 as the most efficient predictor of Z. The analysis went on to find that $Y = FGHJKW_{10}$ was the best statistically significant predictor of Z, with $u(Z \mid Y) = 2.52$ bits for an uncertainty reduction of 22.4 percent.

Using EDA we analyzed each dependent variable separately, including all 26 of the independent variables E, ..., T, and W1, ..., W10 in each case. The models suggested by initialize were:

$Y_A = EFGHJLW_1W_9W_{10}$; $u(A \mid Y_A) = 0.52$ bits; $\Delta u = 39.1$ percent;
$Y_B = FGHJLPW_1W_9W_{10}$; $u(B \mid Y_B) = 0.56$ bits; $\Delta u = 41.9$ percent;
$Y_C = FGHJLW_1W_6W_9W_{10}$; $u(C \mid Y_C) = 0.67$ bits; $\Delta u = 32.2$ percent;
$Y_D = FGHJKLNPTW_6W_{10}$; $u(D \mid Y_D) = 0.41$ bits; $\Delta u = 58.1$ percent.

None of these intermediate models is statistically significant. Note that all these initial proposals include two or more seasonal variables. We would anticipate that these variables should be redundant and that the analysis will select the best from among them. The best models obtained at the end of the analysis, using a cumulative significance test with an $\alpha$ of 0.05 and a search set allowing up to nine independent variables (11 for site 4), were:

$Y_A = FGHJLSTW_1$; $u(A \mid Y_A) = 0.42$ bits; $\Delta u = 50.5$ percent;
$Y_B = FGHLNORW_1$; $u(B \mid Y_B) = 0.46$ bits; $\Delta u = 52.3$ percent;
$Y_C = FGHJOSTW_1$; $u(C \mid Y_C) = 0.48$ bits; $\Delta u = 51.6$ percent;
$Y_D = FGHKNPTW_1$; $u(D \mid Y_D) = 0.46$ bits; $\Delta u = 53.5$ percent.
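The cumulative test used to select such models compares the likelihood-ratio statistic, 1.3863 times the number of observations times the transmission in bits, with a chi-square critical value. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, and the critical value is obtained from the Wilson-Hilferty approximation (a substitute chosen here to stay within the standard library, not the method of the original software), which is good for the large degrees of freedom typical of these models and coarser for very small ones.

```python
from math import log, sqrt
from statistics import NormalDist


def likelihood_ratio(n_obs, transmission_bits):
    """L^2 = 2 ln(2) * N * T, about 1.3863 * N * T for T measured in bits."""
    return 2.0 * log(2.0) * n_obs * transmission_bits


def chi2_critical(df, alpha=0.05):
    """Approximate upper-alpha chi-square quantile (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * sqrt(a)) ** 3


def is_significant(n_obs, transmission_bits, df, alpha=0.05):
    """Cumulative test of a model against the independence model."""
    return likelihood_ratio(n_obs, transmission_bits) > chi2_critical(df, alpha)
```

The same function covers the incremental test if it is called with the difference of the two transmissions and the difference of the two degrees of freedom, e.g. `is_significant(n_obs, t_big - t_small, df_big - df_small)`.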

These final models include only one seasonal variable each; in all cases W1, which differentiates all 12 months, was chosen. In aggregate, we then have

$u(A \mid Y_A : B \mid Y_B : C \mid Y_C : D \mid Y_D) = 1.82$ bits; $\Delta u = 52.0$ percent,

which represents a reduction in uncertainty of 0.7 bits, approximately 28 percent, beyond that of the best joint predictor found earlier. Finding the best predictors of A, B, C, and D separately is better than finding the best predictors of a single joint dependent variable, $Z = ABCD$. Further reduction in uncertainty may be possible by using the constraints among A, B, C, and D, i.e. by adding a model component specifying these constraints.

5. Application: classifier design with limited observations

This example demonstrates the use of EDA in a case where there is too little data to justify significance tests of all but the simplest relations. For classifier design from a Bayesian viewpoint, the preferred procedure would be structural risk minimization. One would trade classifier simplicity against performance on the limited design examples. Greater simplicity tends to

produce poorer performance on design samples, but increases the likelihood that generalization performance will mirror performance on design examples. Our proposal here is to use EDA to select a limited number of features out of all the observable variables, for use in a classifier. Furthermore, we can use the contingency table from EDA to form a prototype classifier that can be used as is or refined using other methods. In this context, performing statistical significance tests in EDA would amount to picking one particular solution to the structural risk minimization problem without considering the actual structural form of the final classifier. Worse yet, this solution would tend not to be a good starting point for further refinement, since potentially useful information will already have been lost. Instead, we suggest using the information-only version of EDA to select features, with the structural risk minimization and refinement processes handled explicitly as separate steps.

Our example problem is that of designing a land-use classifier for satellite imagery. The images are 2 km circular aperture photographs of the Phoenix metropolitan area, derived from plates taken during the Skylab II mission (Lendaris and Chism, 1975). Each image is represented by wedge and annular ring samples of its two-dimensional Fourier transform. One hundred 1.8° wedge samples and 95 ring samples were taken from each transform. This representation offers translation, size and rotational invariance for items in each image (Lendaris and Stanley, 1970). Five land use classes are represented in the sample: urban, residential, farm, mountain and water. While we have 195 observables per image, we have only 177 extant images to work with. We divide these into two sets, a 100-image set for classifier design and a 77-image set for classifier evaluation. We suggest that this condition of limited data is actually representative of a major class of problems, e.g. genome studies.
Since the ring and wedge samples are quantitative variables, we begin by binning each variable. The fewer bins we use per variable, the smaller the contingency table will be. Nevertheless, our binning should preserve meaningful distinctions between the values for each variable. In this case, we choose to use five bins per variable based on the surmise that five land use types could give rise to five distinct values. We break up the total observed interval of values for each variable into five subintervals of equal length and assign observations to bins based on which subinterval they fall into.

Using EDA we searched through the entire set of wedge (W) and ring (R) variables. The intermediate model suggested by initialize for a search set of size eight was:

$Y = W_{37}W_{70}W_{83}W_{84}R_{47}R_{48}R_{49}R_{52}$; $u(\text{type} \mid Y) = 0.0$ bits; $\Delta u = 100$ percent.

This predictor is not statistically significant; however, it is the smallest set that initialize can find that completely explains land use type. The best statistically significant predictor, using a cumulative significance test with an $\alpha$ of 0.05 and a search set of size eight, was:

$Y = W_{37}R_{48}R_{81}$; $u(\text{type} \mid Y) = 0.15$ bits; $\Delta u = 93.7$ percent.
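The equal-width binning used in this section (five subintervals of equal length over the observed range of each variable) can be sketched in a few lines of Python. The function name and the handling of a constant variable are assumptions made for illustration; the original software may treat edge cases differently.

```python
def equal_width_bins(values, n_bins=5):
    """Map each value to a bin index 0..n_bins-1 over the observed range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

Applied to each of the 195 ring and wedge variables in turn, this yields nominal variables with at most five states each, which is the form EDA requires.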

We have several options in the data reduction context: we could keep all the variables selected by initialize for use in our final classifier design; we could use only the three variables included in the statistically significant predictor; or we could employ reduce and expand with information content tests to see if there is a subset of the initialize dependency set that still contains all the information necessary to determine land use type. Applying this last procedure, we find that there are four equally explanatory dependency sets of size four:

$Y = W_{37}W_{83}W_{84}R_{47}$;
$Y = W_{37}W_{83}W_{84}R_{48}$;
$Y = W_{37}W_{83}W_{84}R_{49}$;
$Y = W_{70}W_{83}W_{84}R_{47}$;

and one of size five,

$Y = W_{70}W_{83}W_{84}R_{48}R_{52}$;

all with $u(\text{type} \mid Y) = 0.0$ bits; $\Delta u = 100$ percent.
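The information-only check behind these dependency sets can be illustrated as follows. This is a brute-force Python sketch rather than the reduce and expand routines themselves: it enumerates every subset of a given size and keeps those whose conditional uncertainty of the dependent variable is zero. The function names and the list-of-dictionaries data layout are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(rows, dep, preds):
    """u(D | preds) = u(D, preds) - u(preds), in bits."""
    n = len(rows)

    def h(cols):
        counts = Counter(tuple(r[c] for c in cols) for r in rows)
        return -sum((k / n) * log2(k / n) for k in counts.values())

    preds = list(preds)
    return h([dep] + preds) - h(preds)


def explanatory_subsets(rows, dep, candidates, size, tol=1e-12):
    """All subsets of the given size that fully determine dep in the data."""
    return [s for s in combinations(candidates, size)
            if conditional_entropy(rows, dep, s) <= tol]
```

With eight candidates there are only 70 subsets of size four, so exhaustive enumeration is cheap here; for larger search sets the stepwise reduce and expand procedure described in the text is what keeps the cost polynomial.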

This additional analysis suggests that none of the eight variables found by initialize is irrelevant, but there is redundancy when using all eight. In the final models, all the variables appear in at least one set of size four or five. That Ring 52 does not participate in any of the sets of size four suggests that it could be a candidate for elimination. When the above binning scheme and analysis are used to directly construct a classifier, the generalization rates on the holdout data set for the four-variable predictors range from 89 to 94 percent (Lendaris et al., 1999).

6. Conclusion

In summary, directed EDA is a useful method for data reduction and forecasting problems involving large numbers of nominal variables. Our recent software implementation can apply EDA’s polynomial-time search heuristics to large numbers of variables using a variety of significance tests appropriate for data mining problems.

References

Conant, R.C. (1987a), “Extended dependency analysis of large systems; part I: dynamic analysis”, Int. J. General Systems, Vol. 14, pp. 97-123.
Conant, R.C. (1987b), “Extended dependency analysis of large systems; part II: static analysis”, Int. J. General Systems, Vol. 14, pp. 125-41.
Haykin, S. (1999), Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice-Hall, Upper Saddle River, NJ.
Klir, G.J. (1981), Int. J. General Systems, Special issue on reconstructability analysis, Vol. 7 No. 1.
Klir, G.J. (1985), Architecture of Systems Problem Solving, Plenum Publishing, New York, NY.
Klir, G.J. (1996), “Reconstructability analysis: an offspring of Ashby’s constraint theory”, Int. J. General Systems, Vol. 3 No. 4, pp. 267-71.
Klir, G.J. and Wierman, M.J. (1998), Uncertainty-Based Information, Physica-Verlag, Heidelberg.
Krippendorff, K. (1986), Information Theoretic Structural Models for Qualitative Data, Sage, Thousand Oaks, CA.
Lendaris, G.G. and Chism, S.B. (1975), Land-Use Classification of Skylab S-190B Photography Using Optical Fourier Transform Data, NASA LBJ Space Center, #LEC-5633, March 1975.
Lendaris, G.G. and Stanley, G.L. (1970), “Diffraction pattern sampling for pattern recognition”, Proceedings of IEEE, Vol. 58 No. 2, pp. 198-216.
Lendaris, G.G., Shannon, T.T. and Zwick, M. (1999), “Prestructuring neural networks for pattern recognition using extended dependency analysis”, Proceedings of Applications and Science of Computational Intelligence II – AeroSense’99, April 1999, Orlando, FL, SPIE.
Software System (2003), available at: www.sysc.pdx.edu/res_struct.html
Zwick, M. (2001), “Wholes and part in general system methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY.
Zwick, M., Shu, H. and Koch, R. (1995), “Information-theoretic mask analysis of rainfall time-series data”, Advances in System Science and Applications, No. 1, pp. 154-9.


The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

Instant modelling and data-knowledge processing by reconstructability analysis

Guangfu Shu
Institute of Systems Science, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing, People’s Republic of China

Keywords Cybernetics, Information, Modelling

Abstract This paper first reviews factor reconstructability analysis (RA) modelling based entirely on data. It describes the generation process for levelled-variable factor reconstruction analysis models and its further extension to forecasting, evaluation and optimisation. It then introduces RA modelling with multi-variety information and knowledge. Moving from the generation of mixed-variable RA models to the generation of models with both data and knowledge, the paper gives a related table and a figure on “Information flow of reconstructability analysis with multi-variety information and knowledge”, which includes a database, an RA relational knowledge base, and interfaces for input of experts’ knowledge. Finally, it gives some examples and prospects.

1. Introduction

First, reconstructability analysis (RA) can obtain the closeness between the behaviour function of the overall system and those of reconstruction hypotheses directly from aggregate states. Secondly, the closeness of behaviour functions can be used for factor analysis, forecasting, evaluation, optimisation, risk analysis and decision support analysis. Thirdly, by connecting RA with a database and providing an appropriate interface and a model generation program, we can generate an RA model very rapidly. Fourthly, by using a suitable database that accepts knowledge expressed as tendencies and logical relation-restricted number series, and feeding these into data-knowledge reconstruction analysis programs, we can generate a model that to some extent combines numerical data and knowledge. By implementing these four parts, we can do instant modelling by RA. This paper reviews the first two parts and introduces the last two.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 984-991 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534029

The macro-economic application of this work is supported by the National Natural Science Foundation of China, Project Number 79990580.

2. Modelling on the basis of data

Cavallo and Klir (1981) proposed behaviour function closeness for the evaluation of system hypotheses. Jones (1989) further used behaviour function closeness to find the main substates and main variable states of systems. Shu (1993) combined state levels of two different sorts of variables, objectives and factors, to form different system hypotheses.

By comparing the closeness of the behaviour functions of the different system hypotheses to the behaviour function of the overall system, we can obtain the importance degrees of the different factor levels. For problems depending entirely on levelled data, we proceed as follows. First, partition criteria and factors into several levels. On the basis of these levels, the so-called “factor and criterion state levels”, we can form the aggregate state-behaviour function table shown in Table I. In Table I, $v_n$ ($n = 1, 2, \ldots, N$) is the nth factor, $w_l$ ($l = 1, 2, \ldots, L$) the lth criterion, $A_m$ ($m = 1, 2, \ldots, M$) the mth aggregate state, $r_{m,n}$ the level of the nth factor of the mth aggregate state, $R_{m,l}$ the level of the lth criterion of the mth aggregate state, and $f_m$ the behaviour function value of the mth aggregate state. From this table, the factor RA computing program can find the importance degree relating a certain factor state level $\mu_{n,k}$ (i.e. the kth level of the nth factor) to a certain criterion level $\eta_{l,j}$ (i.e. the jth level of the lth criterion). This is done by comparing the distance (or closeness) between the behaviour function of the hypothesis and that of the overall system. The smaller the distance, the more important the factor level is for the criterion level. Here the computation depends entirely on the selection of criteria and factors and on the determination of the variable (criterion, factor) state levels in each aggregate state (data group). All of this can be done by a model generation program combined with a suitable database and a suitable man-machine interface. On this interface, experts, system analysts or decision makers can select criterion-factor data groups and principles for the determination of variable state levels.
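The table-building step described above can be sketched in Python. This is only an illustration under explicit assumptions: the behaviour function value of an aggregate state is taken to be its relative frequency in the data (one common choice in RA, not necessarily the one used by the author’s program), levels are assigned by user-supplied cut points, and all names (`to_level`, `aggregate_state_table`, the record fields) are hypothetical.

```python
from collections import Counter


def to_level(value, cut_points):
    """Level index = number of cut points the value meets or exceeds."""
    return sum(value >= c for c in cut_points)


def aggregate_state_table(records, factors, criteria, cut_points):
    """Rows of (factor levels, criterion levels, behaviour function value).

    Assumption: the behaviour function is the relative frequency with which
    each aggregate state (combination of levels) occurs in the records.
    """
    names = list(factors) + list(criteria)
    states = Counter(
        tuple(to_level(rec[name], cut_points[name]) for name in names)
        for rec in records
    )
    total = sum(states.values())
    n_f = len(factors)
    return [(state[:n_f], state[n_f:], count / total)
            for state, count in sorted(states.items())]
```

Each returned row corresponds to one line of the aggregate state-behaviour function table: the factor levels, the criterion levels, and the behaviour function value of that aggregate state. Because only observed combinations are stored, the table can be regenerated quickly whenever the user changes the selected variables or the cut points.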
Generation of such a model can be done instantly, for instance in 1 s or less on a microcomputer with a very ordinary database such as Access in Office 2000, without needing a more powerful database. For factor analysis, the generation program can easily be combined with the RA factor analysis program. For forecasting, evaluation and optimization, on the basis of the results of factor analysis, we further need the RA forecasting, evaluation and optimization program (Shu, 1997). Hence, we also need to generate the corresponding forecasting and evaluation condition inputs. Here, the optimization is unconstrained. For constrained optimization and constrained evaluation we need the following RA with knowledge.

Table I. Aggregate state-behaviour function table

        v1     v2     ...   vn     ...   vN     w1     w2     ...   wl     ...   wL     f
A1      r1,1   r1,2   ...   r1,n   ...   r1,N   R1,1   R1,2   ...   R1,l   ...   R1,L   f1
A2      r2,1   r2,2   ...   r2,n   ...   r2,N   R2,1   R2,2   ...   R2,l   ...   R2,L   f2
...     ...    ...          ...          ...    ...    ...          ...          ...    ...
Am      rm,1   rm,2   ...   rm,n   ...   rm,N   Rm,1   Rm,2   ...   Rm,l   ...   Rm,L   fm
...     ...    ...          ...          ...    ...    ...          ...          ...    ...
AM      rM,1   rM,2   ...   rM,n   ...   rM,N   RM,1   RM,2   ...   RM,l   ...   RM,L   fM

3. Modelling with multivariety information and knowledge

Levelled variables are very suitable for problems which can only be measured in levels, or for cases where only trends or categories can be given. But when more accurate forecast values, more accurate quantitative evaluation values, or more accurate quantitatively optimized factor values are needed, we have to use mixed-variable RA. Further, in some problems, such as the overall design optimization of complex machine tools and heart disease diagnosis, we need to add logical relations of complex form. For these cases, we need RA for systems with multivariety information and knowledge (Shu, 1998). One of its more general forms is as follows. An overall system with levelled-discrete-continuous mixed variables with logical relations and mathematical constraints can be defined as follows.

$$B = \{V, \nu, \sigma', \sigma'', \varphi', \varphi'', A', A'', \chi', \chi'', Q, f, f', f''\}$$

ð1Þ

GWFF B ðV ; A0 ; A00 f 0 ; f 00 ; Þ ) T

ð2Þ

gðV ; A0 ; A00 ; c 0 ; c 00 ; f 0 ; f 00 Þ $ 0

ð3Þ

satisfying

where V ¼ fvi ji [ N n } ðN n ¼ f1; 2; . . .; N }Þ is a set of variables ðN , 1Þ; n ¼ fV j jj [ N m ; m # N } is a family of state sets; s 0 ; s 00 : V ! n are assignment functions, each assigns a set of states from n to each variable in V, where s 0 depicts the lower bound level of the actual aggregate states and s 00 depicts the upper bound levels of the actual aggregate states; A0 ¼ s 0 ðv1 Þ £ s 0 ðv2 Þ £ . . . £ s 0 ðvr Þ. . . £ s 0 ðvN Þ are the lower bound aggregates of the possible aggregate states, A 00 ¼ s 00 ðv1 Þ £ s 00 ðv2 Þ £ . . . £ s 00 ðvN Þ are the upper bound aggregates of the possible aggregate states; c0 ¼ f0 ðv1 Þ £ f0 ðv2 Þ £ . . . £ f0 ðvN Þ denote the closeness of the aggregate states to their upper bounds; c00 ¼ f00 ðv1 Þ £ f00 ðv2 Þ £ . . . £ f00 ðvN Þ denote the closeness of the aggregate states to their lower bounds; Q is a set of real numbers which could specify the characteristics of the aggregate states; f : ðA0 ; A00 Þ ! Q is a function representing the information of the related aggregate states called behaviour function which depicts the behaviour of the aggregate states; f 0 , f 00 are the behaviour function values of the lower and upper bound aggregates of the aggregate states; GWFFB(V, A0 , A00 , f 0 , f 00 ) represents the logical relations between levels or level bounds of the variables; gðV ; A0 ; A00 ; c 0 ; c 00 ; f 0 ; f 00 Þ $ 0 are the mathematical constraints between the levels, level bounds and closeness values.

A useful representation of an RA problem with mixed variables is the mixed variable system aggregate state behaviour function distribution table shown in Table II, where $v_n$ $(n = 1, 2, \ldots, N)$ denotes the nth factor and $w_l$ $(l = 1, 2, \ldots, L)$ denotes the lth criterion; $A'_m, A''_m$ $(m = 1, 2, \ldots, M)$ denote the lower and upper level bounds of the mth aggregate state; $r'_{m,n}, r''_{m,n}, R'_{m,l}, R''_{m,l}$ $(m = 1, 2, \ldots, M;\ n = 1, 2, \ldots, N;\ l = 1, 2, \ldots, L)$ denote the lower and upper level bounds of the nth factor and lth criterion of the mth aggregate state; $c'_{m,n}, c''_{m,n}$ and $\varphi'_{m,l}, \varphi''_{m,l}$ denote the closeness between the nth factor value and lth criterion value of the mth aggregate state and the lower and upper bounds of the aggregate state; and $f'_m, f''_m$ denote the behaviour function values of the lower and upper bounds of the mth aggregate state. Table II is suited to expressing levelled-discrete-continuous mixed variable data analysis problems. It is a direct extension of the Klir-Jones levelled variable RA problems. A particular characteristic of this table is that every aggregate state has an upper bound and a lower bound, and the variable states are stated by their closeness to the upper bound and to the lower bound. With these expressions, numerical values between the levels or level bounds can be depicted quite accurately, which overcomes the loss of information in completely levelled variable RA.

Table II. Mixed variable system aggregate state behaviour function distribution table

$$
\begin{array}{c|ccccc|ccccc|c}
 & v_1 & \cdots & v_n & \cdots & v_N & w_1 & \cdots & w_l & \cdots & w_L & f \\ \hline
A'_1 & (r'_{1,1}, c'_{1,1}) & \cdots & (r'_{1,n}, c'_{1,n}) & \cdots & (r'_{1,N}, c'_{1,N}) & (R'_{1,1}, \varphi'_{1,1}) & \cdots & (R'_{1,l}, \varphi'_{1,l}) & \cdots & (R'_{1,L}, \varphi'_{1,L}) & f'_1 \\
A''_1 & (r''_{1,1}, c''_{1,1}) & \cdots & (r''_{1,n}, c''_{1,n}) & \cdots & (r''_{1,N}, c''_{1,N}) & (R''_{1,1}, \varphi''_{1,1}) & \cdots & (R''_{1,l}, \varphi''_{1,l}) & \cdots & (R''_{1,L}, \varphi''_{1,L}) & f''_1 \\
\vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
A'_m & (r'_{m,1}, c'_{m,1}) & \cdots & (r'_{m,n}, c'_{m,n}) & \cdots & (r'_{m,N}, c'_{m,N}) & (R'_{m,1}, \varphi'_{m,1}) & \cdots & (R'_{m,l}, \varphi'_{m,l}) & \cdots & (R'_{m,L}, \varphi'_{m,L}) & f'_m \\
A''_m & (r''_{m,1}, c''_{m,1}) & \cdots & (r''_{m,n}, c''_{m,n}) & \cdots & (r''_{m,N}, c''_{m,N}) & (R''_{m,1}, \varphi''_{m,1}) & \cdots & (R''_{m,l}, \varphi''_{m,l}) & \cdots & (R''_{m,L}, \varphi''_{m,L}) & f''_m \\
\vdots & \vdots & & \vdots & & \vdots & \vdots & & \vdots & & \vdots & \vdots \\
A'_M & (r'_{M,1}, c'_{M,1}) & \cdots & (r'_{M,n}, c'_{M,n}) & \cdots & (r'_{M,N}, c'_{M,N}) & (R'_{M,1}, \varphi'_{M,1}) & \cdots & (R'_{M,l}, \varphi'_{M,l}) & \cdots & (R'_{M,L}, \varphi'_{M,L}) & f'_M \\
A''_M & (r''_{M,1}, c''_{M,1}) & \cdots & (r''_{M,n}, c''_{M,n}) & \cdots & (r''_{M,N}, c''_{M,N}) & (R''_{M,1}, \varphi''_{M,1}) & \cdots & (R''_{M,l}, \varphi''_{M,l}) & \cdots & (R''_{M,L}, \varphi''_{M,L}) & f''_M
\end{array}
$$

Furthermore, since it can quite accurately

depict quantitative values through the behaviour function closeness, it enables quite accurate quantitative analysis. In forecasting, evaluation and optimization this is sometimes very valuable. For instance, in price forecasting and other macro-economic forecasting, a deviation of 1 percent or less is already quite large, so results obtained by the levelled variable method are unacceptable, whereas the mixed variable method can give quite accurate numerical results. In modelling, through a simple interface with the database, a model generation program can turn the selected data for the selected factors and criteria into an aggregate state table almost immediately (in less than a second), ready for the RA program to be run. The shortcoming of the above table is that it cannot depict the knowledge treatment part of RA with multivariety information and knowledge. The information flow of RA with multivariety information and knowledge shown in Figure 1 depicts these methods more completely. On the upper left, the usual mixed numerical data are input from the database through Interface 1. On this man-machine interface the experts can, directly or with the help of an operator, select from a content list the factors and criteria and the data series needed for the RA. These are sent to an aggregate state generation program, which transforms the original data into the form of an RA aggregate state behaviour function table. For RA that uses only numerical data of the whole system, the flow is as follows: after reconstructability factor analysis (RFA), we can go directly to the RA evaluation, forecasting, risk analysis and optimization programs and obtain results.
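The encoding described for Table II, in which a numerical value is stated by its bounding levels and its closeness to each, might be sketched as follows. This is a minimal illustration rather than the authors' generation program: the function names, the input format (an ascending list of level values per variable) and the use of linear interpolation for the closeness values are all assumptions, since the text does not give the closeness formula.

```python
from bisect import bisect_right


def encode_value(value, levels):
    """Encode a number by its bounding levels and its closeness to each.

    `levels` is a strictly ascending list of at least two level values
    (hypothetical input format). Closeness is computed by linear
    interpolation, which is an assumption of this sketch.
    """
    if len(levels) < 2 or not levels[0] <= value <= levels[-1]:
        raise ValueError("need two or more levels that bracket the value")
    # Index of the upper bounding level; clamp so the top value stays in range.
    upper = min(bisect_right(levels, value), len(levels) - 1)
    lower = upper - 1
    to_upper = (value - levels[lower]) / (levels[upper] - levels[lower])
    return {
        "lower_level": levels[lower],
        "upper_level": levels[upper],
        "closeness_to_lower": 1.0 - to_upper,
        "closeness_to_upper": to_upper,
    }


def encode_record(record, level_map):
    """Encode every variable of one data record (dict of name -> number)."""
    return {name: encode_value(record[name], level_map[name]) for name in level_map}


# Hypothetical data: two factors with made-up level values.
row = encode_record(
    {"v1": 10.0, "v2": 5.0},
    {"v1": [6.0, 13.0, 20.0], "v2": [3.0, 6.0]},
)
print(row["v1"])
```

Under this assumption a value lying exactly on a level has closeness 1 to that level and 0 to its neighbour, so the two closeness values always sum to 1.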
For RA containing sub-system numerical data, depending on the conditions of the problem, the data can be input through Interface 2 into a substate generation program, which generates the sub-system aggregate state behaviour function table; a relational join procedure then leads to the overall system aggregate state behaviour function table. The data can also be input into a simplified sub-whole data transition program and then go directly to the overall aggregate state behaviour function table. If there is knowledge to be input, we should use the following programs, procedures and interfaces.

Figure 1. Information flow of RA with multivariety information and knowledge

4. Reconstructability analysis with multivariety data and knowledge
Input and treatment of knowledge take many forms. In some cases, e.g. in overall design optimization, some of the knowledge, such as the experts' overall design schemes, can be expressed as a series of alternative order numbers similar to data. It can either be input into the Aggregate State Generation Program or be input directly into the overall system aggregate state behaviour function table as a System Hypothesis Scenario. In other cases, such as macro-economic forecasting problems and policy evaluation problems, the experts' knowledge scenarios can be input into the RA forecasting or RA evaluation programs. In practice, the knowledge scenario is one of the most used

knowledge forms. Here, we may also have a Substate Knowledge Scenario, which is input through Interface 4 and, after passing through the Substate Generation program, is sent to the subsystem aggregate state behaviour function table. On some occasions, these scenarios can also be sent directly into the subsystem aggregate state behaviour function table. Apart from scenarios, which are depicted directly by aggregate states and substates, much knowledge concerns relations among the variables. Up to now, in practice, most of it takes the form of structural relations, logical relations, mathematical constraints and static or dynamic equations. These are input through Interface 5 (on the lower left of Figure 1) into the RA Relational Knowledge Base, where, together with the Relational Transition Program, they are transformed into a system structure undirected graph, logical relations or other qualitative relations, and mathematical constraints or equations. Later, through treatments involving conditional C-structures, the system relation matrix, the subsystem aggregate state behaviour function table, relational join, etc., they reach the overall system aggregate state behaviour function table. This process suits design problems, in which design schemes or aggregate states that do not satisfy the relations should not be generated, or in which strict conditions make such aggregate states impossible for the system. But in some cases, such as some evaluation problems, relations can only lead the system to certain kinds of results; conditions cannot forbid some aggregate states from occurring. In these cases the relations, whether quantitative or qualitative, are sent directly to the evaluation, forecasting or other final processing procedures. From the above, we can see that the comprehension and integration of information and knowledge in RA can follow many different processing flows for different problems.

5. Applications and prospects
Comprehensive integration RA of knowledge and data was first used in research on the overall optimum design of a machine tool (Shu, 2000). The experts' overall design schemes were expressed as overall system aggregate states. Some quite definite logical relations have to be satisfied by the selected alternatives. These relations are input into the aggregate state generation procedure, which ensures that aggregate states, or design schemes, that do not satisfy them are excluded. On this basis, the optimum design scheme is selected. A project on forecasting the demand for talented personnel combined statistical data with a substate knowledge scenario, which mainly reflects the social environment, policy impacts, and the experience and data of other cities. In a macro-economic gross domestic product research and forecasting problem, a substate knowledge scenario on the social environment and policy impact was used (Shu, 2001). Here, we first put the model generation program into practice: a data file containing the original macro-economic data was transformed into a mixed variable aggregate state behaviour function table in about 1 s.

In some other instances, e.g. in a heart disease diagnosis program, all sorts of aggregate states may occur. Relations and constraints cannot forbid the data group related to a certain aggregate state from occurring. Here knowledge is sent directly to the RA evaluation procedure to obtain a correct evaluation. Here again the aggregate state behaviour function table was generated in less than 1 s for 71 factors and 24 aggregate states, which were formed by clustering more than 1,700 patient records. In these problems, computing from the data without knowledge gave poor results, but after the experts' knowledge and experience were combined with the data the results became reasonable. For model generation in particular, let us apply instant modelling. By combining a suitable database, the RA relational knowledge base and the interfaces, we can obtain results almost immediately after selecting the variables and their data series and inputting the knowledge. This makes it possible to modify the models under discussion instantly and to obtain instant computing results. It also allows a notebook computer user to carry out computer-aided writing of reports or essays without waiting a long time (e.g. several weeks, months or years) for modelling results, as in classical problems. We can foresee broad and deep effects on management, technical and scientific (social and natural) work, other academic activities and even everyday life.

References
Cavallo, R.E. and Klir, G.J. (1981), “Reconstructability analysis: evaluation of reconstruction hypothesis”, International Journal of General Systems, Vol. 11, pp. 1-58.
Jones, B. (1989), “A program for reconstructability analysis”, International Journal of General Systems, Vol. 15, pp. 199-205.
Shu, G.F. (1993), “Some advances in system reconstructability analysis theory, methodology and applications”, Proceedings of the Second International Conference on Systems Science and Systems Engineering, 24-27 August 1993, China International Academic Publishers, Beijing, pp. 326-31.
Shu, G.F. (1997), “Reconstruction analysis methods for forecasting, risks, design, dynamical problems and applications”, Advances in Systems Science and Applications (Special Issue), The 2nd Workshop of International Institute for General Systems Studies, San Marcos, pp. 72-9.
Shu, G.F. (1998), “Reconstructability analysis for systems related with multivariety information and knowledge”, Systems Studies, The 3rd Workshop of International Institute for General Systems 26-28 July 1998, Bei daihe, Qinhuandao, China Systems Science and its Applications, Tianjin People’s Publishing House, Tianjin, pp. 69-74.
Shu, G.F. (2000), “Overall reconstruction optimum design method and applications”, International Journal of General Systems, Vol. 29 No. 3, pp. 411-8.
Shu, G.F. (2001), “Meta-synthetic system reconstruction and applications on macro-economic researches”, Journal of Systems Engineering, Vol. 16 No. 5, pp. 349-53, in Chinese with English abstract.


The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

Application of reconstructability analysis in system structure
Pengtao Wang and Changyun Yu
Department of Computer Science, Tianjin University of Technology, Tianjin, People’s Republic of China

Keywords Cybernetics, Optimization techniques, Systems and control theory
Abstract In research on the development of talent resources, the most important topics in talent resource forecasting and optimization are the structure of talent resources, the required number of talents and talent quality. This paper establishes a factor reconstruction analysis forecasting model and a talent quality model on the basis of system reconstruction analysis, and determines the most effective factor levels in the system. The method is based on the works of G.J. Klir and B. Jones, and performs dynamic analysis of the example ratio.

Kybernetes Vol. 33 No. 5/6, 2004, pp. 992-996. © Emerald Group Publishing Limited, 0368-492X. DOI 10.1108/03684920410534038

1. Introduction
(Classification number of Chinese library: G316)
Because the amount of statistical data on talent resources is very small, it is very difficult to forecast by common data processing methods. We use the system reconstruction factor analysis forecasting method proposed by G.F. Shu to forecast the talent requirement number and the structure ratio dynamically (Lu and Shu, 2000; Shu, 2000). The factor reconstruction analysis method offers a simple and practicable way of finding the main factors of a problem. It not only gives both the main factor level and the main factor level set, but also sorts all the factors by degree of importance and provides a numerical value for that degree. In the study of talent requirement number forecasting, when we analyze the historical data we bring the social environment and policy factor into the analysis, and obtain a better fit. For future forecasts, computing with Tianjin data alone is very conservative, but when we add the data of advanced areas such as Beijing, Shanghai and Shenzhen, we obtain rather satisfactory results.

2. Establish model
Compute procedure of factor reconstruction analysis
(1) Sum up the behavior function maximum of all factors on each level of the index, and sum up the behavior function minimum of all factors on each level of the index:

$$\max F_{k,l_k} = \sum_{n=1}^{N} \max_{l_n \in V_{fn}} \mathrm{edc}^*(k, l_k; n, l_n)$$

$$\min F_{k,l_k} = \sum_{n=1}^{N} \min_{l_n \in V_{fn}} \mathrm{edc}^*(k, l_k; n, l_n)$$

where $\mathrm{edc}$ is the information entropy distance of the behavior function, $\mathrm{edc}^* = 1/\mathrm{edc}$, and $V_{fn}$ is the set of all levels of the factors.
(2) Sum up the behavior function of all the factors on each level of the forecasted period:

$$F_{k,l_k}(T) = \sum_{n=1}^{N} \mathrm{edc}^*[k, l_k(T); n, l_n(T)]$$

(3) Estimate the possibility $P_{k,l_k}(T)$ of $F_{k,l_k}(T)$ for the forecasted period state, and regard it as the weight of the forecast state:

$$P_{k,l_k}(T) = \frac{F_{k,l_k}(T) - \min F_{k,l_k}}{\max F_{k,l_k} - \min F_{k,l_k}}$$
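The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the values of edc* for one index level (k, l_k) are taken as already computed and stored as one dictionary per factor, mapping each factor level to its edc* value. That layout, the function names and the numbers are hypothetical and are not taken from the factor reconstruction analysis software.

```python
def factor_bounds(edc_star):
    """Step (1): sum over factors of the largest and smallest edc* value."""
    max_f = sum(max(levels.values()) for levels in edc_star)
    min_f = sum(min(levels.values()) for levels in edc_star)
    return max_f, min_f


def forecast_sum(edc_star, levels_at_t):
    """Step (2): sum edc* at the level each factor takes in period T."""
    return sum(levels[lvl] for levels, lvl in zip(edc_star, levels_at_t))


def possibility(edc_star, levels_at_t):
    """Step (3): position of F(T) between min F and max F, used as a weight."""
    max_f, min_f = factor_bounds(edc_star)
    if max_f == min_f:
        raise ValueError("max F equals min F; the weight is undefined")
    return (forecast_sum(edc_star, levels_at_t) - min_f) / (max_f - min_f)


# Made-up edc* values for three factors with two levels each.
edc_star = [
    {1: 0.5, 2: 1.5},
    {1: 2.0, 2: 1.0},
    {1: 0.25, 2: 0.75},
]
levels_at_t = [2, 1, 1]  # level of each factor in the forecast period

print(factor_bounds(edc_star))             # (4.25, 1.75)
print(forecast_sum(edc_star, levels_at_t))  # 3.75
print(possibility(edc_star, levels_at_t))
```

With these made-up numbers the weight is (3.75 - 1.75) / (4.25 - 1.75), i.e. 0.8 up to floating-point rounding; a separate weight would be computed in the same way for each level of the index.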

3. Application example
Talent requirement forecasting concerns the scale and speed of economic development, the direction and degree of industrial structure adjustment, the proportion of hi-tech industry, the level of labor productivity, talent development, and the social environment and policy. To make the forecast results more scientific, we add sample points from advanced areas to the forecast model, so that the model can reflect the internal regularity of talent development, break through the old structure and portray the development trend.

3.1 Establish target system
On the basis of specialist consultation, data correlation analysis and the feasibility of data collection, the talent forecast uses one target and seven factors which affect talent increase.
(1) Factors:
• X1 = increasing rate of total output value (percent);
• X2 = proportion of primary industry output value in total domestic output value (percent);
• X3 = proportion of secondary industry output value in total domestic output value (percent);
• X4 = proportion of tertiary industry output value in total domestic output value (percent);
• X5 = proportion of hi-tech output value (percent);
• X6 = yearly increasing rate of social labor productivity (percent); and
• X7 = impact of the social environment and policy on talent development.
(2) Target: Y = talent increasing rate


3.2 Forecast of talent requirement value
Based on the Tianjin Future Target Outline in 2001, the Tianjin Talent Base and the Statistic Yearbook, and using the factor reconstruction analysis software, we obtain the results shown in Table I.

Table I.

Year | Factor | Value (percent) | Approach degree | Approach degree
-----|--------|-----------------|-----------------|----------------
2002 | X1 | 10     | Fn1fl1 0.4286 | Fn1fl2 0.5714
2002 | X2 | 5      | Fn2fl1 0.3333 | Fn2fl2 0.6667
2002 | X3 | 46     | Fn3fl1 0.6    | Fn3fl2 0.4
2002 | X4 | 49     | Fn4fl2 0.4375 | Fn4fl3 0.5625
2002 | X5 | 22     | Fn5fl2 0.4211 | Fn5fl3 0.5789
2002 | X6 | 9.8    | Fn5fl1 0.15   | Fn6fl2 0.85
2002 | X7 | Better | Fn7fl2 0.5    | Fn7fl3 0.5
2010 | X1 | 7.2    | Fn1fl1 0.8286 | Fn1fl2 0.1714
2010 | X2 | 4      | Fn2fl1 0.5556 | Fn1fl2 0.4444
2010 | X3 | 41     | Fn3fl1 0.9333 | Fn1fl2 0.0667
2010 | X4 | 55     | Fn4fl2 0.0625 | Fn1fl2 0.9375
2010 | X5 | 30     | Fn5fl2 0.0    | Fn1fl2 1.0
2010 | X6 | 9.8    | Fn6fl1 0.15   | Fn1fl2 0.85
2010 | X7 | Better | Fn7fl2 0.5    | Fn1fl2 0.5

For the forecast of the talent increasing rate in 2002, the values of F for the three target levels are:
(1) Lob1: max F = 8.9346, min F = 0.0003; fn1 = 0.7746; fn2 = 0.5403; fn3 = 0.835; fn4 = 0.4393; fn5 = 0.4944; fn6 = 1.36664; fn7 = 0.00005
(2) Lob1: max F = 11.6109, min F = 1.3342; fn1 = 0.652; fn2 = 1.1388; fn3 = 0.9654; fn4 = 0.5377; fn5 = 0.5699; fn6 = 0.3439; fn7 = 1.6756
(3) Lob1: max F = 8.511, min F = 0.0003; fn1 = 0.7035; fn2 = 0.664; fn3 = 0.3984; fn4 = 0.8703; fn5 = 0.8705; fn6 = 1.1556; fn7 = 0.9683

Forecast of total talent value is given in Table II.

Table II.

Proposal   | Year | Talent increasing rate (percent) | Talent requirement value
-----------|------|----------------------------------|-------------------------
           | 2002 | 3.36 | 1,053,285
Proposal 1 | 2005 | 3.21 | 1,156,007
Proposal 1 | 2010 | 3.21 | 1,356,188
Proposal 2 | 2005 | 3.26 | 1,361,453
Proposal 2 | 2010 | 3.26 | 1,361,453
Proposal 3 | 2005 | 3.32 | 1,161,714
Proposal 3 | 2010 | 3.32 | 1,367,795

3.3 Forecast of talent structure
Talent structure mainly means the proportions of talent in the three industries; it is affected by the development speed and labor productivity of the three industries and by the social environment of talent development. So we first analyze the development pattern of talent in the two industries, and then forecast the talent constitution according to the relevant targets of the Tianjin development plan. The forecast talent structures of the three industries are given in Table III. From the forecast result we observe that, along with economic development, if the economy is to keep developing at high speed between 2003 and 2010, the development of hi-tech within secondary industry must be strengthened.

Table III.

Proposal   | Year | Primary: total (10,000 persons) | Primary: proportion (percent) | Secondary: total (10,000 persons) | Secondary: proportion (percent) | Tertiary: total (10,000 persons) | Tertiary: proportion (percent)
-----------|------|------|------|-------|-------|-------|------
           | 2002 | 1.77 | 1.68 | 42.44 | 40.29 | 61.12 | 58.03
Proposal 1 | 2005 | 1.67 | 1.44 | 44.59 | 38.51 | 69.54 | 60.05
Proposal 1 | 2010 | 1.52 | 1.12 | 48.43 | 35.71 | 85.67 | 63.17
Proposal 2 | 2005 | 1.67 | 1.44 | 44.75 | 38.51 | 69.55 | 59.97
Proposal 2 | 2010 | 1.52 | 1.12 | 48.88 | 35.90 | 85.75 | 62.98
Proposal 3 | 2005 | 1.68 | 1.45 | 44.85 | 38.61 | 69.63 | 59.94
Proposal 3 | 2010 | 1.52 | 1.12 | 48.19 | 35.96 | 86.06 | 62.92

3.4 Forecast of talent quality
We treat the talent education history index as the quality target. The model is

$$Y_t = A(t)\,(e_t K_t)^{\alpha}\, L_t^{\beta}\, (H_t Q_t)^{\gamma}\, U_t$$

where $Y_t$ expresses GDP; $e_t K_t$, the internal effect of policy; $L_t$, labor investment; $H_t$, talent investment; $H_t Q_t$, talent quality; $U_t$, stochastic error; $A(t)$, the synthesis factor; and $\gamma$, the elasticity of talent investment. Table IV shows the forecast of talent quality.

Table IV.

Year         | Kt     | Lt      | Ht     | Fitting degree R² | Standard deviation | F-test value
-------------|--------|---------|--------|--------|--------|--------
2002         | 0.534  | -0.0214 | 0.4987 | 0.9248 | 0.0245 | 1429.69
t-test value | 23.988 | -0.311  | 20.89  |        |        |
2010         | 0.549  | -0.0187 | 0.5474 | 0.9372 | 0.0235 | 1534.18
t-test value | 20.12  | -0.243  | 16.98  |        |        |

We can use the talent output elasticity to reflect talent quality:

$$A = \frac{E_{2m}}{E_{1m}} - 1, \qquad E_m = \frac{\text{Output changing rate}}{\text{Talent changing rate}}$$

Using it, we find that talent quality increases 2.97 times on average in 2005, which means that the contribution rate of talent to the economy increases 2.97 times. Talent quality increases 3.19 times on average from 2005 to 2010.

References
Lu, D. and Shu, G. (2000), “Application of system reconstructability analysis on research of relations between price and financial economic factors”, International Journal of General Systems, Vol. 29 No. 3, pp. 465-92.
Shu, G. (2000), “Metasynthetic reconstruction and its applications in macroeconomics and other studies”, World Congress of Systems Science/ISSS 2000, July 2000, Toronto, Canada.

Further reading
Klir, G.J. (1976), “Identification of generative structures in empirical”, Int J. General Systems, Vol. 3 No. 2, pp. 89-104.


A software architecture for reconstructability analysis
Kenneth Willett and Martin Zwick
Portland State University, Portland, Oregon, USA


Keywords Cybernetics, Information theory, Computer software
Abstract Software packages for reconstructability analysis (RA), as well as for related log linear modeling, generally provide a fixed set of functions. Such packages are suitable for end-users applying RA in various domains, but do not provide a platform for research into the RA methods themselves. A new software system, Occam3, is being developed to address three goals which often conflict with one another: a general and flexible infrastructure for experimentation with RA methods and algorithms; an easily configured system allowing methods to be combined in novel ways, without requiring deep software expertise; and a system which can be easily used by domain researchers who are not computer specialists. Meeting these goals has led to an architecture which strictly separates functions into three layers: the core, which provides representation of data sets, relations, and models; the management layer, which provides extensible objects for development of new algorithms; and the script layer, which allows the other facilities to be combined in novel ways to address a particular domain analysis problem.

Kybernetes Vol. 33 No. 5/6, 2004, pp. 997-1008. © Emerald Group Publishing Limited, 0368-492X. DOI 10.1108/03684920410534047

1. Introduction
Reconstructability analysis (RA) is a technique based on set theory and information theory for the modeling and analysis of data sets involving discrete variables. It was derived from Ashby (1964), and was developed by Broekstra, Cavallo, Cellier, Conant, Jones, Klir, Krippendorff, and others (Klir, 1985, 1986, 1996, 2000; Krippendorff, 1986; Zwick, 2001a). It differs from other techniques in that it can identify high-dimensional multi-component relationships among the variables. RA encompasses a number of related methods and variations.

• Information-theoretic (probabilistic) vs set-theoretic (crisp possibilistic) modeling. The information-theoretic approach deals with a frequency or probability distribution over the states of the system, while the set-theoretic approach deals only with the occurrence or non-occurrence of each possible state. These two approaches can be seen as two distinct methods within the framework of Generalized Information Theory (Klir and Wierman, 1998), which also includes fuzzy methods (currently outside the scope of the software system described in this paper). Information-theoretic RA is mathematically equivalent to log-linear (LL) modeling (Bishop et al., 1978; Knoke and Burke, 1980), where the two overlap. Set-theoretic RA is related to techniques used in logic design and machine learning (Files and Perkowski, 1998; Perkowski et al., 1997).
• Variable-based, latent variable-based and state-based modeling. Traditionally, RA has used a variable-based modeling (VBM) approach, where the constraints of the model are associated with sets of variables and the corresponding state subspaces. Latent variable-based modeling (Hagenaars, 1993) takes a similar approach, but introduces new (unobserved) variables to represent relationships among the primary variables. State-based modeling (Zwick and Johnson, 2004) defines the constraints of the model (that is, the specifics of how the model must match the observed data) in terms of specific states of the data or its margins, rather than in terms of complete margins only. Thus, state-based modeling is a generalization of VBM.
• Directed vs neutral systems. In a directed system one or more of the variables is assumed to be dependent on others, and the problem is typically to find a reduced set of variables which adequately predicts the dependent variable(s). In neutral systems, all variables are treated as interdependent, and the problem is to identify the relationships among the variables which best account for the relation or joint probability distribution.

Several RA software packages exist, e.g. GSPS by Elias (1988), Klir (1976) and co-workers, Construct and Spectral by Krippendorff (1981), SAPS by Cellier and Yandell (1987) and Uyttenhove (1984), EDA by Conant (1988), Jones’ k-systems analysis (Jones, 1989) and a recent program by Dobransky and Wierman (1995). However, no package fully encompasses the variations of RA discussed above. Some programs are not easily used by researchers outside the systems field; others do not incorporate statistical tests; only EDA can handle a large number of variables. These packages are in limited use. A number of statistics packages (SAS, SPSS, Statistica, LEM) provide a basic LL capability which can be used to perform RA in a confirmatory mode, where a model is selected and its fit to the data evaluated. However, the set of possible models for a data set of N variables increases hyper-exponentially with N. Because of this, and because RA is most useful for modeling complex (and therefore non-intuitive) relationships, a simple confirmatory approach has limited value. Instead, one requires the ability to rapidly search the space of all possible models, applying heuristics to narrow the search. An earlier software package, Occam (Zwick, 2000), was developed at Portland State University to provide a range of RA capabilities. This package was derived from a set of predecessor programs, the first of which was written by Zwick in 1985. Under his direction, programs (Occam0) were written by Hosseini, Anderson, and Shu which improved these computations and performed additional RA functions, and several of these programs were then integrated by Daniels, who also introduced heuristic search (Occam1). Grygiel added new search procedures and an improved user interface (Occam2). This most recent version has been used to analyze medical, healthcare, satellite, linguistic, and other data, as part of a general research program in discrete multivariate modeling (Zwick, 2001b).
A desire to increase the accessibility of RA methods to a broader range of researchers, and also to make these methods both more flexible and more automated, has led to the development of Occam3.

2. Key requirements for Occam3
Ideally, a tool such as Occam should address the needs of three types of researchers in the RA community.
(1) Those developing variations on RA methods, as mentioned earlier. These researchers require the ability to extend the representation of core RA entities, such as data tables and model definitions.
(2) Those developing new algorithms within a given RA framework, such as novel search heuristics, different statistical measures, etc. These researchers need to be able to extend the processing rules used by the system.
(3) Those using RA for analysis in various domains. These researchers need a flexible analysis framework, which allows them to combine various methods according to the needs of their particular problems.
Previous versions of Occam addressed the third type only; the system architecture was rigid and provided no simple means of extension. To address the first two types as well, a different architecture was needed.

3. Overview of RA
Both information-theoretic and set-theoretic analyses begin with a table of sampled data over a set of discrete variables (continuous variables can be discretized by binning). The cover is the set of variables in the data, and the data define a distribution (information-theoretic) or a set-theoretic relation over the system states defined by the Cartesian product of all the variables. Given these data, one is interested in evaluating simpler approximations of the data. In RA, such an approximation is defined by a structure, which is a set of relations. (From now on, the word relation will encompass both distributions and set-theoretic relations.) Each relation is in turn a set of variables from the cover. For a four-variable problem with a cover {A,B,C,D}, an example of a relation is ABC and of a structure is ABC:BCD.
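To make these definitions concrete, the following minimal Python sketch builds the observed distribution over a cover from a small table of records and takes its margin over the variables of one relation. It is an illustration only and does not reflect Occam's internal representation: the function names, the single-letter variable names, the dictionary-based record format and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


def parse_structure(text):
    """Turn 'ABC:BCD' into [('A','B','C'), ('B','C','D')] (one letter per variable)."""
    return [tuple(relation) for relation in text.split(":")]


def distribution(records, cover):
    """Relative frequency of each observed state over the cover."""
    counts = Counter(tuple(record[v] for v in cover) for record in records)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


def project(dist, cover, relation):
    """Margin of the distribution over the variables of one relation."""
    positions = [cover.index(v) for v in relation]
    margin = defaultdict(float)
    for state, p in dist.items():
        margin[tuple(state[i] for i in positions)] += p
    return dict(margin)


# Made-up sample data over the cover {A, B, C, D}.
cover = ("A", "B", "C", "D")
records = [
    {"A": 0, "B": 0, "C": 1, "D": 0},
    {"A": 0, "B": 1, "C": 1, "D": 0},
    {"A": 1, "B": 1, "C": 0, "D": 1},
    {"A": 0, "B": 0, "C": 1, "D": 0},
]

dist = distribution(records, cover)   # information-theoretic view
observed_states = set(dist)           # set-theoretic view: which states occur
structure = parse_structure("ABC:BCD")
print(project(dist, cover, structure[0]))
```

Here the set-theoretic relation is simply the set of states with non-zero frequency, which is why both views can share one table of counts.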
Given a structure, a model is constructed by first producing a projection for each relation in the structure, and then reconstructing a fitted relation over the cover, which agrees with all the projections. For both set-theoretic and information-theoretic RA, one chooses the relation which maximizes uncertainty subject to the constraints of the projections. A structure may be loopless or may contain loops. For example, AB:BC:BD is loopless, while AB:BC:ACD contains one loop (i.e. A ! B ! C ! A). Loopless models are more efficient to process because fitted distributions can be determined algebraically, while distributions for models with loops must be

A software architecture

999

K 33,5/6

1000

fitted iteratively. (For set-theoretic relations, however, even models with loops have closed form solutions.) Structures (and their corresponding models) can be arranged in a lattice by definition of a parent-child relationship. Given a structure, each child structure is created by removal of a relation and reinsertion of all the embedded relations within that relation which are not already present in other relations of the structure. For example, the relation ABC has embedded relations AB, BC, and AC. Thus, the children of ABC:BD are AB:AC:BC:BD and ABC:D. The topmost structure (model) in the lattice is the saturated structure (model), defined by all the variables occurring in a single relation, e.g. ABCD. The bottom of the lattice is defined by the independence structure ( model ), which is defined only by first-order relations, e.g. A:B:C:D. To be more precise: this is the bottom of the lattice if one wishes to maintain the same cover in all structures, so that every structure contains every variable. A researcher using RA is typically interested in one or more of the following questions. (1) What are all the possible structures for a given cover? What are all the loopless structures, structures of non-overlapping relations, etc.? This might be a prelude to further exploration. (2) Which subset of variables in a directed system is most predictive of the dependent variable(s)? This problem is equivalent to bottom-up search of the sublattice of loopless models. (3) Which model(s), allowing loops, provide the simplest representation of the data, while still fitting the data with adequate accuracy? This question involves searching the lattice of models. (4) What are the characteristics of each of a specific set of models? The models may have been identified by some a priori criteria. Models can be fit to the data and then compared based on predictive power, differences from the data, chi-squared statistics, etc. 
Details of the model, such as the residual error in each specific state of the system, can also be computed.

4. Overview of Occam3 architecture
The three requirements described earlier are addressed by separation of the architecture into three distinct layers. Each of these layers is extensible to address the needs of a particular constituency.

The Occam core provides the representation of basic RA entities. The implementation of the core must be done with careful attention to performance, memory usage, and robustness. Implementation of this layer is done in C++.

The Occam management layer provides the manipulation of core entities. This layer provides the basic mechanisms for searching, caching, computing statistics of interest, and reporting. This layer is implemented in C++ and uses class inheritance to provide extensibility.

Assembly of particular functions into an analysis strategy is done using the Occam Script Layer. There are standard scripts for common situations (such as detailed analysis of a single model, or simple search heuristics), but it is possible to create more complex scripts for special cases. The scripts are written in the Python language (Beazley, 1999), and an adapter interface between the Python interpreter and the rest of Occam3 is written in C++. Python was chosen because it is readily available on a wide variety of systems, including Windows and various forms of Unix. It is powerful enough to develop complex algorithms, but does not require mastery of programming details such as variable declarations and memory management. Even though Python is an interpreted language, most of the computation is done at lower levels of the system, so interpreter performance is not an issue. The specific approaches used for these three layers are described further in the following (Figure 1).


5. The Occam core
5.1 Occam core objects
The Occam core provides efficient implementations of three key objects: tables, relations, and models. These objects are described further in Figure 2.

5.1.1 Tables. A key goal of Occam3 is to be able to handle larger problems, with a larger number of variables, than was possible with Occam2. (Occam2 has been used to analyze data involving tens of variables.) The state space of a particular problem increases exponentially with the number of variables, while the amount of available data for analysis is likely to increase at a much slower

Figure 1. Architecture of Occam3

Figure 2. Objects in the Occam core


rate. This means that, for large problems, the data are very sparse (not every feasible state is represented in the data sample). The core takes advantage of this sparseness by avoiding the need to represent the entire state space explicitly. A table, whether it represents the input data or a computed projection, consists of a sorted list of tuples. Each tuple contains a key, which encodes the states of the individual variables associated with that tuple, and a value, which may be either Boolean (for set-theoretic analysis), a frequency, or a probability (depending on the type of analysis being performed).

Tables are used in three contexts. A data table contains observation data, typically read from a data file. Data tables might also be produced by preprocessing data in some way, such as by binning quantitative variables, performing mask analysis on time series data, etc. A projection table is computed from a data table, and is associated with a particular relation by summing over the missing variables for probabilistic distributions or taking a logical “or” for crisp possibilistic relations. A fitted table is a computed table associated with a model, which defines a maximum uncertainty relation subject to the constraints of the model for all variables in the cover. Fitted tables for distributions are produced by algorithms such as iterative proportional fitting (IPF) (Krippendorff, 1981). Research into other, more efficient methods for fitting is being done as part of the overall Occam project (Zwick, 2004).

For large analysis problems, the memory space requirements become dominated by storage of these data tables. Typically, the cardinality of individual variables is small (binary variables are common; most variables have only a few states), so bit-packing of the key is used to represent the tuple in the smallest possible number of bytes.
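As a rough illustration of the table representation described above, the following Python sketch stores a table as a sorted list of (key, value) tuples with bit-packed keys, finds a state by binary search, and projects by summing over the dropped variables. It is not Occam3 code: the names `CARDS`, `field_layout`, `pack`, `make_table`, `lookup` and `project` are hypothetical, the variable cardinalities are invented for the example, and dropped variables are simply zeroed in the key, which is a simplification.

```python
from bisect import bisect_left

# Hypothetical example cover: variable name -> number of states (assumed values).
CARDS = {"A": 2, "B": 2, "C": 3}

def field_layout(cards):
    """Give each variable a (shift, width) bit field; the first variable is highest."""
    layout, shift = {}, 0
    for name in reversed(list(cards)):
        width = max(1, (cards[name] - 1).bit_length())
        layout[name] = (shift, width)
        shift += width
    return layout

LAYOUT = field_layout(CARDS)

def pack(state):
    """Encode a dict of variable states as a single integer key."""
    key = 0
    for name, value in state.items():
        shift, _ = LAYOUT[name]
        key |= value << shift
    return key

def make_table(rows):
    """Build a table: a list of (key, value) tuples sorted by key."""
    return sorted((pack(state), value) for state, value in rows)

def lookup(table, state):
    """Binary-search the sorted table; states not stored have value 0."""
    key = pack(state)
    i = bisect_left(table, (key,))
    if i < len(table) and table[i][0] == key:
        return table[i][1]
    return 0

def project(table, keep):
    """Sum values over every variable not listed in `keep`."""
    mask = 0
    for name in keep:
        shift, width = LAYOUT[name]
        mask |= ((1 << width) - 1) << shift
    sums = {}
    for key, value in table:
        sums[key & mask] = sums.get(key & mask, 0) + value
    return sorted(sums.items())
```

Only observed states are stored, so memory grows with the amount of data rather than with the size of the state space, and the projection is a single pass over the table.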
Sorting the tuples allows binary searching to be used to find specific states, and allows operations on multiple tables to be performed in linear time. Sorting eliminates the need for any additional indexing structure for the data tables. Each variable has an additional state, “absent”, which is used in computing projection tables. For example, in a problem with binary variables A, B, and C, the tuples of the projection associated with AB are 00*, 01*, 10*, and 11*, where “*” represents the “absent” state. Computation of a projection table over one or more variables is done in time which is linear with the amount of data, and storage of this table is typically much smaller than the initial data set due to combination of tuples.

5.1.2 Relations. A relation stores a list of variables defining the relation, an associated projection table produced from the initial data table, and a list of attributes computed from the relation (e.g. degrees of freedom, uncertainty, etc.). Relations are reusable, because the projection table and relation attributes are independent of the model currently being evaluated. Also, the storage space

for a relation, and the high computational cost of computing the projection for that relation, make this reuse a critical performance strategy. A cache of relations is maintained so that each relation is computed only once.

5.1.3 Models. An RA model is defined to be a set of relations; thus each model object contains references to its constituent relations, but not a separate copy. A model also contains a list of attributes, with statistics of interest computed for the model. A model may also have a fitted table in cases where this must be explicitly computed (see below).

6. The Occam management layer
While the Occam core is concerned primarily with the representation of various data objects, analysis methods are implemented in the Occam management layer. Finding the best models, and computing and displaying statistics about models, involves three types of operations.
(1) Navigation – this involves traversal of the lattice of all possible models, producing new candidate models from existing ones.
(2) Reconstruction – for probabilistic systems, this involves construction of a fitted distribution for a given model. Reconstruction requires generation of a projection for each relation in the model, then the combination of those projections to produce the fitted distribution. For models without loops, the fitted distribution can be represented explicitly in terms of the projections of the individual relations, and key attributes such as uncertainty and degrees of freedom are computed algebraically. For models with loops, it is necessary to construct a fitted table, using an algorithm such as IPF. Reconstruction for crisp possibilistic systems is done using a closed form set-theoretic equation for models both with and without loops.
(3) Evaluation – attributes of the model are computed and used to rank models for further processing. Attributes may be computed from the model structure (such as degrees of freedom), or may be computed from the fitted distribution (e.g.
uncertainty, transmission).

The management layer uses the entities of the core (tables, relations, and models) and basic operations on these entities (constructing projections, computing uncertainties, etc.) to build these higher-level RA methods. The foundation for building extensions in this layer (such as different heuristic search methods, or different evaluation criteria) is the Base Manager. This object provides facilities for caching and reusing relations and models. The base manager also provides basic operations such as construction of a fitted table for a model; computation of degrees of freedom (DF) and uncertainty (H) for models with overlapping components; and generation of all child relations of a given relation. These basic operations are used in a variety of RA methods.
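The construction of a fitted table for a model with loops can be illustrated with a small iterative proportional fitting sketch. This is a minimal version written for this text, not the Occam3 implementation: it enumerates the whole state space (which Occam3 avoids), runs a fixed number of sweeps instead of testing convergence, and assumes single-letter variable names. The function names `project` and `ipf` and their parameters are hypothetical.

```python
from itertools import product

def project(dist, variables, keep):
    """Marginal of `dist` (dict: state tuple -> probability) over the variables in `keep`."""
    idx = [variables.index(v) for v in keep]
    marg = {}
    for state, p in dist.items():
        sub = tuple(state[i] for i in idx)
        marg[sub] = marg.get(sub, 0.0) + p
    return marg

def ipf(data, variables, cards, model, sweeps=50):
    """Fit a distribution whose projections match those of `data`.

    `model` is a list of relations, each a string of variable names,
    e.g. ["AB", "BC", "AC"]. Starting from the uniform distribution,
    each pass rescales the fitted table to match one projection at a time.
    """
    states = list(product(*(range(cards[v]) for v in variables)))
    fitted = {s: 1.0 / len(states) for s in states}
    targets = {rel: project(data, variables, rel) for rel in model}
    for _ in range(sweeps):
        for rel in model:
            current = project(fitted, variables, rel)
            idx = [variables.index(v) for v in rel]
            for s in states:
                sub = tuple(s[i] for i in idx)
                if current[sub] > 0.0:
                    fitted[s] *= targets[rel].get(sub, 0.0) / current[sub]
    return fitted
```

For a loopless model such as AB:BC the first sweep already yields the algebraic solution p(AB)p(BC)/p(B); for AB:BC:AC the table keeps changing over several sweeps, which is why such models are more expensive to fit.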


More specialized managers can be constructed using class inheritance in C++. Such a manager will augment and/or replace operations provided by the base manager. In particular, the VBM Manager has been developed to provide support for classical RA methods.

In the lattice of models, the basic operations are to move upward (find all parents of a given model); or move downward (find all children). It is also useful to be able to navigate within the sublattice defined by models meeting a certain criterion, such as those without loops. Restricting attention to loopless models allows much larger problems to be considered. For directed systems, loopless models correspond to models with a single predictive component. For example, the model ABC:AZ is loopless, while model ABC:AZ:BZ which has two predictive components contains a loop (A → B → Z → A).

Most model evaluation methods are based on some measure that combines the complexity of the model and the quality of its fit to the data. In the lattice of models, quality of fit is highest at the saturated model at the top of the lattice, which exactly fits the data. Quality of fit decreases monotonically as one navigates down through the lattice, with the fit being poorest with the independence model which assumes that all variables are independent. (This is true among models which use the full cover, i.e. the set of variables in the data. It is possible to analyze even simpler models, where variables are removed from the model entirely, ending with the uniform distribution model.) Quality of fit is typically measured either by the transmission between the model and the saturated model, or by the reduction in uncertainty of the dependent variables in the model relative to their uncertainty in the independence model. Model complexity, on the other hand, is lowest in the independence model, and increases monotonically upward.
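The downward navigation step and the loopless criterion discussed above can be sketched in a few lines of Python. The child rule follows the definition given earlier (remove one relation, reinsert its embedded relations that are not already contained elsewhere). The loop test uses Graham reduction, a standard acyclicity test for hypergraphs; applying it here is an assumption, since the text does not say how Occam3 detects loops. The function names are hypothetical, variables are single letters, and first-order relations are never removed so that the cover is preserved.

```python
from itertools import combinations

def parse(structure):
    """'ABC:BD' -> set of relations, each a frozenset of variable names."""
    return {frozenset(rel) for rel in structure.split(":")}

def show(relations):
    """Canonical text form with variables and relations in alphabetical order."""
    return ":".join(sorted("".join(sorted(r)) for r in relations))

def children(structure):
    """All child structures of `structure` in the lattice."""
    rels = parse(structure)
    result = set()
    for rel in rels:
        if len(rel) < 2:
            continue  # keep first-order relations so the cover is unchanged
        rest = rels - {rel}
        new = set(rest)
        for combo in combinations(sorted(rel), len(rel) - 1):
            emb = frozenset(combo)
            if not any(emb <= r for r in rest):
                new.add(emb)  # embedded relation not present elsewhere
        result.add(show(new))
    return sorted(result)

def is_loopless(structure):
    """Graham reduction: the structure is loopless if it reduces to nothing."""
    rels = [set(r) for r in parse(structure)]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one relation.
        for r in rels:
            lonely = {v for v in r if sum(v in other for other in rels) == 1}
            if lonely:
                r -= lonely
                changed = True
        # Rule 2: drop a relation that is empty or contained in another one.
        for i, r in enumerate(rels):
            if not r or any(j != i and r <= o for j, o in enumerate(rels)):
                del rels[i]
                changed = True
                break
    return not rels
```

With these definitions, `children("ABC:BD")` gives the two children named in the text, and `is_loopless` separates ABC:AZ from ABC:AZ:BZ.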
Information-theoretic complexity is typically measured by DF; similar measures are available for set-theoretic models (Grygiel, 2000). Since the set of possible navigation operators is open-ended, the VBM Manager has a facility for navigation “plug-ins”. At this point, plug-ins have been developed for full upward and downward search, as well as for loopless upward and downward search (loopless upward search is currently limited to directed systems). The VBM Manager has algorithms defined for about 40 relation and model attributes (uncertainty, transmission, standard and Pearson chi-square statistics, etc.). Additional attributes can be defined either as extensions to the management layer, or in the scripting layer. The result of the search process is typically a report of the best models encountered during search, sorted by some criterion (typically the same criterion which guided the search). Depending on the analysis problem, different statistics of the models may be of interest. The management layer contains a report generator, which can be provided with a list of models, sort

them according to a specified attribute, and display the results as a table. Tables can be generated in a textual format, a format compatible with desktop applications (e.g. spreadsheets), or in HTML.

7. The script layer
One of the most serious limitations of previous versions of Occam was that minor variations in analysis technique required changes to the underlying software. The researcher was restricted to a fixed set of options, and adding additional options not only required rebuilding the software but also led to a proliferation of options and controls. In the planning for Occam3, it was clear that more flexibility and programming power in the user interface was required. Users who wanted to run analyses of a number of data sets, or preprocess the data in multiple ways before analysis, were interested in a way to automate this processing. For this reason, instead of being developed as a stand-alone application, Occam3 was developed as a loadable module which can be used in conjunction with the script language Python.

The Python language was initially developed by Guido van Rossum at CWI in Amsterdam in 1990, and its continued development is performed by an Open Source development community. Python is becoming common in information system management applications as well as in web systems. Python was selected because of its simple syntax, its high level functions (such as automatic memory management with garbage collection, list processing, etc.) and the ease with which it can be used with existing C and C++ libraries. Interfacing a C++ library to Python involves development of an adapter file, which describes how C++ facilities are made visible through Python. Currently, the Occam adapter exposes four classes through Python.
(1) ocVBM Manager, which provides access to most of the computational and modeling functions, such as constructing models, performing search navigation operations, and computing statistics.
(2) ocModel, which provides read access to details of a model, such as its attributes and relations.
(3) ocRelation, which provides read access to the attributes and data of a single relation.
(4) ocReport, which provides model sorting and reporting functions.

Figure 3 shows a simple script for performing a complete loopless top-down search, and producing a report sorted on %dH.

8. Using Occam3 for large data sets
A simple experiment was run using a subset of the OPUS data obtained from Dr Clyde Pope of the Kaiser Permanente Center for Health Research in


Figure 3. Script for performing a complete loopless top-down search and producing a report

Portland, Oregon in 2002. These data concern healthcare utilization in the Kaiser member population, and form a directed system with 24 independent variables and one dependent variable. These data have previously been analyzed with Occam2, to:
(1) select a subset of the 24 variables which is most predictive; and
(2) build a model structure from that subset of variables which is statistically significant.

With Occam2, the first step requires exhaustive bottom-up search. Since the number of single-component models increases exponentially with distance from the bottom of the lattice, the search for the best seven-variable model required more hours. The second step was able to take advantage of Occam2’s fixed width downward search, and was much faster. Even so, running more than a few experiments (e.g. changing the number of variables in the selected subset) was very expensive, requiring several hours. Also, the sequence of performing step 1, then step 2, required manual interaction.

With Occam3 a simple script was written which defined the desired search strategy: a fixed-width single-predicting-component (i.e. loopless) upward search for the desired number of levels, followed by fixed-width downward search. All models not statistically significant were eliminated before the final report.
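The fixed-width upward phase of this strategy can be sketched as a beam search. The code below is an illustration under stated assumptions, not the Occam3 script or its API: each candidate is a set of predictor variables for a single predicting component, candidates are ranked by the conditional uncertainty of the dependent variable estimated from frequencies, and the significance filter and the downward phase are omitted. The names `cond_entropy` and `upward_search` are hypothetical.

```python
from collections import Counter
from math import log2

def cond_entropy(rows, predictors, target):
    """H(target | predictors) in bits, estimated from observed frequencies."""
    joint = Counter((tuple(r[v] for v in predictors), r[target]) for r in rows)
    marg = Counter(tuple(r[v] for v in predictors) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / marg[key]) for (key, _), c in joint.items())

def upward_search(rows, inputs, target, width, levels):
    """Keep the `width` best predictor sets at each level, then extend them."""
    beam = [()]  # level 0: no predictors
    for _ in range(levels):
        # Extend every kept set by one more input variable.
        candidates = {tuple(sorted(set(m) | {v}))
                      for m in beam for v in inputs if v not in m}
        # Lower remaining uncertainty is better; ties break alphabetically.
        ranked = sorted(candidates,
                        key=lambda m: (cond_entropy(rows, m, target), m))
        beam = ranked[:width]
    return beam
```

Because only `width` sets survive each level, the number of models evaluated grows roughly linearly with the number of levels instead of exponentially, at the price of possibly missing the best subset.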

The upward search to select the best seven-variable component was faster by more than an order of magnitude, with a search-width of 30 (i.e. the 30 best two-variable components were kept, then the 30 best three-variable components, etc.). This heuristic approach identified a number of single-component models as good as or better than the component found with Occam2. The scripting approach makes it simple to combine both the upward and downward search phases, and also allows experimentation with search width, number of levels, and different sorting criteria.

References
Ashby, W.R. (1964), “Constraint analysis of many-dimensional relations”, General Systems Yearbook, Vol. 9, pp. 99-105.
Beazley, D. (1999), Python Essential Reference, New Riders Publishing.
Bishop, Y.M., Fienberg, S.E. and Holland, P.W. (1978), Discrete Multivariate Analysis, MIT Press, Cambridge, MA.
Cellier, F. and Yandell, D. (1987), “SAPS-II: a new implementation of the systems approach problem solver”, Int. J. General Systems, Vol. 13 No. 4, pp. 307-22.
Conant, R.C. (1988), “Extended dependency analysis of large systems”, Int. J. General Systems, Vol. 14, pp. 97-141.
Dobransky, M. and Wierman, M. (1995), “Genetic algorithms: a search technique applied to behavior analysis”, Int. J. General Systems, Vol. 24 No. 1/2, pp. 125-36.
Elias, D. (1988), “The general systems problem solver: a framework for integrating systems methodologies”, PhD dissertation, Department of Systems Science, SUNY-Binghamton.
Files, C. and Perkowski, M. (1998), “Multi-valued functional decomposition as a machine learning method”, Proc. ISMVL’98.
Grygiel, S. (2000), “Decomposition of relations as a new approach to constructive induction in machine learning and data mining”, Electrical Engineering PhD Dissertation, PSU, Portland, OR.
Hagenaars, J.A. (1993), Loglinear Models With Latent Variables, Quantitative Applications in the Social Sciences #94, Sage, Beverly Hills, CA.
Jones, B. (1989), “A program for reconstructability analysis”, Int. J. General Systems, Vol. 15, pp. 199-205.
Klir, G. (1976), “Identification of generative structures in empirical data”, Int. J. General Systems, Vol. 3 No. 2, pp. 89-104.
Klir, G. (1985), The Architecture of Systems Problem Solving, Plenum Press, New York, NY.
Klir, G. (1986), “Reconstructability analysis: an offspring of Ashby’s constraint theory”, Systems Research, Vol. 3 No. 4, pp. 267-71.
Klir, G. (Ed.) (1996), International Journal of General Systems, Vol. 24, Special Issue on GSPS.
Klir, G. (Ed.) (2000), International Journal of General Systems, Vol. 29, Special Issue on Reconstructability Analysis in China.
Klir, G. and Wierman, M. (1998), Uncertainty-based Information – Elements of Generalized Information Theory, Physica-Verlag, Heidelberg.
Knoke, D. and Burke, P.J. (1980), Log-Linear Models, Quantitative Applications in the Social Sciences Monograph # 20, Sage, Beverly Hills, CA.


Krippendorff, K. (1981), “An algorithm for identifying structural models of multivariate data”, Int. J. General Systems, Vol. 7 No. 1, pp. 63-79.
Krippendorff, K. (1986), Information Theory – Structural Models for Quantitative Data, Sage Series: Quantitative Applications in the Social Sciences, Series 07-062.
Perkowski, M., Marek-Sadowska, M., Jozwiak, L., Luba, T., Grygiel, S., Nowicka, M., Malvi, R., Wang, Z. and Zhang, J. (1997), “Decomposition of many-valued relations”, Proc. ISMVL ’97, May 1997, Halifax, Nova Scotia, pp. 13-18.
Uyttenhove, H.J.J. (1984), “SAPS – A software system for inductive modelling”, in: Oren, T.I., et al. (Eds), Simulation and Model-Based Methodologies: An Integrative View, NATO ASI Series, Vol. F10, Springer-Verlag, Berlin, Heidelberg, pp. 427-49.
Zwick, M. (2000), “OCCAM: organizational complexity computation and modeling”, Portland State University Systems Science Program Internal Document.
Zwick, M. (2001a), “Wholes and parts in general systems methodology”, in: Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY, pp. 237-56.
Zwick, M. (2001b), “Discrete multivariate modelling”, available at: http://www.sysc.pdx.edu/res_struct.html
Zwick, M. (2004), “Reconstructability analysis with Fourier transforms”, Kybernetes, Vol. 33 Nos 5/6, pp. 1026-1040.
Zwick, M. and Johnson, M. (2004), “State-based reconstructability analysis”, Kybernetes, Vol. 33 Nos 5/6, pp. 1041-1052.

The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister

The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

Forecast entropy


W. Yao, C. Essex, P. Yu and M. Davison
Department of Applied Mathematics, University of Western Ontario, London, Ontario, Canada

Keywords Cybernetics, Statistical forecasting, Data analysis


Abstract A technique, called forecast entropy, is proposed to measure the difficulty of forecasting data from an observed time series. When the series is chaotic, this technique can also determine the delay and embedding dimension used in reconstructing an attractor. An ideal random system is defined. An observed time series from the Lorenz system is used to show the results.

1. Introduction
Experimental data contain information from an unknown system. When the system is chaotic, people often use a chaotic attractor modeled from the data to calculate the fractal dimension (Frank et al., 1990), to control the unstable periodic orbits (Yao, 1995), to study the topological structure (Takens, 1981), and to forecast data (Short, 1994). The typical technique used is to reconstruct an attractor from an observed time series $s(t)$ with delayed coordinates $(s(t), s(t+\tau), s(t+2\tau), \ldots, s(t+(d-1)\tau))$, where $\tau$ is the delay, and $d$ the embedding dimension. Many publications attempt to determine the values of $\tau$ and $d$ so that the reconstructed attractor has some properties like an original one, such as no correlation between its coordinates, no crossing of its orbits, and simplicity of its local structure.

Kennel et al. (1992) determined $d$ by studying the “noise” behavior of the neighbors at a point. They studied an observed finite time series from the Lorenz system and found that, in a properly reconstructed attractor, the “noise” is minimum. Their work suggested that $d = 4$ for the series. Bumeliene et al. (1988), however, suggested that $d = 1 + [\text{integer part of } d_A]$, where $d_A$ is the fractal dimension of the attractor. They used a technique similar to that of Kennel et al., but considered the behavior of the neighborhood, not neighbors. For an observed time series from the Lorenz system, or some three-dimensional chaotic system, the result of Bumeliene et al. is perfect because $d_A \in (2, 3)$ for these systems, and $d = 3$. But for some higher dimensional systems, the fractal dimension of their chaotic attractors may be still low, for example, $d_A \in (2, 3)$. In this case, the result of Bumeliene et al. is unsuitable. When the observed time series is noisy, these techniques have to be reconsidered. There are many other arguments in the literature for determining these values (Essex and Nerenberg, 1990; Ott et al., 1984).

In this paper, we will introduce forecast entropy (F). The reconstructed attractor is considered from a dynamical viewpoint. When the attractor is best reconstructed, its F will be minimum.
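The delayed-coordinate reconstruction used throughout the paper can be written as a short Python sketch. It assumes a uniformly sampled series, so the delay is expressed as a whole number of samples (the delay divided by the sampling interval); the function name `delay_embed` and its parameters are hypothetical.

```python
def delay_embed(series, delay_steps, dim):
    """Return the delay vectors (s[i], s[i + delay], ..., s[i + (dim - 1) * delay])."""
    span = (dim - 1) * delay_steps  # samples needed after index i
    return [tuple(series[i + k * delay_steps] for k in range(dim))
            for i in range(len(series) - span)]
```

For a series sampled every 0.01 time units, a delay of 0.09 corresponds to `delay_steps=9`.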

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1009-1015 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534056


In Section 2, we will define an ideal random system. F of the random system is 1, which means the system is 100 percent unpredictable. In Section 3, we will discuss the calculation of F for chaotic attractors in tangent space. Then we will apply F to an observed time series from the Lorenz system to show the result in Section 4. Our conclusion will be given in Section 5.

2. An ideal random system
Suppose there is a one-dimensional infinite time series $\{s(t_i)\}$ generated by a system, where $t_i = i\Delta t$, $i = 1, 2, \ldots, \infty$, and $\Delta t$ is a constant. Denote the jth difference of $\{s(t_i)\}$ by $\{s^{(j)}(t_i)\} = (s^{(j-1)}(t_{i+1}) - s^{(j-1)}(t_i))/\Delta t$. Here, $\{s^{(0)}(t_i)\} = \{s(t_i)\}$. The system is an ideal random system if and only if for all $j = 0, 1, \ldots, \infty$, the series $\{s^{(j)}(t_i)\}$ is uniformly distributed in the regime $[a^{(j)}, b^{(j)}]$. Here, $a^{(j)}$ and $b^{(j)}$ are the smallest and largest values of $\{s^{(j)}(t_i)\}$, respectively. For a finite time series with equal time intervals, one may only consider the distributions for some differences up to some finite degree. In this paper, we consider only the distribution of the first difference. That is to say, we study the tangent space.

3. Forecast entropy and its calculation
Let us define and calculate F for such an ideal random system in tangent space. For the jth-order case, the only difference is that the calculation of $F_j$ is based on the distribution of $\{s^{(j)}(t_i)\}$.

3.1 Forecast entropy in one-dimensional case ($d = 1$)
Assume there is a one-dimensional time series $\{s(t_i)\}$, $i = 1, 2, \ldots, n$, distributed along a line (Figure 1). For simplicity and without loss of generality, suppose $n = 2^m$ (the general case will be published elsewhere), where m is a positive integer. The following is our procedure to determine the forecast entropy of a one-dimensional ideal random system.

Step 1. Cut the line at the middle, and count the number of points on the left and right half line, respectively. Denote the numbers as $n_L$ and $n_R$, respectively. The location (left or right half of the line) of the next point may be predicted based on the probabilities $p_L = n_L/(n_L + n_R)$ and $p_R = n_R/(n_L + n_R)$, respectively. At this scale, the difficulty of the prediction may be measured by the Shannon entropy (S)

Figure 1. Distribution of a one-dimensional ideal random system

$$S = -p_L \ln p_L - p_R \ln p_R. \qquad (1)$$

For the ideal random system, $n_L = n_R = n/2$. Therefore, $p_L = p_R = 1/2$ and $S_1 = \ln 2$, where the subscript indicates first level.

Step 2. Again, cut the left and right half line at their middles, and count the number of points on the new intervals as shown in Figure 1. Denote the resulting numbers by $n_{LL}$, $n_{LR}$, $n_{RL}$ and $n_{RR}$. To predict whether the next point will be located in the regime LL or LR, one may use the probabilities $p_{LL} = n_{LL}/(n_{LL} + n_{LR})$ and $p_{LR} = n_{LR}/(n_{LL} + n_{LR})$. The difficulty of the prediction can again be measured by

$$S_L = -p_{LL} \ln p_{LL} - p_{LR} \ln p_{LR}. \qquad (2)$$

Similarly, in the regime including RL and RR, the probabilities are $p_{RL} = n_{RL}/(n_{RL} + n_{RR})$ and $p_{RR} = n_{RR}/(n_{RL} + n_{RR})$, respectively. The Shannon entropy in this regime is

$$S_R = -p_{RL} \ln p_{RL} - p_{RR} \ln p_{RR}. \qquad (3)$$

For the ideal random system, $p_{LL} = p_{LR} = p_{RL} = p_{RR} = 1/2$ and $S_L = S_R = \ln 2$. We add the entropies at level 2 to obtain $S_2$: $S_2 = S_L + S_R = 2 \ln 2$.

Step 3. Repeat the process on shorter and shorter intervals to the mth level where, for an ideal random system, there is only one point in each sub-group. Obviously, for the ideal random system at the kth level, the total $S_k = 2^{k-1} \ln 2$, $k < m$. F is defined as

$$F = \sum_{k=1}^{m} a_k S_k, \qquad (4)$$

where $a_k$, $k = 1, 2, \ldots, m$, are parameters dependent only on the number of points n. Therefore, F is the summation of weighted Ss up to mth level. For the ideal random system, one has

$$F = \ln 2 \sum_{k=1}^{m} a_k 2^{k-1}. \qquad (5)$$

The process of determining these parameters is omitted here (Yao, 2002). If we define $F = 1$ for the ideal random system, for any distribution with the number of points $n = 2^m \ge 4$, we have

$$F = \frac{1}{2^{m+1} \ln 2} \left( 2S_1 + \sum_{k=2}^{m} S_k \right).$$


$F \in [0, 1]$ because a distribution studied is always in comparison with the ideal random system which has the same number of points.
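The level entropies $S_k$ produced by Steps 1 to 3 can be computed with the following Python sketch. It covers only the unweighted sums $S_1, \ldots, S_m$ and leaves out the weights $a_k$, whose derivation the paper omits. The interval bounds `lo` and `hi` are passed explicitly, which is an assumption (the text does not say how the ends of the line are chosen), and the function name `level_entropies` is hypothetical.

```python
from math import log

def level_entropies(points, m, lo, hi):
    """Return [S_1, ..., S_m] for 1-D points on [lo, hi), halving intervals per level."""
    result = []
    for k in range(1, m + 1):
        cells = 2 ** k  # number of sub-intervals at level k
        counts = [0] * cells
        for x in points:
            i = min(int((x - lo) / (hi - lo) * cells), cells - 1)
            counts[i] += 1
        s_k = 0.0
        # Each adjacent pair of cells is the left and right half of one parent interval.
        for left, right in zip(counts[0::2], counts[1::2]):
            total = left + right
            for c in (left, right):
                if c:  # the term 0 * ln 0 is taken as 0
                    s_k -= c / total * log(c / total)
        result.append(s_k)
    return result
```

For $n = 2^m$ points spread evenly over the interval, every split is 50/50 and the function returns $2^{k-1} \ln 2$ at each level; if all points fall at one location, every $S_k$ is zero.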


3.2 Multiple-dimensional case ($d \ge 2$)
For simplicity, we may first calculate the F of the ideal random system along each coordinate separately (more discussion later), then the system’s F is defined as the average of these forecast entropies.

4. Forecast entropy of a time series from the Lorenz system
For a given time series $s(t)$ from a chaotic system, there may be three factors which affect F: the value of the delay $\tau$, the dimension of the attractor, and the size of the neighborhood or the number of neighbors. To compare the F values of different time series, namely, of different chaotic attractors, the number of points in each series must be the same. Further, one needs to consider the number of orbits represented by the N points. Every system has its own time scale. With the same step-size to numerically integrate different systems, some systems such as the Lorenz system oscillate very quickly, while some others such as the Rössler system oscillate slowly. If one uses the same sampling rate to obtain the time series, it may cause a problem in calculating F. For example, for two $N = 1{,}024$ time series, the first one oscillates so quickly that the series represents 100 orbits, and the other oscillates so slowly that it represents only one orbit. The information provided by the first series may be enough to describe the dynamical behavior of the system, but the second one may not. Therefore, one has to use different sampling rates for different systems so that the number of orbits represented by the N points is roughly the same. In this paper, we take the sampling rate 0.01 for the Lorenz system, and $N = 16{,}384$.

To calculate F of a multiple dimensional attractor, we first project the attractor onto some plane constructed by the delayed coordinates, then calculate F of the distributions on these planes, and finally take the average as F of the attractor. For example, for a k-dimensional attractor with coordinates $(x_1, x_2, \ldots, x_k)$, we consider the planes $(x_1, x_2)$, $(x_2, x_3)$, $\ldots$, $(x_{k-1}, x_k)$ and $(x_k, x_1)$. To calculate F of the distribution on a plane, we first find the n nearest neighboring points of point i in normal space, then calculate F of the local distribution of these neighbors in tangent space, that is, according to their angles. Point i runs over all the points of the time series. The average of these local Fs is taken as F of the distribution in the plane.

To choose the value of n, we consider two cases. In some forecasting techniques, such as nonlinear dynamical forecasting (Short, 1994), a local mapping is used to describe the local structure. The value of n should be not less than the number of unknown parameters in the mapping. When the mapping is constructed by polynomials, the number of unknown parameters is

determined by the mapping’s dimension and the degree of polynomials. If one uses up to second degree polynomials, when the dimension is 2, 3 and 4, the number of unknown parameters is 12, 30 and 60, respectively. Therefore, in the first case, we choose $n = 2^{d+2}$ where d is the dimension. We want to investigate for which value of d F reaches its lower bound, so that one can use a d-dimensional reconstructed attractor for optimal prediction. In the second case, $n = 16$. We investigate the complexity of local structures of the attractor.

The Lorenz system is three-dimensional. To calculate F of an observed time series from this system, we first use a fourth-order Runge-Kutta method to integrate it and obtain a time series $s(t) = x(t)$. The step-size is 0.01. To reconstruct an attractor from the time series, one needs a delay $\tau$, which may be determined by calculating the auto-correlation function (Albano et al., 1988) or other techniques (Fraser, 1989). However, for the Lorenz system, these techniques cannot obtain a satisfactory $\tau$. Instead, one often uses $\tau = 0.09$ to reconstruct the attractor because under this value the attractor looks more extended and like the original projected on the (x, y) plane. Is this $\tau$ also best from the predictability point of view? This question may be answered by calculating F.

The time series $s(t)$ is shown in Figure 2(a). The reconstructed attractor when $\tau = 0.09$ is shown in Figure 2(b). F is calculated based on the distribution in the tangent space of the series.

Case 1. The number of neighbors $n = 2^{d+2}$. In this case, we focus on the predictability based on the reconstructed attractors with different embedding dimensions. Figure 3(a) shows F as a function of $\tau$ when $d = 2$, 3, 4 and 5, respectively. Denote the forecast entropy when the dimension is d by $F_d$. The following information may be obtained from the figure.

Figure 2. (a) A piece of time series x(t) from the Lorenz system. (b) The reconstructed attractor from the series when delay $\tau = 0.09$
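A series like the one in Figure 2(a) can be generated with a fourth-order Runge-Kutta integrator at step-size 0.01, as the text describes. The sketch below is only an illustration: the Lorenz parameters (sigma = 10, rho = 28, beta = 8/3), the initial state and the number of discarded transient steps are assumptions, since the paper does not state them, and the function names are hypothetical.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations (classic parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, h):
    """Advance `state` by one classical fourth-order Runge-Kutta step of size h."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * h * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + h * k for s, k in zip(state, k3)))
    return tuple(s + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def lorenz_series(n, h=0.01, start=(1.0, 1.0, 1.0), discard=1000):
    """Return n samples of x(t), after discarding an initial transient."""
    state = start
    for _ in range(discard):
        state = rk4_step(lorenz, state, h)
    xs = []
    for _ in range(n):
        state = rk4_step(lorenz, state, h)
        xs.append(state[0])
    return xs
```

With this step-size, the delay 0.09 used for Figure 2(b) corresponds to a lag of 9 samples in the series.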


Figure 3. F vs $\tau$ of the Lorenz system when $d = 2$, 3, 4 and 5, respectively. (a) $n = 2^{d+2}$; and (b) $n = 16$

(1) $F_2$ oscillates around $F_3$, and $F_3 < F_4 < F_5$. This indicates that a three-dimensional embedding space is enough to reconstruct the attractor for prediction purposes.
(2) There is a minimal $F_2 = 0.015$ when $\tau = 0.09$. This indicates that the delay $\tau = 0.09$ is the best candidate to reconstruct the attractor when one uses delayed coordinates. Thus, F in tangent space has solved the problem of the value of delay, while the auto-correlation function in normal space cannot.
(3) The minimal $F_3 = 0.024$ appears at $\tau = 0.05$, and minimum $F_4 = 0.044$ at $\tau = 0.04$. Therefore, in order to obtain the best forecast result, one should adjust the value of the delay as d changes.
(4) The minimum $F = 0.015$ when $d = 2$ and $\tau = 0.09$. This value represents the irreducible difficulty of predictions based on this attractor.

Case 2. The number of neighbors is fixed at 16. We want to investigate the characteristics of the local structures of reconstructed attractors with the same number of neighbors. In this case, as shown in Figure 3(b), besides the similar $F(\tau)$ to that in case 1, we find:
(1) $F_3 < F_2$, which indicates that the local structure of the three-dimensional attractor is simpler than that of the two-dimensional one, and
(2) $F_d$ does not decrease further as d increases after $d > 2$. It may indicate that one cannot simplify the local structures by increasing the dimension of an attractor after $d > 2$. In other words, it is enough to use three dimensions to describe the system. The number of dimensions is exactly equal to the dynamical dimension of the Lorenz system. The result is reasonable:

a nonlinear coupled system cannot be uncoupled. The time series of any Forecast entropy variable contains the information of the others and the whole system. 5. Summary and conclusion We have introduced forecast entropy and applied it to a time series observed from the Lorenz system. By calculating the forecast entropy of the distribution of the series in tangent space, we obtained: (1) the dynamical dimension of the system which produces the time series; (2) the value of t which best reconstructs an attractor; (3) the dimension to do optimal prediction; and (4) the irreducible difficulty of predictions based on the attractor. The embedding dimension is exactly equal to the dynamical dimension. Therefore, our technique is useful for capturing information from an observed time series. More results of F in higher dimensional systems and random systems will be published elsewhere. References Albano, A.M., Muench, J., Schwartz, C., Mees, A.I. and Rapp, P.E. (1988), “Singular value decomposition and the Grassberge-Procaccia algorithm”, Phys. Rev. A, Vol. 38, pp. 3017-26. Bumeliene, S., Lasiene, G., Pyragas, K. and Cenys, A. (1988), Litov. Fiz. Sbornik, Vol. 28, p. 569. Essex, C. and Nerenberg, M.A.H. (1990), “Fractal dimension: limit capacity or Hausdorff dimension?”, Am. J. Phys., Vol. 58, pp. 986-8. Frank, G.W., Lookman, T., Nerenberg, M.A.H., Essex, C., Lemieux, J. and Blume, W. (1990), “Chaotic time series analysis of epileptic seizures”, Physica D, Vol. 46, pp. 427-38. Fraser, A.M. (1989), “Reconstructing attractors from scalar time series – a comparison of singular system and redundancy criteria”, Physica D, Vol. 34, p. 391. Kennel, M.B., Brown, R. and Abarbanel, H.D.I. (1992), “Determining embedding dimension for phase-space reconstruction using a geometrical construction”, Phys. Rev. A, Vol. 45, pp. 3403-11. Ott, E., Withers, W.D. and Yorke, J.A. (1984), “Is the dimension of chaotic attractors invariant under coordinate changes?”, J. Stat. Phys., Vol. 36, pp. 
687-97. Short, K.M. (1994), “Signal extraction from chaotic communications”, Int. J. Bifurcation Chaos, Vol. 4, pp. 959-77. Takens, F. (1981), “Detecting strange attractors in turbulence”, in Rand, D.A. and Young, L-S. (Eds), Lecture Notes in Mathematics Vol. 898, Dynamical Systems and Turbulence, Warwick 1980, Springer-Verlag, Berlin, pp. 366-81. Yao, W. (1995), “Controlling chaos by ‘standard signals’”, Phys. Lett. A, Vol. 207, pp. 349-54. Yao, W. (2002), “Improving the security of chaos communications”, PhD thesis, University of Western Ontario, Ontario.


The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister


The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

The forecast model of system reconstructability analysis

Changyun Yu and Pengtao Wang
Tianjin University of Technology, Tianjin, People’s Republic of China

Keywords Cybernetics, Statistical forecasting, Data analysis

Abstract In talent forecasting and optimization, talent structure, demand and quality are very important. In this paper, we analyze the use of factor reconstruction analysis and the elastic coefficient forecast method in talent forecasting, and build a model of the talent structure, demand and quality of scientists and technicians for the electronic industry.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1016-1019 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534065

1. Forecast method
We use factor reconstruction analysis (FRA) to study the law of talent demand development, because the statistical data about talent have the following characteristics.
(1) The data are not standard.
(2) The mechanism by which talent operates is changing, so the law of talent is complicated.

Reconstruction analysis (RA) was founded by Professor G.J. Klir in 1976, and FRA was developed by B. Jones and G. Shu. FRA offers a simple and feasible method of finding the main factor level. It not only finds the main factor level, but also ranks all factor levels by degree of importance and quantifies that degree. On the basis of FRA we can forecast and appraise comprehensively.

2. Building model
Figure 1 shows the FRA procedure. The sequence of forecasting on the basis of FRA is as follows.

(1) Sum the property function maxima of all factors on each level of the index, and sum the property function minima of all factors on each level of the index:

\[ \max F_{k,l_k} = \sum_{n=1}^{N} \max_{l_n \in V_{f_n}} edc^{*}(k, l_k, n, l_n) \]

\[ \min F_{k,l_k} = \sum_{n=1}^{N} \min_{l_n \in V_{f_n}} edc^{*}(k, l_k, n, l_n) \]

where $edc$ is the information entropy distance of the behavior function, $edc^{*} = 1/edc$, and $V_{f_n}$ is the collection of all levels of factors.

(2) Sum the behavior function values of the forecasting-period states on each level of the index:

\[ F_{k,l_k}(T) = \sum_{n=1}^{N} edc^{*}[k, l_k(T), n, l_n(T)] \]

(3) Estimate the possibility $P_{k,l_k}(T)$ of $F_{k,l_k}(T)$ within each index level, and regard it as the weight with which to forecast the index:

\[ P_{k,l_k}(T) = \frac{F_{k,l_k}(T) - \min F_{k,l_k}}{\max F_{k,l_k} - \min F_{k,l_k}} \]

\[ W(T) = P_{1,1}(T) \times E_0 + P_{1,2}(T) \times E_1 + P_{1,3}(T) \times E_2 \]

where $E_0$ is the low level of the index, $E_1$ is the middle level of the index, and $E_2$ is the high level of the index.

3. Tianjin Electronic Industry talent forecast
The data source is the electronic industry statistical almanac (1993-1997). The model adopts one index, $W$, which is the growth rate of electronic talent in Tianjin, and three factors: $V_1$ is the rate of increase of output; $V_2$ is the ratio of output to profit-tax; and $V_3$ is the surroundings.

3.1 Forecast index W during 1997-2000
According to the ninth five-year plan of the Tianjin Electronic Industry, during 1997-2000, $V_1 = 20$ percent, $V_2 = 6.67$ percent and $V_3$ = favorable. From FRA we draw the following conclusion.

Figure 1. Talent demand forecast procedure

\[ P(w = -21\ \text{percent}) = 0.3868, \quad P(w = 2\ \text{percent}) = 0.7234, \quad P(w = 30\ \text{percent}) = 0.3877 \]

Therefore,

\[ \hat{W} = -21\ \text{percent} \times 0.3868 + 2\ \text{percent} \times 0.7234 + 30\ \text{percent} \times 0.3877 = 4.95\ \text{percent} \]

3.2 Forecast index W during 2000-2010
According to the ninth five-year plan of the Tianjin Electronic Industry, during 2000-2010, $V_1 = 14.9$ percent, $V_2 = 9.25$ percent and $V_3$ = favorable. From FRA we draw the following conclusion.

\[ P(w = -21\ \text{percent}) = 0.4243, \quad P(w = 2\ \text{percent}) = 0.4524, \quad P(w = 30\ \text{percent}) = 0.3808 \]

Therefore,

\[ \hat{W} = -21\ \text{percent} \times 0.4243 + 2\ \text{percent} \times 0.4524 + 30\ \text{percent} \times 0.3808 = 3.42\ \text{percent} \]
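The weighting step can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, the index levels (-21, 2 and 30 percent) and possibilities are the values quoted above, and the min-max normalisation is shown as a separate helper because the underlying $F$ values are not given in the text.

```python
def possibility(f_t, f_min, f_max):
    """Min-max normalisation of F(T) between its minimum and maximum."""
    return (f_t - f_min) / (f_max - f_min)

def weighted_index(levels, weights):
    """Forecast index: sum of index level values times their possibilities."""
    return sum(level * w for level, w in zip(levels, weights))

levels = [-21.0, 2.0, 30.0]   # low, middle and high index levels, in percent

w_1997_2000 = weighted_index(levels, [0.3868, 0.7234, 0.3877])
w_2000_2010 = weighted_index(levels, [0.4243, 0.4524, 0.3808])
print(w_1997_2000, w_2000_2010)
```

The two sums come to roughly 4.955 and 3.419 percent, in line with the 4.95 and 3.42 percent reported in the text.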

Table I.

| Year | Rate of GDP increase (percent) | Rate of talent increase (percent) | Talent elasticity | Talent quality |
|---|---|---|---|---|
| 1996 | 20.94 | -20.07 | 1.04 | – |
| 1997-2000 | 20 | 4.95 | 4.04 | 2.87 |
| 2000-2010 | 14.9 | 3.42 | 4.36 | 3.19 |

Table II.

| Year | Quantity of talent demand | Talent quality | Talent structure: technology talent | Talent structure: management talent |
|---|---|---|---|---|
| 1996 | 16,356 | – | 8,676 | 7,720 |
| 2000 | 23,287 | 2.87 | 12,120 | 11,167 |
| 2010 | 32,596 | 3.19 | 16,965 | 15,631 |

3.3 The demand forecast for Tianjin electronic talent
The talent demand in the year 2000 is

\[ 20{,}145\,(1 + 4.95\ \text{percent})^{3} = 23{,}287 \]

The talent demand in the year 2010 is

\[ 23{,}287\,(1 + 3.42\ \text{percent})^{10} = 32{,}596 \]

3.4 The talent quality forecast
Talent quality can be reflected through talent contribution, which is expressed by the output elasticity of talent: the percentage increase in output for a 1 percent increase in talent, holding other factors constant. Thus,

\[ E_m = \frac{\Delta GDP / GDP}{\Delta m / m} \]

where $\Delta GDP / GDP$ is the rate of increase of GDP and $\Delta m / m$ is the rate of increase of talent. $E_m$ is therefore the contribution of talent to economic growth. If talent quality is unchanged over a period of time, $E_m$ should be constant; if $E_m$ has changed, talent quality should have varied. Suppose $E_m^{(1)}$ is the output elasticity of talent in period one and $E_m^{(2)}$ is the output elasticity of talent in period two. Then the percentage increase in talent quality is

\[ A = \frac{E_m^{(2)} - E_m^{(1)}}{E_m^{(1)}} \]
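The demand and elasticity formulas of Sections 3.3 and 3.4 can be checked with a short Python sketch. The function names are illustrative, and the inputs are the figures quoted in the text and in Table I. Because those published figures are rounded, the results agree with the tabulated values only to within rounding.

```python
def compound(base, rate_percent, years):
    """Demand after compounding a yearly growth rate given in percent."""
    return base * (1.0 + rate_percent / 100.0) ** years

def elasticity(gdp_rate, talent_rate):
    """Output elasticity of talent: GDP growth rate over talent growth rate."""
    return gdp_rate / talent_rate

def quality_change(e_first, e_second):
    """Relative change A of the elasticity between two periods."""
    return (e_second - e_first) / e_first

demand_2000 = compound(20145, 4.95, 3)     # about 23,287
demand_2010 = compound(23287, 3.42, 10)    # about 32,596

e_1997_2000 = elasticity(20.0, 4.95)       # about 4.04
e_2000_2010 = elasticity(14.9, 3.42)       # about 4.36

# Quality change relative to the 1996 elasticity of 1.04 listed in Table I
a_2000 = quality_change(1.04, 4.04)
a_2010 = quality_change(1.04, 4.36)
```

Both quality figures are measured against the 1996 elasticity, which is how the tabulated values 2.87 and 3.19 appear to have been obtained.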

Table I gives the result of the talent quality forecast. From Table I we conclude that talent quality is enhanced 2.87 times up to the year 2000, and 3.19 times up to the year 2010.

4. Conclusions
Table II gives the results of the talent demand and quality forecast.


Construction of main sequence of gene based on “method of factor reconstruction analysis”

Zhang Zhihong and Wang Pengtao
University of Tianjin Technology, Tianjin, People’s Republic of China

Liu Huaqing
The Key Laboratory of Genetic Engineering, Fujian Academy of Agricultural Sciences, Fujian, People’s Republic of China

Shu Guangfu
Institute of Systems Sciences, Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences, Beijing, People’s Republic of China

Keywords Cybernetics, Numerical analysis, Genetics

Abstract Deciphering, analyzing and studying gene sequences is among the most challenging problems that scientists all over the world are trying to solve. This paper introduces the application of factor reconstruction analysis in this field, together with the process and steps of the method. The resulting computation and analysis are presented.

1. Introduction
Early in 1985, US scientists took the lead in proposing the Human Genome Project, which aims to elucidate the nucleotide sequence of the human genome and to interpret all human inherited information. The project has been accompanied by an explosive growth of data, and with it by the rapid development of computer technology and of every kind of information technology. Scientists all over the world are seeking more methods of data processing and new algorithms to interpret every kind of gene sequence. This paper introduces the application of factor reconstruction analysis to the reconstruction of the rice gene main sequence. The method has been tested through computation and through analysis by specialists.

2. Method of factor reconstruction analysis
Reconstructability analysis was proposed in 1976 by the distinguished Professor G.J. Klir, State University of New York, USA. The concept was based on the constraint analysis posed by R. Ashby.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1020-1025 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534074

The authors thank Professor Xue Yongbiao and his assistants of the Institute of Biological Inheritance and Growth of Chinese Academy of Sciences for providing gene data for this research.

In his new idea, Klir pointed out how to divide and distinguish, or constitute, the local relations or the relationships between the local parts, which can readily reflect the characteristics of the whole. Subsequently, this idea was carried forward by B. Jones, whose work later in the 1980s used the method of system reconstruction to determine the main substates.

The method of factor reconstruction analysis was proposed by Shu Guangfu and his colleagues. It computes the closeness between the behavior function of a system hypothesis consisting of each factor level and the objective level, and the behavior function of the primary system. The closer the two are, the more important this factor level is to the objective level. The factor levels are ranked accordingly with respect to each criterion level; the closest factor level is the most important one for the corresponding criterion level.

3. Constitute the main sequence using the method of “factor reconstruction analysis”
There are several groups of rice gene sequences. The gene sequences differ because of the different qualities of the rice. We can therefore find the main sequence in several groups of gene sequences, which will confirm the best rice sequence in this group. According to the principle of “factor reconstruction analysis”, we can constitute the main sequence of some groups. The main sequence shows the excellent sequence of that kind of rice. Using the ELEREC program produced by Shu Guangfu, the construction steps are as follows.

(1) Symbol definition in the algorithm. For the computation, the symbols of the four bases must be converted into digits. Here, we define A as 1, G as 2, T as 3 and C as 4.
(2) Computation steps.

First, we transform the gene sequence into digit form: according to the symbol definition, the gene sequence becomes a digit sequence ranked in a row (for example, 2 3 3 1 3 ...).

Second, to use the computer program of factor reconstruction analysis, we transform the digit sequences ranked in rows into digit sequences ranked in columns, as shown in Table I.

Table I. Digit sequences ranked in columns (groups of ten entries, in the order in which they appear)

2 3 3 1 3 4 1 1 1 2 ...
2 1 1 4 1 2 4 2 4 4 2 4 4 1 4 2 2 3 3 2 ...
1 4 2 1 4 2 2 4 4 1 ...
1 1 4 2 2 4 4 1 2 3
0 0 0 0 0 0 0 0 0 0 (the remaining fifteen groups of ten entries are all 0)

Third, we input the digit sequence that represents the gene sequence into the factor reconstruction analysis software. Because the capacity of the array is limited, when the gene sequence is too long we must divide it into several segments, compute every segment, and finally assemble the results into a whole, thereby retaining the continuity of the computing result (Table II).

Finally, the computation is done using the factor reconstruction analysis program. The computation result is discussed in Section 4.

Note. We use four factor levels and one index level in the method of factor reconstruction analysis. When a data group is very large, we may use 20 factor levels and ten index levels.
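The preparation steps above (encoding the bases as digits, turning rows into columns, and cutting a long sequence into segments) can be sketched in Python. This is an illustration only and does not reproduce the ELEREC program: the function names, the two toy sequences and the segment length are hypothetical, and only the base-to-digit mapping is taken from the text.

```python
# Mapping defined in the text: A as 1, G as 2, T as 3 and C as 4
BASE_TO_DIGIT = {"A": 1, "G": 2, "T": 3, "C": 4}

def encode(sequence):
    """Turn a string of bases into a row of digits."""
    return [BASE_TO_DIGIT[base] for base in sequence.upper()]

def rows_to_columns(rows):
    """Transpose equal-length digit rows so each sequence becomes a column."""
    return [list(column) for column in zip(*rows)]

def split_segments(digits, segment_length):
    """Cut a long digit row into consecutive segments of limited length."""
    return [digits[i:i + segment_length]
            for i in range(0, len(digits), segment_length)]

sequences = ["GTTATC", "GAACAG"]          # hypothetical toy sequences
rows = [encode(s) for s in sequences]     # one digit row per sequence
columns = rows_to_columns(rows)           # one line per sequence position
segments = split_segments(rows[0], 4)     # segment length chosen arbitrarily
```

Keeping the segments in their original order makes it straightforward to reassemble the per-segment results into one continuous result afterwards.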

4. Analysis of the result of computation
In the computation result of the method of factor reconstruction analysis, the program sorts the variables into blocks: the first block indicates the degree of closeness at the first level, the second block at the second level, and so on. The closer a variable is, the larger its influence (Table III).

Table II.

No.
1   c1   0.1492197E+01   fn 2    f1 2
2   c1   0.1492197E+01   fn 3    f1 3
3   c1   0.1492197E+01   fn 7    f1 2
4   c1   0.1492197E+01   fn 9    f1 4
5   c1   0.1492197E+01   fn 10   f1 4
6   c1   0.1492197E+01   fn 11   f1 2
7   c1   0.1492197E+01   fn 14   f1 4
8   c1   0.1492197E+01   fn 15   f1 3
9   c1   0.1492197E+01   fn 16   f1 1
10  c1   0.1492197E+01   fn 17   f1 2
11  c1   0.1492197E+01   fn 18   f1 1
12  c1   0.1492197E+01   fn 20   f1 4
...

Table III.

No.
First block
1   c1   0.1492197E+01   fn 1    f1 1
2   c1   0.1492197E+01   fn 11   f1 2
3   c1   0.1492197E+01   fn 16   f1 3
4   c1   0.1492197E+01   fn 19   f1 3
5   c1   0.1492197E+01   fn 20   f1 1
Second block
6   c1   0.1678465E+01   fn 4    f1 1
7   c1   0.1678465E+01   fn 4    f1 2
8   c1   0.1678465E+01   fn 5    f1 2
9   c1   0.1678465E+01   fn 5    f1 2
10  c1   0.1678465E+01   fn 6    f1 2
11  c1   0.1678465E+01   fn 6    f1 4
12  c1   0.1678465E+01   fn 7    f1 2
...
Third block
36  c1   0.1789745E+01   fn 5    f1 1
37  c1   0.1789745E+01   fn 5    f1 1
38  c1   0.1789745E+01   fn 5    f1 1
39  c1   0.1789745E+01   fn 5    f1 1
40  c1   0.1789745E+01   fn 5    f1 1
41  c1   0.1789745E+01   fn 5    f1 1
...
nth block

5. Construction of main gene sequence
From these computation results, the main gene sequence is constructed as follows:
• fn 1 f1 1 indicates that the “1” of the first line is the most important, so the first bit of the main sequence is “A”.
• fn 11 f1 2 indicates that the “2” of the 11th line is the most important, so the 11th bit of the main sequence is “G”.
• fn 16 f1 3 indicates that the “3” of the 16th line is the most important, so the 16th bit of the main sequence is “T”.
• fn 19 f1 3 indicates that the “3” of the 19th line is the most important, so the 19th bit of the main sequence is “T”.
• fn 20 f1 1 indicates that the “1” of the 20th line is the most important, so the 20th bit of the main sequence is “A”.
• ...

When all the most important gene bits are ranked in a line, they form the main sequence of the several gene sequences.
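The selection rule of Section 5 can be expressed as a short Python sketch. It is an illustration under stated assumptions rather than the authors' procedure: the function name and the "-" placeholder for positions not yet decided are hypothetical, and only the five (fn, f1) pairs of the first block of Table III are used.

```python
# Inverse of the mapping defined in Section 3: 1 = A, 2 = G, 3 = T, 4 = C
DIGIT_TO_BASE = {1: "A", 2: "G", 3: "T", 4: "C"}

# (fn, f1) pairs of the first block of Table III: sequence position and level
first_block = [(1, 1), (11, 2), (16, 3), (19, 3), (20, 1)]

def build_main_sequence(pairs, length, unknown="-"):
    """Place the selected base at each listed position (1-based)."""
    sequence = [unknown] * length
    for position, level in pairs:
        # Keep the first assignment, which comes from the closest block
        if sequence[position - 1] == unknown:
            sequence[position - 1] = DIGIT_TO_BASE[level]
    return "".join(sequence)

main_sequence = build_main_sequence(first_block, 20)
print(main_sequence)   # A---------G----T--TA
```

Positions still shown as "-" would be filled from the later blocks in the same way, working outward from the closest block.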

6. Biological effect of gene main sequence
Biodiversity comes from gene polymorphisms, which control different traits to a degree that is favorable for the organism itself or for humans. The interesting question is whether a fixed relationship exists between gene sequences and the traits they control. It will be important, both in theory and in practice, for the reconstruction and analysis of genes if the method can be used to search for desirable gene sequences (main sequences) by analyzing the relationship between gene sequence polymorphism and the phenotype controlled. Construction of the gene main sequence is defined as reconstructing and finding the sequence that is desirable for the organism itself or for humans, based on the method of factor reconstruction analysis. The idea originated from the observation that differences in gene phenotype generally result from differences in gene sequence. Biologists and breeders have therefore been concerned with whether the character of a gene sequence can be deduced from the degree of the phenotype or, conversely, whether the degree of the phenotype can be deduced from the character of the gene sequence. The construction of the main sequence is based on this thinking. As whole-genome sequencing of the model organisms is completed and data on gene polymorphism increase rapidly, the construction of gene main sequences, especially for genes controlling agronomic traits, will assist breeders in plant breeding. The construction and use of gene main sequences will thus have a far-reaching influence on the reconstruction of genes as well as on plant breeding.

7. Conclusions
Life computation is a new research branch of biological information. It aims to speed up and strengthen research in life science through the research and development of computers, databases and all kinds of algorithms, and through the application of software in life science. This paper introduces a method of constructing the main sequence from rice sequences using the method of reconstruction analysis, and concludes with a control level of rice gene sequence. The use of its software has achieved a better gene compositor structure by using computers.



Reconstructability analysis with Fourier transforms

Martin Zwick
Portland State University, Portland, Oregon, USA

Keywords Cybernetics, Fourier transforms, Information theory, Modelling

Abstract Fourier methods used in two- and three-dimensional image reconstruction can be used also in reconstructability analysis (RA). These methods maximize a variance-type measure instead of information-theoretic uncertainty, but the two measures are roughly collinear and the Fourier approach yields results close to that of standard RA. The Fourier method, however, does not require iterative calculations for models with loops. Moreover, the error in Fourier RA models can be assessed without actually generating the full probability distributions of the models; calculations scale with the size of the data rather than the state space. State-based modeling using the Fourier approach is also readily implemented. Fourier methods may thus enhance the power of RA for data analysis and data mining.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1026-1040 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534083

1. Introduction
In reconstructability analysis (RA) applied to probabilistic systems, probability distributions for subsets of variables specified by a model are joined together to calculate a probability distribution for the full set of variables (Klir, 1985; Krippendorff, 1986). Similarly, in image reconstruction (IR) used in electron microscopy, tomography, and other areas, lower-dimensionality projections are combined to yield a full-dimensionality density function (Zwick and Zeitler, 1973). This paper will show that Fourier techniques used in IR can be applied also to RA (and thus to log-linear modeling (Bishop et al., 1978; Knoke and Burke, 1980) which closely resembles RA).

There are important differences between IR and RA. IR treats continuous density functions defined on interval scale variables in two or three spatial dimensions. Projections arise from rotations of the object or of the imaging source, and are not mutually orthogonal. RA, by contrast, considers probability distributions, defined on a discrete, in fact nominal, domain of higher dimensionality, and the projections are all mutually orthogonal. Despite these differences, essentially the same task, namely composition of lower-dimensional projections to obtain a higher dimensional function, is accomplished in both areas.

The Fourier method used in IR is as follows. Since the Fourier transform of a projection of a distribution is a central section (a section passing through the origin) of the transform of the function, measured projections can be combined by calculating their transforms, collecting these sections together in Fourier space, and doing an inverse transform to obtain a function which has these projections.

The author thanks Michael Johnson and Ken Wellett for their helpful comments on this paper.

For a compact review of RA, see Zwick (2001). RA comprises two problem types: reconstruction and identification. Identification is the simpler of the two and closely resembles IR: the task is composition of a set of projections into a higher dimensionality distribution. This is done by the “iterative proportional fitting” (IPF) algorithm, in which projections are sequentially imposed on a calculated distribution initialized as uniform. Iterations of such impositions eventually converge on a distribution consistent with all projections. Actually IPF is needed only for models with loops, since for models without loops algebraic (non-iterative) solutions are available. But most models, and virtually all complex models, have loops.

Reconstruction, however, is the problem most commonly encountered in RA and is the focus of this paper. The task here is to represent and approximate a distribution with a set of its lower dimensional projections. Reconstruction thus consists of three steps:
(1) projection;
(2) composition; and
(3) evaluation.

Projection yields the lower dimensionality distributions whose adequacy is being explored. Composition is done as in identification. Evaluation assesses the difference between the computed IPF distribution and the observed distribution.

The projection and evaluation steps of RA do not pose serious computational problems, as they scale with the size of the data and not the state space. It is the composition step which poses the primary challenge, and this challenge is two-fold:
(1) many iterations are sometimes needed for IPF to converge; and
(2) IPF calculates probabilities for the entire state space, even when data are sparse.

The computer time and space requirements of IPF restrict the applicability of RA by severely limiting the number and cardinalities of the variables which can be considered.
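The IPF procedure described above can be illustrated with a minimal Python sketch for two variables. This is an assumption-laden toy, not a general RA implementation: the function name is hypothetical, the margins are taken to be strictly positive, and the margins {0.3, 0.7} and {0.4, 0.6} are those of the 2D example used later in the paper. A two-variable model has no loops, so a single pass already gives the product of the margins; models with loops need at least three variables and several passes.

```python
def ipf_2d(row_margin, col_margin, iterations=10):
    """Iterative proportional fitting of a 2D table to two margins."""
    n_rows, n_cols = len(row_margin), len(col_margin)
    # Start from the uniform distribution
    q = [[1.0 / (n_rows * n_cols)] * n_cols for _ in range(n_rows)]
    for _ in range(iterations):
        # Impose the first projection (row sums)
        for j in range(n_rows):
            total = sum(q[j])
            q[j] = [value * row_margin[j] / total for value in q[j]]
        # Impose the second projection (column sums)
        for k in range(n_cols):
            total = sum(q[j][k] for j in range(n_rows))
            for j in range(n_rows):
                q[j][k] *= col_margin[k] / total
    return q

q = ipf_2d([0.3, 0.7], [0.4, 0.6])
```

Note that the whole table `q` is held in memory, which is the state-space cost that the text identifies as the second difficulty of IPF.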
What motivates this paper is the observation that in IR, composition is accomplished in a single iteration: one simply takes the Fourier transforms of all projections, collects together the resulting sections in frequency space, and performs the inverse Fourier transform. (Computations are done efficiently by using the fast Fourier transform.) If such a single-iteration method for composing projections were available in RA, it would enhance the power of RA for exploratory modeling. It will be shown below that this is indeed possible. Specifically, IR-type composition provides a single-iteration approximation to IPF, which can be used for rough searches through the lattice of possible models. This addresses only the first of the two difficulties posed by IPF, since Fourier composition also involves the entire state space. It turns out, however, that back projection, a procedure equivalent to the Fourier approach, allows a “reduced” composition step to be done that calculates probabilities only for observed states. If the IR approximation to standard RA is adequate, exploratory modeling with the IR approach can thus bypass both the time and space limitations of IPF.

The IR approach also allows the easy implementation of state-based modeling, a variant of RA pioneered by Jones (1985a, b) and currently under further development (Johnson and Zwick, 2000; Zwick and Johnson, 2002). This implementation, however, still scales with the state space and not the data.

2. A simple 2D Fourier RA example
To investigate whether the IR Fourier approach might be applied to RA, consider the 2D RA problem, for which there are only two possible models: AB, the “saturated” model (the data), and A:B, the “independence model”. The simplest case occurs where variables are dichotomous (binary) and the AB distribution is a 2 × 2 contingency table, as shown in Table I(a). This table requires three parameter values for its specification, which is the degrees of freedom (df) of AB; this is suggested by the shading of three (arbitrarily chosen) cells. The independence model, A:B, is the distribution which is the product of the margins of AB, as shown in Table I(b). Only two parameters (arbitrarily chosen and shown shaded), one in each projection, are needed to specify this model.

Table I. (a) Observed probability distribution, AB, and (b) independence model, A:B

The A:B distribution is the solution to the maximization of information-theoretic uncertainty subject to model constraints, i.e. to the problem:

\[ \text{maximize } U = -\sum_i \sum_j q(i,j) \log q(i,j) \]

subject to

\[ \sum_j q(0,j) = p(0,\cdot) = 0.3, \qquad \sum_j q(1,j) = p(1,\cdot) = 0.7 \]

\[ \sum_i q(i,0) = p(\cdot,0) = 0.4, \qquad \sum_i q(i,1) = p(\cdot,1) = 0.6 \]

\[ \sum_i \sum_j q(i,j) = 1 \]

where $p(j,\cdot) = \sum_k p(j,k)$ and $p(\cdot,k) = \sum_j p(j,k)$, and where $p$ and $q$ refer to the observed (AB) and calculated (A:B) distributions, respectively. Although there are four projection equations, given the fifth equation which sets the sum of the probabilities to 1, there are only two linearly independent equations of constraint (one for each projection), hence df(A:B) = 2.

As noted above, Table I(b) is the maximum uncertainty distribution subject to the constraints of the A:B model. It can also be generated directly as an algebraic function (here, the simple product) of the projections of the original distribution of Table I(a). In general, however, and specifically for models with loops, one cannot derive the model distribution algebraically, and the IPF algorithm must be used.

The IR Fourier method is applied to this problem as follows. The discrete 2D Fourier transform of $p(j,k)$ is:

\[ P(J,K) = \sum_j \sum_k p(j,k) \exp[2\pi i (jJ/N_j + kK/N_k)] \]

where $j = 1, 2, \ldots, N_j$ and $k = 1, 2, \ldots, N_k$, and where $(J,K)$ are the indices in Fourier space corresponding to $(j,k)$. Extension to higher dimensions is straightforward. From the theorem that the Fourier transform of a projection is a central section, the projections $p(j,\cdot)$ and $p(\cdot,k)$ have for their Fourier transforms the central sections $P(J,0)$ and $P(0,K)$, respectively, as follows.

Calculating central sections from projections:

\[ P(J,0) = \sum_j p(j,\cdot) \exp[2\pi i (jJ/N_j)], \qquad P(0,K) = \sum_k p(\cdot,k) \exp[2\pi i (kK/N_k)] \tag{1} \]

The Fourier transform of $p(j,\cdot) = \{0.3, 0.7\}$ is $\{1.0, -0.4\}$, since

\[ P(0,0) = 0.3 \exp[2\pi i (0)(0)/2] + 0.7 \exp[2\pi i (1)(0)/2] = 1.0 \]

\[ P(1,0) = 0.3 \exp[2\pi i (0)(1)/2] + 0.7 \exp[2\pi i (1)(1)/2] = -0.4 \]

and the transform of $p(\cdot,k) = \{0.4, 0.6\}$ is $P(0,K) = \{1.0, -0.2\}$. If one collects together these two central sections, one has in Fourier space the transform shown in Table II (the central sections are shaded).
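The central sections of equation (1) can be reproduced numerically with a few lines of Python. This is a sketch only: the function name is illustrative, and the state indices run over 0 and 1 as in the worked values above.

```python
import cmath

def central_section(projection):
    """Discrete Fourier transform of a 1D projection, as in equation (1)."""
    n = len(projection)
    return [sum(value * cmath.exp(2j * cmath.pi * j * J / n)
                for j, value in enumerate(projection))
            for J in range(n)]

section_a = central_section([0.3, 0.7])   # approximately {1.0, -0.4}
section_b = central_section([0.4, 0.6])   # approximately {1.0, -0.2}
```

Both sections share the origin term 1.0, which is the sum of all the probabilities.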
In the Fourier approach to RA, after collecting together the central sections dictated by the model, one then does an inverse Fourier transform to obtain the $q$ distribution corresponding to these projections, as follows.

Inverse transform of set of central sections:

\[ q(j,k) = \sum_J \sum_K [P(J,0) + P(0,K)] \exp[-2\pi i (jJ/N_j + kK/N_k)] - P(0,0) \tag{2} \]

It would be natural to presume that the inverse transform of Table II would yield the independence model distributionPP of Table I(b). The origin term, 1.0, of the Fourier distribution corresponds to pð j; kÞ ¼ 1: By itself, this term generates the uniform distribution. If one adds to this term the sections corresponding to the two projections, one might expect the result to be the maximally uniform distribution subject to the projections as constraints, i.e. Table I(b). This expectation is not correct. The inverse transform of Table II is Table I(a) and not Table I(b). Table II is actually the full transform of Table I(a), i.e. the non-central part of the transform of Table I(a) is 0: Table I(a), the AB distribution which in standard RA exhibits non-zero constraint is fully specified in the Fourier approach by its projections. By contrast, the Fourier transform of Table I(b), the independence model in standard RA, is shown in Table III. Its non-central Fourier coefficient is non-zero (0.08). Thus, Table II, the transform of a distribution with constraint seems to have df ¼ 2; while Table III, the transform of a distribution without constraint seems to have df ¼ 3: This anomaly is the result of the particular distribution chosen for analysis. This example was in fact chosen to highlight the differences between the Fourier approach and standard RA. Because the Fourier transform is a linear operation, the inverse transform of the collected sections yields, not the product of the projections as in standard RA, qð j; kÞstandard

RA

¼ pð j; †Þ pð†; kÞ

but a scaled sum of them, sometimes referred to as the operation of “back-projection” (BP)

Table II. Central Fourier section of Table I(a)

Table III. Fourier transform of Table I(b)

Back-projection:

$$q(j,k)_{\text{Fourier RA}} = \underbrace{p(j,\cdot)/N_j}_{\text{A projection}} + \underbrace{p(\cdot,k)/N_k}_{\text{B projection}} - \underbrace{1/(N_j N_k)}_{\text{origin correction}} \qquad (3)$$

Back projection is equivalent to implementing equations (1) and (2) in one step by operating on the projections directly without actually doing any transform. Note that if data were sparse, this equation could be used to evaluate $q(j,k)$ only for those $(j,k)$ which were actually observed; the entire state space of $(j,k)$ would not have to be generated. Applying BP to Table I(a) yields Table IV. By contrast, the inverse transform of Table III yields Table I(b) because of the extra contribution of the non-central fourth term (0.08) to the calculated distribution. This contribution, when added to Table I(a), gives Table I(b). Consider Table V, which has the same projections as Table I(a) and (b). Fourier reconstruction of Table V also yields Table I(a).

The point of all this is that Table I(a) and not Table I(b) is the A:B “independence model” for the Fourier approach. The reason the Fourier method gives results different from standard RA is that RA composition maximizes the uncertainty, $-\sum q \log q$; the Fourier approach minimizes $\sum q^2$ or, equivalently, maximizes $-\sum q^2$. Table I(a) is the independence model in the Fourier approach because it is the maximum $-\sum q^2$ distribution for the $\{0.3, 0.7\}$ and $\{0.4, 0.6\}$ margins. Because of this, it is the reconstructed distribution when the data are Table I(a) or Table I(b) or Table V.

Table IV. Fourier composition of Table I(a) from projections using BP (equation (3))

Table V. A second distribution with identical margins
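As an illustration of back-projection, the following minimal Python sketch composes q from the two margins quoted in the text, {0.3, 0.7} and {0.4, 0.6}, and compares it with the standard-RA product. The function names and the list of "observed" states are hypothetical. One assumption is worth flagging: the sketch divides each margin by the number of states of the other variable, which is what makes the margins of q reproduce those of p; for a table with equal cardinalities, such as this 2 × 2 example, that is identical to equation (3) as printed.

```python
from itertools import product


def margins(table):
    """Row (A) and column (B) projections of a nested-list table."""
    return [sum(r) for r in table], [sum(c) for c in zip(*table)]


def back_project(row, col, j, k):
    """Back-projection value q(j, k) built from the two projections alone.

    Assumption: each margin is spread evenly over the states of the other
    variable; this coincides with equation (3) when N_j == N_k.
    """
    n_j, n_k = len(row), len(col)
    return row[j] / n_k + col[k] / n_j - 1.0 / (n_j * n_k)


def independence(row, col, j, k):
    """Standard-RA value q(j, k) = p(j, .) p(., k)."""
    return row[j] * col[k]


# Margins quoted in the text for the A and B projections.
row, col = [0.3, 0.7], [0.4, 0.6]
states = list(product(range(2), range(2)))

q_bp = {s: back_project(row, col, *s) for s in states}
q_std = {s: independence(row, col, *s) for s in states}

# Sparse use: evaluate q only at the states that were actually observed.
observed = [(0, 1), (1, 1)]  # hypothetical list of observed states
q_observed = {s: back_project(row, col, *s) for s in observed}

# Non-central Fourier coefficient of each composition (binary variables).
noncentral_std = sum((-1) ** (j + k) * q_std[(j, k)] for j, k in states)
noncentral_bp = sum((-1) ** (j + k) * q_bp[(j, k)] for j, k in states)
```

In this sketch the product distribution carries a non-central coefficient while the back-projected one does not, which is the sense in which the back-projected table is the Fourier "independence model".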


Figure 1. (a, b) $-\sum q^2$ is roughly linear with $-\sum q \log q$. The measures are plotted for all 114 models for the Ries-Smith data (top) and linguistic data (bottom)

The fact that the Fourier method minimizes $\sum q^2$ subject to the model constraints can be seen more directly from the fact that, for Q the Fourier transform of q (and with proper scaling), $\sum |Q|^2 = \sum q^2$. The model constraints give Q the central sections of P, which are the transforms of the projections included in the model, with all other coefficients in Q being 0. Q thus embodies nothing beyond the model constraints and thus generates the minimum $\sum q^2$.

Although the Fourier approach does not maximize $-\sum q \log q$, it might be usable for RA because the $-\sum q^2$ it maximizes is roughly collinear with uncertainty. This is shown in Figure 1, which plots $1 - \sum q^2$ vs $-\sum q \log q$ for all possible four-variable models for the Ries-Smith marketing data (Bishop et al., 1978) and some linguistic data currently under investigation (Zwick and McCall, 2004).

A linear relationship is evident for both data sets, although the relationship is obviously stronger for the first data set (r = 0.99) than for the second (r = 0.84). The generality and the factors which affect the strength of this collinearity need to be investigated further, both empirically and analytically. However, there are reasons to expect that this result is robust since $1 - \sum q^2$ is used in economics and ecology as an alternative to uncertainty to quantify diversity. In economics, the sum-squared measure is known as the Herfindahl index (Jacquemin and Berry, 1979). Intuitively, if the total probability of 1 is divided into many small terms, the sum of their squares will be small, so $1 - \sum q^2$ is a plausible measure of diversity (uncertainty). Because of the rough collinearity of $-\sum q \log q$ and $-\sum q^2$, maximizing the latter expression is likely to give results close to maximizing the former expression when both maximizations have the same constraints. So the Fourier approach has distinct promise for RA.
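The two objective functions, and the scaling behind $\sum |Q|^2 = \sum q^2$, can be checked numerically on a small table. The sketch below is only an illustration under stated assumptions: the two 2 × 2 tables are the product and the scaled sum of the margins {0.3, 0.7} and {0.4, 0.6} quoted earlier, the transform is an unnormalised discrete Fourier transform written out by hand, and all function names are hypothetical.

```python
import cmath
import math


def entropy_bits(q):
    """Shannon uncertainty -sum q log2 q over the cells of a nested-list table."""
    return -sum(x * math.log2(x) for row in q for x in row if x > 0)


def sum_squares(q):
    """Sum of squared cell probabilities (the Herfindahl-type measure)."""
    return sum(x * x for row in q for x in row)


def dft2(q):
    """Unnormalised 2-D discrete Fourier transform Q(J, K) of a table q(j, k)."""
    n_j, n_k = len(q), len(q[0])
    return [[sum(q[j][k] * cmath.exp(-2 * cmath.pi * 1j * (j * J / n_j + k * K / n_k))
                 for j in range(n_j) for k in range(n_k))
             for K in range(n_k)]
            for J in range(n_j)]


def parseval_sum(q):
    """Sum of |Q|^2 divided by the number of cells; this equals sum q^2."""
    Q = dft2(q)
    return sum(abs(c) ** 2 for row in Q for c in row) / (len(q) * len(q[0]))


# Two compositions for the margins {0.3, 0.7} and {0.4, 0.6}:
product_table = [[0.12, 0.18], [0.28, 0.42]]    # product of the margins
back_projection = [[0.10, 0.20], [0.30, 0.40]]  # scaled sum of the margins

for name, table in (("product", product_table), ("back-projection", back_projection)):
    print(name, round(entropy_bits(table), 4), round(1 - sum_squares(table), 4))
```

The product table has the larger uncertainty and the back-projected table the larger $1 - \sum q^2$, so each composition is optimal only for its own objective.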


3. Methodology of Fourier RA

3.1 Structure specification

So far only a 2D example has been discussed. Consider the case of three variables, where the existence of structures (sets of relations) first arises, and where the lattice of relations and the lattice of structures are shown in Figures 2 and 3, respectively. (A “model” might be defined as a “structure” applied to data, but the distinction is subtle and this paper does not always insist on it.)

Figure 2. Lattice of relations for a three-variable system

Figure 3. Lattice of structures for three-variable system

The Fourier model of ABC is its full 3D transform. Next in the lattice of structures is AB:AC:BC. Its transform, Q, consists of sections, which are the transforms of the AB, AC, and BC projections. If the AB projection is written as $p(A,B)$ and its Fourier transform as $P(A,B)$, then the central section $Q_{AB:AC:BC}(A,B,0) = P(A,B)$ contains the information on AB. Similarly, the sections $Q_{AB:AC:BC}(A,0,C) = P(A,C)$ and $Q_{AB:AC:BC}(0,B,C) = P(B,C)$ contain the information on the AC and BC projections. Thus, $Q_{AB:AC:BC}(A,B,C)$ consists of a subset of coefficients from the full $P(A,B,C)$ transform, namely those for which either A = 0 or B = 0 or C = 0. Thus

$$Q_{AB:AC:BC}(A,B,C) = \{P(A,B,0)\} \cup \{P(A,0,C)\} \cup \{P(0,B,C)\}.$$

One can list the Fourier components for a model in terms of a “dual,” which indicates which central sections are to be included. For example, the dual of AB:AC:BC is C:B:A, written in italics, which summarizes the condition that this model includes Fourier coefficients where C = 0 or B = 0 or A = 0. (The colon in C:B:A means this inclusive “or”.) Applying this to the full Lattice of Structures yields Table VI. The table is read as follows: a coefficient P(A,B,C) is included in a structure if the indicated zero condition defined by the dual holds for that coefficient.

To show this in greater detail, Figure 4 represents $P(J,K,L)$, the Fourier transform of the ABC distribution, $p(j,k,l)$. Using the labels from this figure, the Fourier coefficients included in all structures are tabulated in Table VI. The degrees of freedom for these structures are also listed.

Fourier reconstruction is additive in a way that conventional RA is not. A model is a set of relations. If relation 1 has coefficient set $s_1$ and relation 2 has coefficient set $s_2$, then a model including both relations has coefficient set $s_1 \cup s_2$. By virtue of this additivity, models with loops do not require an iterative procedure to derive the calculated distribution.

3.2 Degrees of freedom

Table VI suggests that the degrees of freedom of a Fourier model is the number of coefficients in all of the central sections defined by the model minus 1, to omit the origin term which corresponds to the sample size for frequency distributions or to the sum of probabilities being 1.

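Table VI. Conditions for inclusion of P(A,B,C) coefficients in models

| Level | Structure | Dual | df | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABC | Φ | 7 | + | + | + | + | + | + | + | + |
| 1 | AB:AC:BC | C:B:A | 6 | + | + | + | + | + | + |  | + |
| 2.1 | AB:AC | C:B | 5 | + | + |  | + | + | + |  | + |
| 2.2 | AB:BC | C:A | 5 | + | + | + | + | + | + |  |  |
| 2.3 | AC:BC | B:A | 5 | + | + | + | + | + |  |  | + |
| 3.1 | AB:C | C:AB | 4 | + | + |  | + | + | + |  |  |
| 3.2 | AC:B | B:AC | 4 | + | + |  | + | + |  |  | + |
| 3.3 | BC:A | A:BC | 4 | + | + | + | + | + |  |  |  |
| 4.1 | A:B:C | BC:AC:AB | 3 | + | + |  | + | + |  |  |  |

Columns 0-7 are the coefficients included, using the labels of Figure 4.

Note: A variable in the dual must be 0 for a coefficient to be included in the model.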


Figure 4. Fourier coefficients as model parameters
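The zero conditions of the dual and the resulting df values can be reproduced with a short enumeration. The following Python sketch is an illustration only: it indexes coefficients by their frequency triple (J, K, L) rather than by the labels of Figure 4, and the function names and the `cardinalities` parameter are hypothetical. A coefficient is taken to belong to a relation's central section when every variable outside that relation has index 0.

```python
from itertools import product

VARIABLES = "ABC"


def included(coefficient, structure):
    """True if the Fourier index (J, K, L) lies in a central section of the structure.

    The variables with a non-zero index must all belong to one relation,
    i.e. the variables named in some term of the dual are all 0.
    """
    support = {v for v, index in zip(VARIABLES, coefficient) if index != 0}
    return any(support <= set(relation) for relation in structure.split(":"))


def coefficients(structure, cardinalities=(2, 2, 2)):
    """All coefficient indices retained by a structure."""
    grid = product(*(range(n) for n in cardinalities))
    return [c for c in grid if included(c, structure)]


def degrees_of_freedom(structure, cardinalities=(2, 2, 2)):
    """Number of retained coefficients minus 1 (the origin term is omitted)."""
    return len(coefficients(structure, cardinalities)) - 1


LATTICE = ["ABC", "AB:AC:BC", "AB:AC", "AB:BC", "AC:BC",
           "AB:C", "AC:B", "BC:A", "A:B:C"]
df_table = {s: degrees_of_freedom(s) for s in LATTICE}
```

For binary variables every coefficient is a real singleton, so counting coefficients is enough; the same count applied to three-state variables gives, for example, df = 6 for A:B:C.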

Actually, the calculation of df is a little more complicated. Fourier coefficients are in general complex, which suggests that the real and imaginary parts should contribute two degrees of freedom for each coefficient. However, coefficients for transforms of real functions come in conjugate-symmetric pairs, where $P(J) = P^*(-J)$, where * means complex conjugate and J is in general a vector, so a pair of coefficients contributes two degrees of freedom. When $J = -J$, coefficients occur in singletons, which because of conjugate symmetry must be real, again contributing one degree of freedom per coefficient. In Table VI, all coefficients are singletons, so the df calculation is trivial, but in general both pairs and singletons will occur.

3.3 Model error

When the Fourier approach is used as an alternative RA framework, modeling any specific structure generates zero conditions from the dual of the structure. These allow one to construct Q, the transform of the model, which, inverse-transformed, yields q. q can be assessed using either the standard RA transmission, $T(q) = \sum p \log (p/q)$, or the sum-squared-error, $\mathrm{SSE} = \sum (p - q)^2 = \sum (P - Q)^2$, an error measure more naturally associated with the Fourier method. Only the coefficients absent in the model generate error, i.e. $\mathrm{SSE} = \sum (P - Q)^2 = \sum_{\text{omitted in } Q} P^2$. This means that SSE can be evaluated without generating q by taking the sum of squares of the omitted Fourier coefficients. A Fourier-based RA search, using SSE to evaluate models, does not need to inverse transform Q into q.

Further, since the contributions of missing coefficients to the error are mutually independent, these errors do not have to be generated anew for every different model. Instead, errors can be calculated and stored for every relation in the lattice of relations. The error in any model can then be generated algebraically from these stored relation-SSEs. Refer again to Figure 4 where the numbers 0-7 represent the Fourier coefficients, P. The model AB:AC:BC includes all coefficients where A = 0 or B = 0 or C = 0; its error is thus represented above by “6”, i.e. P(1,1,1). This can be derived from the errors of the model’s component relations as follows:

$$\mathrm{SSE}(AB{:}AC{:}BC) = \mathrm{SSE}(AB) + \mathrm{SSE}(AC) + \mathrm{SSE}(BC) - \mathrm{SSE}(A) - \mathrm{SSE}(B) - \mathrm{SSE}(C) + \mathrm{SSE}(\Phi)$$

$$= (2\;3\;6\;7) + (1\;2\;5\;6) + (4\;5\;6\;7) - (1\;2\;3\;5\;6\;7) - (2\;3\;4\;5\;6\;7) - (1\;2\;4\;5\;6\;7) + (1\;2\;3\;4\;5\;6\;7) = 6$$

where Φ is the uniform distribution model, generated from only the origin coefficient of the transform. It seems likely (no proof is offered here), for relations in a model written as $R_1, R_2, \ldots$, where $R_j \cap R_k$ is the relation defined by variables common to $R_j$ and $R_k$, that SSE can be written as follows:

$$\mathrm{SSE}(R_1 : R_2 : \ldots) = \sum \mathrm{SSE}(R_j) - \sum\sum \mathrm{SSE}(R_j \cap R_k) + \sum\sum\sum \mathrm{SSE}(R_j \cap R_k \cap R_l) - \ldots$$

Model information computed from SSE is closely related to transmission-defined information, as follows:

$$I_{\text{Fourier RA}} = [\mathrm{SSE}(A{:}B{:}C{:}D) - \mathrm{SSE}(\text{model})] / \mathrm{SSE}(A{:}B{:}C{:}D)$$

$$I_{\text{standard RA}} = [T(A{:}B{:}C{:}D) - T(\text{model})] / T(A{:}B{:}C{:}D)$$
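The SSE bookkeeping described in Section 3.3 can be sketched for three binary variables, where the Fourier transform reduces to sums with signs ±1. Everything specific in this sketch is an assumption made for illustration: the frequency counts are hypothetical, the function names are invented, the empty string stands for the uniform model Φ, and the factor 1/8 is the "proper scaling" that makes the coefficient-space sum equal $\sum (p - q)^2$.

```python
from itertools import product

STATES = list(product(range(2), repeat=3))  # (A, B, C) states, also used as (J, K, L)
VARIABLES = "ABC"
PHI = ""  # uniform model: only the origin coefficient is retained


def sign(s, f):
    """Transform kernel for binary variables: (-1)^(s . f)."""
    return (-1) ** sum(a * b for a, b in zip(s, f))


def transform(p):
    """Fourier coefficients P(J, K, L) of a distribution over binary A, B, C."""
    return {f: sum(p[s] * sign(s, f) for s in STATES) for f in STATES}


def in_model(f, structure):
    """A coefficient is retained if its non-zero indices fit inside one relation."""
    support = {v for v, i in zip(VARIABLES, f) if i}
    return any(support <= set(r) for r in structure.split(":"))


def sse(P, structure):
    """Sum of squared omitted coefficients, scaled to equal sum (p - q)^2."""
    return sum(P[f] ** 2 for f in STATES if not in_model(f, structure)) / len(STATES)


def compose(P, structure):
    """Model distribution q: inverse transform of the retained coefficients."""
    return {s: sum(P[f] * sign(s, f) for f in STATES if in_model(f, structure)) / len(STATES)
            for s in STATES}


counts = [10, 5, 8, 12, 6, 14, 9, 16]  # hypothetical frequencies, one per state
p = {s: c / sum(counts) for s, c in zip(STATES, counts)}
P = transform(p)

# Error of AB:AC:BC assembled from stored relation errors (inclusion-exclusion).
from_relations = (sse(P, "AB") + sse(P, "AC") + sse(P, "BC")
                  - sse(P, "A") - sse(P, "B") - sse(P, "C") + sse(P, PHI))

# Fourier information of a model relative to the independence model A:B:C.
info_fourier = (sse(P, "A:B:C") - sse(P, "AB:C")) / sse(P, "A:B:C")
```

In this sketch the error assembled from relation errors agrees with the direct calculation, and neither requires the inverse transform; `compose` is included only to confirm that the coefficient-space error matches $\sum (p - q)^2$.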

The relationship between the two information measures is shown in Figure 5, which is based on the data used for Figure 1(a). The figure shows, for every value of df, $I_{\text{standard RA}}$ (circles) and $I_{\text{Fourier RA}}$ (squares) for the highest information model using standard RA. The Fourier results approximate the standard results, especially at high information. This plot is closely related to Figure 1, since T is linear with U and SSE with $\sum q^2$:

$$T(\text{model}) = U(\text{model}) - U(\text{data})$$

$$\mathrm{SSE}(\text{model}) = \sum_{\text{omitted in } Q} P^2 = \sum p^2 - \sum q^2(\text{model})$$

3.4 Sparse data

Defining parameters in Fourier space is an approach to RA model construction whose practicality depends on the data. In all the examples considered in this paper, the contingency table is full, in that each cell has a frequency greater than 1. This accords with the Chi-square rule of thumb that the sample size should ideally be at least about five times the number of states. Where the data are sparse, however, Fourier transformation will spread the data throughout Fourier space. For example, the transform of a Gaussian is a Gaussian, and if the Gaussian gets narrower in distribution space, it gets broader in Fourier space. Thus, Fourier representation of sparse data is likely to have higher df, which may defeat the goal of compression. This issue is being investigated further. Possibly, wavelet, as opposed to Fourier, transforms might be an alternative approach to modeling sparse data, since for sparse data, the global character of Fourier transforms may be disadvantageous, while the local character of wavelet transforms may be useful. A wavelet approach to sparse data may need variable states to be relabeled and thus reordered to concentrate the data locally.


3.5 Back projection as an alternative to IPF

All this presumes that the Fourier coefficients, P, are the RA model parameters and that composition is done with equation (2). However, a “reduced” composition can be done with back-projection equation (3), which operates on distributions (not Fourier coefficients). If one wants only to screen models by evaluating T and thus needs only q values for observed states, this can be done with BP, which approximates IPF in a single iteration and, used for this purpose, scales with the data, not the state space. Note that equation (3) can yield negative q values. If the Fourier approach is used in RA only for model evaluations in exploratory searches, and not as a source of full q distributions, this may not be a problem; also, correctives are imaginable. Still, this possibility is one which requires further theoretical and computational exploration.

Figure 5. Information of models using standard and Fourier RA (Ries-Smith data) (A:B:C:D has df = 5)

4. State-based modeling

The Fourier components need not be restricted to central sections. One could choose, for a df = n model, the n biggest Fourier coefficients from the original transform. This amounts to the Fourier equivalent of the “state-based” modeling approach (Johnson and Zwick, 2000; Zwick and Johnson, 2002) derived from the “k-systems analysis” of Jones (1985a, b). In state-based, as opposed to variable-based, modeling, an RA model does not need to be defined in terms of complete projections (margins), but can instead be defined in terms of the probabilities of an arbitrary set of states (as long as the probabilities are linearly independent). Applying this notion to the Fourier approach to RA, models need not consist only of central sections but can be any set of Fourier coefficients.

State-based modeling has a lattice of structures enormously greater than variable-based RA. This poses the problem of how to search this lattice. Jones (1985c) proposed a path-dependent procedure: one selects the most information-rich state, the second most information-rich given the prior selection of the first state, and so on. Because of the path dependence of this algorithm, one cannot be sure that a state-based model involving n states actually consists of the n most informative states. This uncertainty disappears in the Fourier approach to state-based modeling, in which a model is defined not from central sections but by selecting Fourier coefficients from anywhere in the transform in descending order of magnitude. For a model with df = n, one simply selects the n biggest coefficients. The information content of such a state-based model will always be equal or superior to a df = n variable-based RA model. This is shown in Figure 6, which shows, for every df, the I for standard variable-based RA for the highest information model (diamonds). On the same scale, I from transmissions of variable- and state-based models using Fourier transforms are also plotted. Clearly, $I_{\text{VB-ft}} < I_{\text{VB-std}}$ but $I_{\text{SB-ft}} > I_{\text{VB-std}}$.

Figure 6. Fourier transform variable- and state-based models, compared to standard variable-based models (Ries-Smith data)

While state-based modeling achieves greater compression than variable-based modeling, it has the disadvantage that Fourier coefficients are not as interpretable as a distribution and its projections. This is especially so since the values of these coefficients depend upon the specific ordering of the states of the variables, but this ordering is arbitrary. The Fourier approach thus constitutes an alternative framework for doing state-based RA. State-based RA, like variable-based RA, can also use Fourier ideas just by using equation (3) of BP as an efficient approximation to IPF.

5. Summary

This paper demonstrates that the Fourier approach of image reconstruction can be applied to RA. Although Fourier reconstruction maximizes $-\sum q^2$, while standard RA maximizes $-\sum q \log q$, the two objective functions are roughly collinear. The Fourier approach can be used in a variety of ways.

• This approach provides an alternative framework for RA. Projections can be collected in Fourier space and composition done in a single inverse transform. Calculated distributions can be evaluated with the standard T measure.
• The search through the lattice of structures does not need to generate model distributions. If SSE is used instead of T, model error can be assessed without inverse transformation by computing the error resulting from the omitted transform coefficients. This can be generated directly from the set of relations included in the model.
• One can use the Fourier approach more narrowly by simply replacing IPF with BP, which is non-iterative and, when used for the evaluation of a model statistic like T, does not require operations on the whole state space. (Using BP to obtain a full q distribution, however, scales with the state space.)
• State-based models, which can capture more information than variable-based models of the same df, are also easily implemented. There is no path-dependence in Fourier state-based modeling, and the best model can easily be selected for every df value. Alternatively, state-based modeling can be done in the usual way, but models might be fitted by BP, as an efficient approximation to IPF.

Only a proof-of-concept of the Fourier approach to RA is provided here. Theoretical and computational issues are still being explored. This project is part of a larger effort in “discrete multivariate modeling” (Zwick, 2002), i.e. RA, which includes software development (Willett and Zwick, 2002) that will eventually encompass the Fourier approach. Work so far, however, has demonstrated clearly that RA can be approached with Fourier techniques. This is not surprising. Walsh, Haar, and other transforms are routinely used in logic design and machine learning, and methods in these fields overlap set-theoretic (crisp possibilistic) RA. Roughly speaking, the use proposed here of Fourier transforms in probabilistic RA parallels the use of the above discrete transforms in crisp possibilistic RA.
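The state-based selection rule of Section 4 (keep the n largest coefficients, wherever they lie) can be sketched for three binary variables and compared with a variable-based model of the same df using the sum-squared error. The frequency counts and function names below are hypothetical. For variables with more than two states, conjugate pairs of coefficients would presumably have to be kept or dropped together, which this binary sketch does not need to handle.

```python
from itertools import product

STATES = list(product(range(2), repeat=3))  # (A, B, C) states and (J, K, L) indices
ORIGIN = (0, 0, 0)


def transform(p):
    """Fourier coefficients of a distribution over three binary variables."""
    return {f: sum(p[s] * (-1) ** sum(a * b for a, b in zip(s, f)) for s in STATES)
            for f in STATES}


def sse_keeping(P, kept):
    """Scaled sum of squares of the coefficients left out of the model."""
    kept = set(kept) | {ORIGIN}  # the origin term is always retained
    return sum(P[f] ** 2 for f in STATES if f not in kept) / len(STATES)


def largest(P, n):
    """The n non-origin coefficients of greatest magnitude."""
    ranked = sorted((f for f in STATES if f != ORIGIN),
                    key=lambda f: abs(P[f]), reverse=True)
    return ranked[:n]


counts = [10, 5, 8, 12, 6, 14, 9, 16]  # hypothetical frequencies, one per state
p = {s: c / sum(counts) for s, c in zip(STATES, counts)}
P = transform(p)

variable_based = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # central sections of A:B:C, df = 3
state_based = largest(P, 3)                         # any three coefficients, df = 3

sse_variable_based = sse_keeping(P, variable_based)
sse_state_based = sse_keeping(P, state_based)
```

Because the selection is a simple sort, there is no path dependence: the df = n model is always a subset of the df = n + 1 model, and its error can never exceed that of a central-section model with the same number of coefficients.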


References

Bishop, Y.M., Fienberg, S.E. and Holland, P.W. (1978), Discrete Multivariate Analysis, MIT Press, Cambridge, MA.
Jacquemin, A.P. and Berry, C.H. (1979), “Entropy measure of diversification and corporate growth”, The Journal of Industrial Economics, Vol. 27 No. 4, pp. 359-69.
Johnson, M.S. and Zwick, M. (2000), “State-based reconstructability modeling for decision analysis”, in Allen, J.K. and Wilby, J.M. (Eds), Proceedings of World Congress of the Systems Sciences 2000, Toronto.
Jones, B. (1985a), “Reconstructability analysis for general functions”, International Journal of General Systems, Vol. 11, pp. 133-42.
Jones, B. (1985b), “Reconstructability considerations with arbitrary data”, International Journal of General Systems, Vol. 11, pp. 143-51.
Jones, B. (1985c), “A greedy algorithm for a generalization of the reconstruction problem”, International Journal of General Systems, Vol. 11, pp. 63-8.
Klir, G. (1985), The Architecture of Systems Problem Solving, Plenum Press, New York, NY.
Knoke, D. and Burke, P.J. (1980), Log-Linear Models, Quantitative Applications in the Social Sciences Monograph #20, Sage, Beverly Hills, CA.
Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Quantitative Applications in the Social Sciences #62, Sage, Beverly Hills, CA.
Willett, K. and Zwick, M. (2002), “A software architecture for reconstructability analysis”, Proceedings of 12th International World Organization of Systems and Cybernetics and 4th International Institute for General Systems Studies Workshop, Pittsburgh.
Zwick, M. (2001), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY, pp. 237-56.
Zwick, M. (2002), Discrete Multivariate Modeling, available at: www.sysc.pdx.edu/res_struct.html
Zwick, M. and Johnson, M.S. (2002), “State-based reconstructability modeling”, Proceedings of 12th International World Organization of Systems and Cybernetics and 4th International Institute for General Systems Studies Workshop, Pittsburgh.
Zwick, M. and McCall, M. (2004), “Reconstructability analysis of language structure” (in preparation).
Zwick, M. and Zeitler, E. (1973), “Image reconstruction from projections”, Optik, Vol. 38, pp. 550-65.

Further reading

International Journal of General Systems (IJGS) Klir, G. (Ed.) (1996), Vol. 24, Special Issue on GSPS, pp. 1-2.

The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister

The current issue and full text archive of this journal is available at www.emeraldinsight.com/0368-492X.htm

State-based reconstructability analysis
Martin Zwick and Michael S. Johnson
Portland State University, Portland, Oregon, USA

State-based analysis


Keywords Cybernetics, Modelling, Numerical analysis

Abstract Reconstructability analysis (RA) is a method for detecting and analyzing the structure of multivariate categorical data. While Jones and his colleagues extended the original variable-based formulation of RA to encompass models defined in terms of system states, their focus was the analysis and approximation of real-valued functions. In this paper, we separate two ideas that Jones had merged together: the “g to k” transformation and state-based modeling. We relate the idea of state-based modeling to established variable-based RA concepts and methods, including structure lattices, search strategies, metrics of model quality, and the statistical evaluation of model fit for analyses based on sample data. We also discuss the interpretation of state-based modeling results for both neutral and directed systems, and address the practical question of how state-based approaches can be used in conjunction with established variable-based methods.

Kybernetes, Vol. 33 No. 5/6, 2004, pp. 1041-1052. © Emerald Group Publishing Limited, 0368-492X. DOI 10.1108/03684920410534092

1. Introduction

The focus of this paper is information-theoretic (probabilistic) state-based modeling of systems defined by categorical multivariate data. In this context, a “system” is what Klir terms a “behavior system” (Klir, 1985): a contingency table that assigns frequencies or probabilities to system states. In a “neutral” system, no distinction is made between “independent” variables (IVs) and “dependent” variables (DVs) or, equivalently, inputs and outputs. Such a distinction is made for “directed” systems, in which the IVs define the system state and the DVs depend upon this state. We consider both neutral and directed systems in this paper. Restricting our scope to systems comprising only qualitative (categorical or ordinal) variables is not as limiting as it might seem, since continuous (interval- or ratio-scale) variables can be made qualitative by discretizing (clustering, “binning”), although discretizing does sacrifice some of the information in the original values of the quantitative variable.

The concept of state-based modeling is central to Jones’ conception of “k-systems analysis” (Jones, 1982, 1985a, c, 1986, 1989). Jones, however, linked the state-based modeling idea to the concept of a “g to k” transformation. This transformation maps a real-valued function of the system state defined by the values of a collection of categorical or discretized variables (a “g-system”) into an isomorphic dimensionless function that has the properties of a probability distribution (the “k-system”). The k-system, which “has properties sufficiently parallel to a probabilistic system that RA (reconstructability analysis) can be invoked” (Jones, 1985c), is the starting point for Jones’s development of the state-based modeling approach. Since in this paper our starting point is a behavior system, we detach state-based modeling from the “g to k” transformation concept and demonstrate that state-based modeling applies to both neutral and directed systems, and for directed systems also to those which are stochastic. Thus, Jones’ state-based modeling idea is an extension of the established variable-based RA framework (Klir, 1985; Krippendorff, 1986).

We define a model following Krippendorff: “A structural model consists of several components, each specified by a different parameter with respect to which it corresponds to the data to be modeled, and none is included or equivalent to another” (Krippendorff, 1986). A structural model implies a joint probability distribution of the same dimensionality as the data. The model (“q” distribution) is constructed by maximizing its information-theoretic uncertainty (Shannon entropy) subject to the constraint that the explicit parameters in the model must match the corresponding values in the observed data (“p” distribution). There are many possible models of this sort for any given behavior system. The quality of a model can be assessed in terms of the degree to which the model accounts for the constraint in the data (fidelity) and the number of parameters (the degrees of freedom, df) required to specify the model (parsimony).

As an example, consider the very simple two-variable neutral behavior system shown in Figure 1, where the probabilities in the figure are derived from a contingency table with a sample size of N = 100. Three parameters are needed to specify AB since probabilities must sum to 1, hence df(AB) = 3. The total constraint present in this system is the transmission

$$T = \sum p \log \left( \frac{p}{q} \right)$$

between the p distribution which is the AB data (Figure 1) and the q distribution of the A:B model that assumes that A and B are independent (Figure 2).
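The transmission between data and an independence model can be computed in a few lines. In the sketch below the 2 × 2 table is hypothetical (the values of Figure 1 are not reproduced here), and the function names are invented for illustration. The sketch also converts T, measured in bits, to the likelihood-ratio Chi-square $L^2 = 2NT$ conventionally used to test it; that conversion assumes T is first expressed in natural-log units, hence the factor ln 2.

```python
import math


def independence_model(table):
    """q distribution of A:B, the product of the row and column margins."""
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    return [[a * b for b in col] for a in row]


def transmission_bits(p, q):
    """T = sum p log2(p / q), taken over cells with p > 0."""
    return sum(x * math.log2(x / y)
               for p_row, q_row in zip(p, q)
               for x, y in zip(p_row, q_row) if x > 0)


def likelihood_ratio(t_bits, n):
    """L^2 = 2 N T with T converted from bits to natural-log units."""
    return 2.0 * n * math.log(2) * t_bits


n = 100                        # sample size, as in the text
p = [[0.4, 0.1], [0.2, 0.3]]   # hypothetical AB distribution, not Figure 1
q = independence_model(p)

t = transmission_bits(p, q)
l2 = likelihood_ratio(t, n)
reject_independence = l2 > 3.84  # critical value for one df at the 0.05 level
```

With this convention, `likelihood_ratio(0.153, 100)` gives about 21.2, in line with the value quoted in the text once rounding of T is allowed for.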
In a “top-down” perspective (going down from AB to A:B), T is the constraint lost in the independence model relative to the data. In a “bottom-up” perspective (going up from A:B to AB), T is the constraint captured in the data relative to the independence model. The two perspectives are equivalent but have different emphases. Here, T = 0.153.

Figure 1. p(AB), a neutral behavior system

Figure 2. The q distribution of A:B model

Only two parameters are needed to specify A:B, one from each margin; hence df(A:B) = 2. The A:B model constrains the A and B marginal distributions to match those of the data, but is otherwise maximally uniform. As a result, q(A:B) does not match p(AB). T measures the difference between these distributions, and the statistical significance of T is assessed by Chi-square analysis. The likelihood ratio Chi-square is $L^2 = 2NT = 21.27$ (the two quoted values agree when T is taken in bits and converted to natural-log units, $L^2 = 2N \ln 2 \cdot T$) with $\Delta df = 1$; at a significance level of $\alpha = 0.05$, an $L^2$ larger than 3.84 is required to reject the null hypothesis that the data AB and the independence model A:B are consistent with one another. In this example, clearly they are not. One cannot satisfactorily model the data with the independence model.

Within the framework of variable-based RA, there are no other models to consider. In the state-based perspective pioneered by Jones, however, there are many possible additional models. In Section 2, we discuss the structure and specification of models for state-based RA. Then we assess how such models can address the competing objectives of fidelity and parsimony. Generally, we use the term “structure” to refer to a combination of parameters considered without reference to data, and the term “model” to refer to the actual parameter values of a structure when applied to specific data.

2. Exploring state-based structures

In variable-based RA, parameters are values of complete marginal distributions (comprising one or more variables) that will, in the q distribution (the model), be constrained to match the corresponding marginal distributions derived from the p distribution (the data). In state-based RA, parameters do not specify complete marginal distributions (projections). Rather, they correspond to any linearly independent set of individual elements (cells) of the joint distribution or any of the marginal distributions. For the AB system shown in Figure 1, there are eight candidate parameters: a0b0, a0b1, a1b0, a1b1, a0, a1, b0, and b1. The structure a0b0:a1, for example, constrains these elements in the joint distribution and the marginal A distribution to match their observed values.

Like variable-based RA structures, state-based structures can be categorized with respect to the degrees of freedom required for specification. For the AB system, as indicated above, there are eight possible structures that utilize just a single df, one associated with each of the candidate parameters. There are 26 candidate structures that utilize two df, two less than the 28 possible two-parameter combinations (“eight choose two”). Two combinations (a0:a1 and b0:b1) are excluded because they are degenerate in the sense that, since the marginal distributions must sum to unity, the second parameter adds no additional constraint. There are 36 candidate structures that utilize three df; 20 of the 56 possible three-parameter combinations are ruled out due to degeneracy. The non-degenerate structures are summarized in Table I, which also indicates the four state-based models equivalent to the variable-based model A:B. Clearly in this case and in general there are very many more state-based than variable-based models.

For any particular df, structures can be further organized into equivalence classes where, for any given p distribution, all structures within the same equivalence class generate identical q distributions. Equivalence classes can in turn be grouped into general structures, which can be arrayed in a lattice; this is discussed below after the equivalence class idea has been explained. For the AB system shown in Figure 1, there are six equivalence classes in the df = 1 category, and seven equivalence classes in the df = 2 category (Table I). One of these equivalence classes corresponds to the A:B variable-based structure. All 36 non-degenerate three-parameter structures belong to the same equivalence class, since any df = 3 structure will generate a q distribution that matches perfectly the p distribution (the data).

Since marginal distributions are simply projections of the full joint distribution, any parameter of a state-based structure can be characterized as the sum of one or more elements of the p distribution. Specifically, any state-based structure can be described by a $(df + 1) \times n$ matrix, S, where n is the number of elements in the p distribution and $(df + 1) \le n$. For a 2 × 2 (n = 4) AB system such as Figure 1, the structure a0b0:a1 (for which df + 1 = 3) can be described by

$$S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

where the columns of the matrix correspond to the elements of the p distribution: a0b0, a0b1, a1b0, and a1b1, respectively. The constraint imposed by a structure can then be summarized by the matrix equation

$$S \cdot q = S \cdot p \qquad (1)$$
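The structure matrix lends itself to a mechanical check of the counts quoted above. The following Python sketch is an illustration under stated assumptions: it treats a structure as degenerate when the rows of S (including the normalization row) are linearly dependent, and two structures as equivalent when stacking their matrices does not raise the rank. The names `PARAMETERS`, `structure_matrix`, `same_class` and the exact-arithmetic `rank` helper are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each candidate parameter as a 0/1 row over the cells a0b0, a0b1, a1b0, a1b1.
PARAMETERS = {
    "a0b0": (1, 0, 0, 0), "a0b1": (0, 1, 0, 0),
    "a1b0": (0, 0, 1, 0), "a1b1": (0, 0, 0, 1),
    "a0": (1, 1, 0, 0), "a1": (0, 0, 1, 1),
    "b0": (1, 0, 1, 0), "b1": (0, 1, 0, 1),
}
NORMALIZATION = (1, 1, 1, 1)  # last row of every S: q must sum to one


def structure_matrix(structure):
    """S for a structure written like 'a0b0:a1'."""
    return [PARAMETERS[name] for name in structure.split(":")] + [NORMALIZATION]


def rank(rows):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_degenerate(structure):
    s = structure_matrix(structure)
    return rank(s) < len(s)


def same_class(s1, s2):
    m1, m2 = structure_matrix(s1), structure_matrix(s2)
    return rank(m1) == rank(m2) == rank(m1 + m2)


def structures(k):
    """All non-degenerate structures built from k parameters."""
    names = (":".join(c) for c in combinations(PARAMETERS, k))
    return [s for s in names if not is_degenerate(s)]


def classes(k):
    """One representative structure per equivalence class."""
    reps = []
    for s in structures(k):
        if not any(same_class(s, r) for r in reps):
            reps.append(s)
    return reps


counts = {k: len(structures(k)) for k in (1, 2, 3)}
class_counts = {k: len(classes(k)) for k in (1, 2, 3)}
```

Under these assumptions the enumeration gives 8, 26 and 36 structures for one, two and three df, grouped into 6, 7 and 1 equivalence classes respectively, which matches the counts stated in the text.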

For any given p distribution, the right-hand side of this equation is a known constant vector with cardinality df + 1. The last row in the S matrix is the same for all structures – it enforces the constraint that the elements of the q distribution must sum to one. The last element of the right-hand side vector of equation (1) is thus always one. For further discussion of this matrix formalism, see Anderson (1966). The structure matrix S can represent any state-based structure. In particular, if S is an n £ n matrix and all rows of S other than the last row are drawn without duplication from the n £ n identity matrix, then S will constrain the q distribution to match the p distribution exactly. (This is called the

Table I. Equivalence classes and general structures of state-based structures for the 2 × 2 AB system (total structures = 70)

df = 3 (36 structures)
General structure f = AB, equivalence class 1:
a0b0 : a0b1 : a1b0, a0b0 : a0b1 : a1b1, a0b0 : a0b1 : b0, a0b0 : a0b1 : b1, a0b0 : a1b0 : a1b1, a0b0 : a1b0 : a0, a0b0 : a1b0 : a1, a0b0 : a1b1 : a0, a0b0 : a1b1 : a1, a0b0 : a1b1 : b0, a0b0 : a1b1 : b1, a0b0 : a0 : b0, a0b0 : a0 : b1, a0b0 : a1 : b0, a0b0 : a1 : b1, a0b1 : a1b0 : a1b1, a0b1 : a1b0 : a0, a0b1 : a1b0 : a1, a0b1 : a1b0 : b0, a0b1 : a1b0 : b1, a0b1 : a1b1 : a0, a0b1 : a1b1 : a1, a0b1 : a0 : b0, a0b1 : a0 : b1, a0b1 : a1 : b0, a0b1 : a1 : b1, a1b0 : a1b1 : b0, a1b0 : a1b1 : b1, a1b0 : a0 : b0, a1b0 : a0 : b1, a1b0 : a1 : b0, a1b0 : a1 : b1, a1b1 : a0 : b0, a1b1 : a0 : b1, a1b1 : a1 : b0, a1b1 : a1 : b1

df = 2 (26 structures)
General structure g = A:B, equivalence class 1: a0 : b0, a0 : b1, a1 : b0, a1 : b1
General structure d, equivalence class 2: a0b0 : a0b1, a0b0 : a0, a0b0 : a1, a0b1 : a0, a0b1 : a1
General structure d, equivalence class 3: a0b0 : a1b0, a0b0 : b0, a0b0 : b1, a1b0 : b0, a1b0 : b1
General structure d, equivalence class 4: a0b1 : a1b1, a0b1 : b0, a0b1 : b1, a1b1 : b0, a1b1 : b1
General structure d, equivalence class 5: a1b0 : a1b1, a1b0 : a0, a1b0 : a1, a1b1 : a0, a1b1 : a1
General structure 1, equivalence class 6: a0b0 : a1b1
General structure 1, equivalence class 7: a0b1 : a1b0

df = 1 (8 structures)
General structure a, equivalence class 1: a0, a1
General structure a, equivalence class 2: b0, b1
General structure b, equivalence class 3: a0b0
General structure b, equivalence class 4: a0b1
General structure b, equivalence class 5: a1b0
General structure b, equivalence class 6: a1b1

Notes: The variable-based A : B independence model is shown in bold (equivalence class 1 for df = 2, general structure g). One could add to the bottom of the table the uniform distribution, which has df = 0.


“saturated” model.) While it provides a framework for specifying state-based structures, the structure matrix representation is actually more general, since it allows arbitrary combinations of cells that may not correspond to elements of any marginal distribution.

The concept of structure degeneracy can also be formalized in terms of the structure matrix. If the rank of S is less than the number of rows in S, then the structure characterized by S is degenerate.

The structure matrix also provides a mechanism for determining equivalence classes. A necessary condition for two state-based structures to be in the same equivalence class is that their structure matrices have the same rank. Given two state-based structures defined by the structure matrices S1 and S2, both having rank r, we can determine if the structures are in the same equivalence class by forming a combined structure matrix S12 that includes all the rows from both S1 and S2. If the rank of S12 also equals r, then the structures represented by S1 and S2 are in the same equivalence class. Two or more equivalence classes which are identical under swaps of (1) variable names and/or (2) variable state names constitute a general structure. The general structures shown in Table I can be arrayed in the lattice shown in Figure 3.

3. Evaluating state-based models

A state-based model of a behavior system encompasses two related ideas: given a p distribution and a candidate structure S, the q distribution is constrained to satisfy equation (1), and otherwise relaxed so as to maximize information-theoretic uncertainty. This can be achieved either through iterative proportional fitting or by using gradient-based optimization methods to maximize

$$H(q) = -\sum q \log q$$

subject to the constraint (1) and the requirement that all elements of the q distribution be greater than or equal to zero.
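The rank conditions for degeneracy and equivalence described before Section 3 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes each state-based parameter is encoded as a 0/1 indicator row over the four cells of the 2 × 2 AB system, with the normalization row appended last. The names PARAMS, rref, is_degenerate and same_class are illustrative. Under these assumptions the enumeration reproduces the 70 nondegenerate structures of Table I and groups them into 14 equivalence classes.

```python
from fractions import Fraction
from itertools import combinations

# Assumed encoding: one 0/1 row per parameter over the cells
# (a0b0, a0b1, a1b0, a1b1). These names are illustrative only.
PARAMS = {
    "a0b0": (1, 0, 0, 0), "a0b1": (0, 1, 0, 0),
    "a1b0": (0, 0, 1, 0), "a1b1": (0, 0, 0, 1),
    "a0": (1, 1, 0, 0), "a1": (0, 0, 1, 1),
    "b0": (1, 0, 1, 0), "b1": (0, 1, 0, 1),
}
NORMALIZATION = (1, 1, 1, 1)  # last row of every structure matrix


def rref(rows):
    """Nonzero rows of the reduced row echelon form, in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return tuple(tuple(row) for row in m[:pivot_row])


def structure_matrix(names):
    return [PARAMS[n] for n in names] + [NORMALIZATION]


def is_degenerate(names):
    # Degenerate when rank(S) is less than the number of rows of S.
    s = structure_matrix(names)
    return len(rref(s)) < len(s)


def same_class(names1, names2):
    # Same rank r, and the stacked matrix S12 also has rank r.
    s1, s2 = structure_matrix(names1), structure_matrix(names2)
    return len(rref(s1)) == len(rref(s2)) == len(rref(s1 + s2))


structures = [c for k in (1, 2, 3)
              for c in combinations(PARAMS, k) if not is_degenerate(c)]
# Structures with identical row spaces share one reduced echelon form.
classes = {rref(structure_matrix(c)) for c in structures}
print(len(structures), len(classes))
```

Exact rational arithmetic is used so that the rank test does not depend on a floating-point tolerance; grouping by reduced echelon form is equivalent to the pairwise combined-rank test but avoids comparing every pair.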
Because state-based structures exist that are less constrained than the variable-based independence structure A:B, this structure should not be taken

Figure 3. Lattice of general structures from Table I.

as the bottom of the Lattice of Structures. Since it maximizes uncertainty for any specified degrees of freedom, the uniform distribution is a more appropriate bottom model. Returning to this example of Figure 1, and using the uniform distribution as a reference model for calculations of information, the variable-based A:B independence model captures 78 percent of the information, I, in the data (Table II), where


$$I(\text{model}) = \frac{T(\text{uniform}) - T(\text{model})}{T(\text{uniform})}.$$

Although its specification requires fewer parameters, the df = 1 state-based model a1b1 does much better than the variable-based df = 2 A:B with respect to information capture. The a1b1 model generates the q distribution shown in Figure 4. As indicated in Table II, the a1b1 model captures 99 percent of the information in the data, relative to the uniform reference model. Furthermore, L² = 1.35 for this model, indicating no basis for rejecting the null hypothesis that the model is consistent with the data (p = 0.509); p is the probability of making an error by rejecting the null hypothesis that q is the same as p. This example demonstrates that state-based models can, in principle, represent behavior systems more accurately and more parsimoniously than the best variable-based models.

This example also illustrates some differences between our approach to state-based modeling and Jones’ k-systems analysis. The original system (Figure 1) is a neutral system, with no quantitative system function. The g-to-k normalization of k-systems analysis cannot be applied to it, but the state-based idea can be applied. Also, the above analysis uses a top-down perspective that compares progressively simpler models to the data, while Jones’ k-systems analysis is cast in a strictly bottom-up framework. Finally, and critically, the statistical significance of a model is here assessed; this is not done for (and not appropriate to) k-systems approximations of real-valued functions.
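A q distribution for a state-based model such as a1b1 can be generated by iterative proportional fitting. The sketch below is illustrative only: the data of Figure 1 are not reproduced in this section, so the distribution p used here is hypothetical, and the function name fit_state_based and the representation of each constraint as a set of cell indices are assumptions. Each constraint fixes the total probability of its cells to the observed value; all other freedom is left maximally relaxed by starting from the uniform distribution.

```python
import math


def fit_state_based(p, constraints, sweeps=200):
    """Maximum-uncertainty q matching p on each constrained set of cells.

    p           : observed cell probabilities (assumed strictly positive)
    constraints : list of sets of cell indices; each fixes the summed
                  probability of its cells to the observed value
    """
    n = len(p)
    q = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(sweeps):
        for cells in constraints:
            target = sum(p[i] for i in cells)
            current = sum(q[i] for i in cells)
            inside = target / current                   # rescale constrained cells
            outside = (1.0 - target) / (1.0 - current)  # keep the total at one
            q = [q[i] * (inside if i in cells else outside) for i in range(n)]
    return q


def uncertainty(q):
    """H(q) = -sum q log2 q, in bits."""
    return -sum(x * math.log2(x) for x in q if x > 0)


# Hypothetical data; cell order is a0b0, a0b1, a1b0, a1b1.
p = [0.1, 0.2, 0.3, 0.4]
q_cell = fit_state_based(p, [{3}])               # single-cell model a1b1
q_indep = fit_state_based(p, [{0, 1}, {0, 2}])   # margins a0 and b0, i.e. A:B
print(q_cell, q_indep)
```

For the single-cell model the fitted q keeps the observed value for a1b1 and spreads the remaining probability evenly over the other three cells; for the two margin constraints it reproduces the product of the margins.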

Table II. Summary results for variable- and state-based models

Model       T       Percent I   df   L²      p
AB (data)   –       100         3    –       1.000
A:B         0.153   78          2    21.27   0.000
a1b1        0.010   99          1    1.35    0.509
Uniform     0.710   0           0    98.50   0.000

Figure 4. The q distribution implied by the state-based a1b1 model
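The entries of Table II can be cross-checked with a few lines of code. This is a sketch under one assumption that the section does not state explicitly: L² is referred to a chi-square distribution with Δdf = df(data) − df(model) degrees of freedom, which is consistent with the reported p = 0.509 for the a1b1 model. The function names are illustrative, and the chi-square tail probability is computed from closed forms for integer degrees of freedom rather than from a statistics library.

```python
import math


def information_captured(t_reference, t_model):
    """I(model) = (T(reference) - T(model)) / T(reference)."""
    return (t_reference - t_model) / t_reference


def chi2_sf(x, k):
    """P(X >= x) for a chi-square variable with integer k >= 1 df."""
    y = x / 2.0
    if k % 2 == 0:
        # exp(-y) * sum_{j < k/2} y^j / j!
        total, term = 0.0, 1.0
        for j in range(k // 2):
            if j > 0:
                term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd k: erfc(sqrt(y)) plus terms y^(j + 1/2) exp(-y) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    for j in range((k - 1) // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


# Values taken from Table II; the data (AB) have df = 3.
print(round(100 * information_captured(0.710, 0.153)))  # A:B
print(round(100 * information_captured(0.710, 0.010)))  # a1b1
print(round(chi2_sf(1.35, 3 - 1), 3))                   # a1b1, delta df = 2
print(chi2_sf(21.27, 3 - 2))                            # A:B, delta df = 1
```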


Of course, state-based analysis is also applicable to directed systems, in which one or more variables are designated as “dependent” in that their values depend on other (independent) variables. Consider, for example, the directed (and stochastic) system of Figure 5 (N = 1,247), in which variables A and B are the independent variables and Z is the dependent variable. Note that this system is not deterministic (k-systems analysis is restricted to deterministic systems). For such systems, both variable- and state-based RA have natural interpretations in terms of the conditional probability distribution for the dependent variable, Z. At one extreme, the saturated model ABZ (the data) allows a different Z distribution for each of the four system states defined by A and B. Since we are interested only in the relationship between Z and the independent variables A and B, and not in any relationship among the independent variables, the appropriate bottom reference model is not the uniform distribution but the independence model, AB:Z, which asserts that the independent variables provide no information at all about Z. For this model, a single marginal Z distribution is assumed for all the system states defined by A and B. The degree to which AB:Z (or any other model) is consistent with the data ABZ can be assessed statistically, as described in Section 3. In the variable-based framework, between the extremes of ABZ and AB:Z, there are three other candidate models: AB:AZ, AB:BZ, and AB:AZ:BZ. The model AB:AZ asserts that Z is related only to variable A; model AB:BZ has a similar interpretation. Model AB:AZ:BZ assumes that A and B both influence Z, but that there is no interaction between A and B with respect to their influence on Z.
Table III gives results for all variable-based models and some state-based models for Figure 5, sorted by information, where

$$I(\text{model}) = \frac{T(AB{:}Z) - T(\text{model})}{T(AB{:}Z)}.$$

Although AB:BZ is the best variable-based model simpler than the data, it captures only about 17 percent of the information in the data while utilizing

Figure 5. p(ABZ), a directed behavior system

Table III. Summary results for directed systems models

Model             T        Percent I   df   L²      p
ABZ               –        100         7    –       1.000
AB : Z : a0Z      0.0002   100         6    0.3     0.603
AB : Z : a0b1Z    0.0696   61          5    120.3   0.000
AB : Z : a0b0Z    0.0876   51          5    151.4   0.000
AB : AZ : BZ      0.1478   17          6    255.5   0.000
AB : BZ           0.1482   17          5    256.2   0.000
AB : Z : a1b1Z    0.1610   10          5    278.4   0.000
AB : Z : a1b0Z    0.1720   3           5    297.4   0.000
AB : AZ           0.1777   0           5    307.2   0.000
AB : Z            0.1780   0           4    307.6   0.000

nearly as many degrees of freedom (df = 5) as exist in the data (df = 7). Moreover, the AB:BZ model is not statistically consistent with the observed data (p = 0.000; i.e. there is essentially no chance of error if we assert that the model differs from the data). As was the case for the neutral system described above, state-based models for this directed system can capture more of the information in the data using the same or fewer degrees of freedom. The model AB:Z:a0b1Z, for example, specifies that the conditional distribution for Z must match the observed distribution for the a0b1 system state, and that a single Z distribution will be used for all other system states. This model has df = 5 just as the variable-based AB:BZ model does, but the AB:Z:a0b1Z model captures 61 percent of the information in the data. It is still, however, inconsistent with the observed data (p = 0.000). The AB:Z:a0BZ state-based model, however, is simpler than the data (df = 6; Δdf = 1), captures essentially all of the information in the data, and is statistically indistinguishable from the data (p = 0.6029) given the sample size (N = 1,247). The AB:Z:a0BZ model specifies that, when the system is in state a0, the joint BZ distribution will match the observed data. Otherwise, the probabilities for the model distribution (q) will be maximally relaxed, consistent with the AB and Z margins. It is worth noting that state-based models for directed systems also can specify partial agreement with conditional distributions for dependent variables. For instance, the model AB:Z:a0b1z0 would require that the calculated probability q(a0b1z0) and conditional probability q(z0|a0b1) match their observed values. Of course, models of this sort are applicable only when the associated dependent variable has more than two states. The statistical analyses of Figures 1 and 5 used a top-down approach. L² and Δdf could also be calculated relative to the independence model, rather than the data.
In this case, a very low p would mean that ascent to the model is statistically justified. This bottom-up approach is especially natural for directed systems.
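The T, Percent I and L² columns of Table III are tied together by simple relations that can be verified numerically. The sketch below assumes the standard relation between the likelihood-ratio statistic and transmission measured in bits, L² = 2 N ln(2) T; this relation is not stated in this section, but it reproduces the tabulated values for N = 1,247 to within rounding. The function and variable names are illustrative.

```python
import math

N = 1247           # sample size given for Figure 5
T_BOTTOM = 0.1780  # T for the reference model AB:Z

# (model, T, percent I, L2) as listed in Table III, omitting the data row.
TABLE_III = [
    ("AB : Z : a0Z",   0.0002, 100,   0.3),
    ("AB : Z : a0b1Z", 0.0696,  61, 120.3),
    ("AB : Z : a0b0Z", 0.0876,  51, 151.4),
    ("AB : AZ : BZ",   0.1478,  17, 255.5),
    ("AB : BZ",        0.1482,  17, 256.2),
    ("AB : Z : a1b1Z", 0.1610,  10, 278.4),
    ("AB : Z : a1b0Z", 0.1720,   3, 297.4),
    ("AB : AZ",        0.1777,   0, 307.2),
    ("AB : Z",         0.1780,   0, 307.6),
]


def likelihood_ratio(t_bits, n):
    """Assumed relation: L2 = 2 * N * ln(2) * T, with T in bits."""
    return 2.0 * n * math.log(2.0) * t_bits


def percent_information(t_model, t_bottom=T_BOTTOM):
    return 100.0 * (t_bottom - t_model) / t_bottom


for model, t, pct, l2 in TABLE_III:
    print(model, round(percent_information(t)), round(likelihood_ratio(t, N), 1))
```

Small differences in the last digit of L² are expected because the tabulated T values are themselves rounded to four decimal places.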


4. Searching the state-based structure lattice Unfortunately, the benefits of state-based modeling are coupled with an enormous increase in the number of models that must be considered. As indicated above, an AB system has just one alternative variable-based model (A:B) but, if the variables are binary, there are 70 nondegenerate state-based models. Even after models have been grouped into equivalence classes, and a canonical model from each class chosen, there are 14 models whose distributions need to be generated. For variable-based modeling, variable cardinalities do not affect the lattice of structures, but for state-based modeling, the number of state-based structures increases not only with the number of variables in the system, but also with the cardinality of the variables. For example, a two-variable AB system in which just one of the variables has three states rather than two still has only one other variable-based model (A:B), but this system has 11 candidate state-based parameters and 1,023 possible parameter combinations that utilize five or fewer df. Even after rejecting degenerate structures, 568 distinct structures that can be grouped into 129 equivalence classes remain to be evaluated. While an exhaustive search might be feasible for very simple systems involving only a few variables and a small number of states per variable, a different approach is clearly required for more complex behavior systems. Jones (1985b) proposed a “greedy algorithm” that determined the best one-parameter model, then used that as the starting point for evaluating two-parameter models, and so on. The algorithm works well in practice but does not guarantee the optimality of the final model. An obvious extension of Jones’ greedy algorithm is to prune less heavily at each step, retaining two or more candidate models as a starting point for searching at the next level of complexity (i.e. utilizing more df). 
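The greedy search and its less heavily pruned extension can be sketched as a beam search over state-based parameters. This is a minimal illustration, not Jones' algorithm as published: the scoring by transmission T(p:q), the iterative fitting routine, the parameter encoding and the example distribution p are all assumptions made for the sketch. With width = 1 it behaves greedily; a larger width retains several candidate models per level. For brevity it does not reject degenerate parameter combinations, which a fuller implementation would need to do.

```python
import math

# Assumed parameters for a 2 x 2 AB system, each a set of cell indices
# (cell order: a0b0, a0b1, a1b0, a1b1).
PARAMS = {
    "a0b0": {0}, "a0b1": {1}, "a1b0": {2}, "a1b1": {3},
    "a0": {0, 1}, "a1": {2, 3}, "b0": {0, 2}, "b1": {1, 3},
}


def fit(p, names, sweeps=100):
    """Iterative proportional fitting from the uniform distribution."""
    q = [1.0 / len(p)] * len(p)
    for _ in range(sweeps):
        for name in names:
            cells = PARAMS[name]
            target = sum(p[i] for i in cells)
            current = sum(q[i] for i in cells)
            q = [q[i] * (target / current if i in cells
                         else (1 - target) / (1 - current))
                 for i in range(len(p))]
    return q


def transmission(p, q):
    """T(p:q) = sum p log2(p/q), in bits; zero when q reproduces p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def beam_search(p, max_params, width=1):
    """Return the best (model, T) found at each number of parameters."""
    beam, best_per_level = [()], []
    for _ in range(max_params):
        scored = {}
        for model in beam:
            for name in PARAMS:
                if name not in model:
                    candidate = tuple(sorted(model + (name,)))
                    if candidate not in scored:
                        scored[candidate] = transmission(p, fit(p, candidate))
        ranked = sorted(scored.items(), key=lambda item: item[1])
        beam = [model for model, _ in ranked[:width]]  # keep `width` survivors
        best_per_level.append(ranked[0])
    return best_per_level


# Hypothetical data in which only the a1b1 cell departs from uniformity.
p = [0.1, 0.1, 0.1, 0.7]
levels = beam_search(p, max_params=2, width=2)
print(levels[0])
```

Because the number of candidates grows with both the number of variables and their cardinalities, the width parameter trades search cost against the risk of missing the best model at a given df.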
A very different approach to searching the state-based Lattice of Structures using Fourier transforms is sketched by Zwick (2002). When the state-based modeling approach is viewed as an extension to variable-based modeling, an obvious search strategy is to identify the best variable-based model and use that as a starting point for evaluating candidate state-based models. Since every variable-based model can be specified from the state-based perspective, it should be possible, in principle, to start with the best variable-based model and determine if adding an additional state-based parameter can efficiently improve the model’s conformance with the data. Alternatively, it may be possible to remove a parameter and reduce the model’s complexity without sacrificing too much fidelity. 5. Conclusions and future directions The investigations described in this paper build on the work of Jones and his colleagues in order to establish state-based RA as a natural extension of accepted variable-based RA methods. Results to date demonstrated that:

(1) State-based RA can be used where the k-systems framework is inapplicable, e.g. to analyze distributions:
• where there are multiple interrelated quantitative dependent variables,
• where dependent variables are categorical,
• where systems are neutral, or
• where systems are stochastic.

(2) The reference model for state-based RA is not limited to the uniform distribution. For directed systems, the variable-based independence model may provide a more appropriate reference. Also, the bottom-up approach using either of these reference models can be replaced by a top-down approach using the saturated model as the reference model (this might be especially appropriate for neutral systems).

(3) The lattice of structures for state-based models is related closely to the variable-based lattice. Equivalence classes can be established with matrix methods.

(4) Searching the state-based lattice of structures can be used to further improve the results of searching the variable-based lattice.

(5) Methods previously applied in variable-based RA for evaluating the statistical significance of differences between models apply equally to state-based RA. g-to-k normalization, which converts a quantitative system function to a probability distribution, does not provide for (or require) such statistical assessment.

(6) State-based modeling can be used to enhance decision analysis. This is not discussed in this paper, but see Johnson and Zwick (2000).

Explorations reported in this paper were done mostly with spreadsheets, but the discrete multivariate modeling (DMM) group (Zwick, 2001b) at Portland State University is developing a comprehensive software platform (OCCAM) for reconstructability analysis (RA) (Willett and Zwick, 2002) which will support state-based analysis. For a review of RA including state- and latent variable-based modeling, see Zwick (2000a).
RA overlaps very considerably with log-linear (LL) modeling, which is widely used in the social sciences (Bishop et al., 1978; Knoke and Burke, 1980), so state-based modeling is an important extension of LL modeling as well. For recent work in RA which makes extensive use of Jones’ k-systems framework, see Klir (2000).

References

Anderson, D.R. (1996), “The identification problem of reconstructability analysis: a general method for estimation and optimal resolution of local inconsistency”, PhD dissertation, Portland State University, Portland, OR.

Bishop, Y.M., Fienberg, S.E. and Holland, P.W. (1978), Discrete Multivariate Analysis, MIT Press, Cambridge, MA.

Johnson, M. and Zwick, M. (2000), “State-based reconstructability modeling for decision analysis”, in Allen, J.K. and Wilby, J.M. (Eds), Proceedings of The World Congress of the Systems Sciences and ISSS 2000, International Society for the Systems Sciences, Toronto, Canada.

Jones, B. (1982), “Determination of reconstruction families”, International Journal of General Systems, Vol. 8, pp. 225-8.

Jones, B. (1985a), “Determination of unbiased reconstructions”, International Journal of General Systems, Vol. 10, pp. 169-76.

Jones, B. (1985b), “A greedy algorithm for a generalization of the reconstruction problem”, International Journal of General Systems, Vol. 11, pp. 63-8.

Jones, B. (1985c), “Reconstructability analysis for general functions”, International Journal of General Systems, Vol. 11, pp. 133-42.

Jones, B. (1986), “K-systems versus classical multivariate systems”, International Journal of General Systems, Vol. 12, pp. 1-6.

Jones, B. (1989), “A program for reconstructability analysis”, International Journal of General Systems, Vol. 15, pp. 199-205.

Klir, G.J. (1985), Architecture of Systems Problem Solving, Plenum Press, New York, NY.

Klir, G. (Ed.) (2000), International Journal of General Systems, Special Issue on Reconstructability Analysis in China, Vol. 29.

Knoke, D. and Burke, P.J. (1980), Log-Linear Models, Quantitative Applications in the Social Sciences Monograph #20, Sage, Beverly Hills, CA.

Krippendorff, K. (1986), Information Theory: Structural Models for Qualitative Data, Sage, Newbury Park, CA.

Willett, K. and Zwick, M. (2002), “A software architecture for reconstructability analysis”, Proceedings of 12th International World Organization of Systems and Cybernetics and 4th International Institute for General Systems Studies Workshop, Pittsburgh.

Zwick, M. (2001a), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY, pp. 237-56.

Zwick, M. (2001b), Discrete Multivariate Modeling, available at: www.sysc.pdx.edu/res_struct.html

Zwick, M. (2002), “Reconstructability analysis with Fourier transforms”, Proceedings of 12th International World Organization of Systems and Cybernetics and 4th International Institute for General Systems Studies Workshop, Pittsburgh.


Reconstructability analysis detection of optimal gene order in genetic algorithms


Martin Zwick
Portland State University, Portland, Oregon, USA

Stephen Shervais
College of Business and Public Administration, Eastern Washington University, Cheney, Washington, USA

Keywords Cybernetics, Programming and algorithm theory, Optimization techniques

Abstract The building block hypothesis implies that genetic algorithm efficiency will be improved if sets of genes that improve fitness through epistatic interaction are near to one another on the chromosome. We demonstrate this effect with a simple problem, and show that information-theoretic reconstructability analysis can be used to decide on optimal gene ordering.

1. Introduction

Holland’s schema theorem and the building block hypothesis suggest that the performance of a genetic algorithm (GA) might be improved if genes exhibiting epistasis, i.e. genes that interact strongly in their effect on fitness, are near to one another on the chromosome. Genes that are close are less likely to be separated by the crossover operator, and alleles that have high fitness can constitute a building block for further evolution. Epistasis may thus imply the existence of an optimal gene order for a GA. This suggests two questions. First, can it be shown that GAs work better if epistatically linked genes are close to one another? Second, if such an effect exists, is it possible to extract information from data produced by the GA, so that we can modify the gene order (using a new genetic operator) to improve GA performance?

In this paper, after describing the schema theorem and the building block hypothesis and their relevance to epistasis (Section 2), we demonstrate the existence of a gene order effect in a very simple problem (Section 3). We then show that the methodology of reconstructability analysis can be used to discover preferred gene orders even from GA data produced by less preferred gene orders (Section 4). Finally, we discuss the results of these preliminary experiments and point to areas for future exploration (Section 5).

This paper is based on a paper presented at the 12th International World Organization of Systems and Cybernetics and 4th International Institute for General Systems Studies Workshop, Pittsburgh, 2002.

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1053-1062 © Emerald Group Publishing Limited 0368-492X DOI 10.1108/03684920410534100


2. Schema theorem, building blocks, and epistasis

The schema theorem was first proposed by Holland (1975) as a description of how adaptive systems “persistently test and incorporate structural properties associated with better performance” (p. 66). Although there is now some doubt as to how well it describes the dynamics of the GA search process (Mitchell, 1996; Thornton, 1997), it is still useful as a conceptual device, and we use it that way here. According to the theorem, GAs work by parallel testing of multiple combinations of bit strings made up of the available alleles. In the typical binary chromosome, the alleles may be represented as 1, 0, and * (do not care). Thus, 110***11 is a schema (call it S1) of defining length eight and cardinality five. Note that S1 contains a shorter schema (S2), 110*****, with a defining length and cardinality of three, and a third schema (S3), ******11. In fact, a schema of defining length eight has 3^8 possible schemata embedded in it, but we shall here discuss just these three. If strings containing S2 have a higher-than-average fitness, then they will be preferentially selected, and S2 will act as a building block that can be assembled with other building blocks to create longer schemata and higher fitness bitstrings. Since the ratio of the defining length to the cardinality is low, S2 is not likely to be broken up by the crossover operator. The same argument applies to S3. Now, consider S1. If bitstrings containing this schema have a higher-than-average fitness, they will be preferentially selected as well. However, since the defining length of S1 is large relative to its cardinality, it also stands a higher chance of being broken up during crossover. If S2 and S3 are both important to the fitness of S1, we would be better off changing the representation so that S2 and S3 are close together.
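The exposure of each schema to single-point crossover can be made concrete with a short calculation. The sketch below is illustrative: it follows the text's usage, in which defining length counts the positions from the first to the last fixed allele inclusively, and it assumes a single cut point chosen uniformly among the gaps between adjacent bits. The function names are hypothetical, and the returned value is an upper bound on disruption, since a cut inside the schema does not always destroy it.

```python
def fixed_positions(schema):
    """Indices of the positions that are not 'do not care' symbols."""
    return [i for i, ch in enumerate(schema) if ch != "*"]


def defining_length(schema):
    """Positions spanned from first to last fixed allele, inclusive."""
    pos = fixed_positions(schema)
    return pos[-1] - pos[0] + 1


def cardinality(schema):
    """Number of fixed alleles in the schema."""
    return len(fixed_positions(schema))


def cut_probability(schema):
    """Chance that a uniformly placed single cut falls inside the schema."""
    gaps = len(schema) - 1
    return (defining_length(schema) - 1) / gaps


for name, schema in [("S1", "110***11"), ("S2", "110*****"),
                     ("S3", "******11"), ("recoded", "11011***")]:
    print(name, defining_length(schema), cardinality(schema),
          round(cut_probability(schema), 3))
```

Under these assumptions every cut point falls inside S1, while a version of the same five alleles packed into adjacent positions is exposed to only four of the seven possible cut points.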
In other words, if S1 has high fitness, it would be more likely to survive recombination if we had some good reason to move the 11 alleles over to be adjacent to the 110 alleles, i.e. to recode the genome so that this schema was 11011***. Although the usefulness of short building blocks has long been understood, only a few researchers have addressed the issue of how changing gene order might facilitate reaching enhanced fitness. McClintock (1987) is credited with discovering the importance of gene transposition in nature. This established transposition as a possible genetic operator available for use by GA researchers. Goldberg et al. (1993) developed the “fast messy” GA, which, among other things, allows the GA to evolve gene locations on the chromosome. They did this by coding stretches of the chromosome with a gene identifier, which specified the gene that that part of the chromosome represented. A given gene might start out overexpressed in a chromosome, because its identification code appears at two different locations. The program selects the first instance of the gene and ignores the rest. Alternatively, a gene might be underexpressed if it does not appear in the bitstring at all. The program then applies a default template to supply the missing gene values. As evolution proceeds, and the length of the GA is allowed to change from long to short and

back to long again, those bitstrings with efficient gene orders will be preferentially selected. Beasley et al. (1993) used a priori knowledge to code interactive genes into sub-problems, which are subjected to separate evolutionary processes and are recombined in each generation. This requires that some exogenous process identifies the sub-problems. Simoes and Costa (1999) examined the usefulness of McClintock’s transposons as a replacement for the crossover operator. In their work, randomly selected runs of bitstrings were moved about on the chromosome. No effort was made to record which bitstrings worked best together. The impact of one gene on the fitness contribution of another is called epistasis. In the schema S1 discussed earlier, assume that the high fitness of S1 derives from an epistatic interaction between S2 and S3, and not merely from the separate high fitnesses of these two schemas. This would be all the more reason for S2 and S3 to be adjacent to one another and constitute a compact building block. The matter might be more complex, however. While the tight coupling of high epistatic genes into building blocks might seem at first glance to be an unalloyed good, further reflection shows the advantage of repositioning genes on our illustrative chromosome accrues only after the good 110 and 11 alleles first occur on the genome, after which preservation of these alleles as a building block becomes advantageous. A different, indeed opposite, argument might apply to the process of searching for high fitness schemata. During the early generations, the GA is still searching for good combinations of alleles, and crossover is the primary tool for searching out novel combinations. If the 110 and 11 alleles exist on two different parental chromosomes, they are more likely to be recombined as the result of crossover if the genes are distant from one another. 
In the work reported in this paper, however, we have observed only the benefits gained by placing epistatically linked genes close to one another. This issue is addressed further in Section 5.

3. Demonstrating gene order effects in a genetic algorithm

We here demonstrate the possibility of a gene order effect by using an extremely simple fitness function, namely the function (to be maximized) specified by equation (1)

$$F = \min(A/B,\; B/A) \cdot C \qquad (1)$$

where A, B, and C take on values between 0 and 3.0. The minimization operation thus constrains the AB term to values less than or equal to 1.0 and fitness, F, to the range 0.0-3.0. The epistatic nature of the problem arises from the fact that the AB term is maximized (at 1.0) only if A and B are equal. The C variable has no impact on the AB term, and contributes to overall fitness in simple proportion to its value. From a theoretical standpoint, focusing exclusively on the imperative of retaining good building blocks, one would


Figure 1. GA effectiveness on short chromosomes when crossover is allowed at any place on the bitstring

expect that a chromosome where the variables A and B were side by side would allow the GA to perform more efficiently than one with A and B separated by C. Thus, in the six ways of ordering A, B, and C, four are expected to be good orders (ABC, BAC, CAB, CBA), and two are expected to be bad orders (ACB, BCA). The GA we used employed standard binary encoding, with three 8 bit genes and a chromosome length that varied depending upon the requirements of the experiment. The GA parameters for all experiments included: population 30, generations 30, mutation rate 0.01, crossover rate 1.0, and repetitions 100, with a new random seed for each repetition. Crossover was single point, and occurred at either the gene boundary only, or any place on the chromosome, depending upon the experiment. All six possible gene orders (four good, two bad) were tested. Results of the experiments are shown in Figures 1-3. For each experiment, the results from the 100 runs of the four gene orders hypothesized to be good (those like CBA, that kept A and B together) were averaged together. Results from runs of the two gene orders hypothesized to be bad were also averaged. Three experimental setups were used. In the first setup, the chromosome length was set to 24 (short chromosome), and crossover was only allowed at the gene boundaries. All of the genes were exons, that is, they all expressed values used in the solution of the problem. For the second and third setups, the chromosome length was increased by the introduction of 108 bits of non-coding introns to the right of each gene (long chromosome). The second experiment retained crossover at the gene boundary only, while the third allowed crossover anywhere on the length of the chromosome. Only the 24 bits in the three genes were exons.
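The short-chromosome setup can be sketched in code. This is a hedged illustration rather than the authors' program: the text fixes the fitness function, the three 8 bit genes, population 30, generations 30, mutation rate 0.01, crossover rate 1.0 and single-point crossover, but it does not state how genes are decoded, how parents are selected, or how A = 0 or B = 0 is handled. The linear decoding to the range 0 to 3.0, the binary tournament selection, and the zero fitness returned for a zero-valued A or B are therefore assumptions, as are all function names.

```python
import random

GENE_BITS = 8
MAX_VALUE = 3.0


def decode(bits):
    """Map an 8 bit gene to [0, 3.0]; linear scaling is an assumption."""
    return int("".join(map(str, bits)), 2) * MAX_VALUE / (2 ** GENE_BITS - 1)


def fitness(a, b, c):
    """Equation (1): F = min(A/B, B/A) * C."""
    if a == 0 or b == 0:
        return 0.0  # assumption: the ratio is undefined, so score it as zero
    return min(a / b, b / a) * c


def evaluate(chromosome, order):
    """Decode the genes in the given order (e.g. 'ACB') and score them."""
    values = {name: decode(chromosome[k * GENE_BITS:(k + 1) * GENE_BITS])
              for k, name in enumerate(order)}
    return fitness(values["A"], values["B"], values["C"])


def crossover(parent1, parent2, rng, boundaries_only):
    """Single-point crossover, at a gene boundary or at any position."""
    if boundaries_only:
        point = rng.choice([GENE_BITS, 2 * GENE_BITS])
    else:
        point = rng.randrange(1, len(parent1))
    return parent1[:point] + parent2[point:]


def run_ga(order, seed, pop_size=30, generations=30,
           mutation_rate=0.01, boundaries_only=True):
    """Return the best fitness evaluated during one run."""
    rng = random.Random(seed)
    length = 3 * GENE_BITS
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = 0.0
    for _ in range(generations):
        scores = [evaluate(c, order) for c in pop]
        best = max(best, max(scores))

        def pick():  # binary tournament selection (an assumption)
            i, j = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[i] if scores[i] >= scores[j] else pop[j]

        pop = [[bit ^ 1 if rng.random() < mutation_rate else bit
                for bit in crossover(pick(), pick(), rng, boundaries_only)]
               for _ in range(pop_size)]
    return best


# Compare a good order and a bad order over a few seeds.
for order in ("ABC", "ACB"):
    print(order, sum(run_ga(order, seed) for seed in range(10)) / 10)
```

Averaging over many seeds for each of the six orders, and recording fitness per generation rather than only the final best, would be needed to reproduce comparisons of the kind shown in Figures 1-3.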


Figure 2. GA effectiveness on long chromosomes with introns and crossover allowed at any place on the bitstring

Figure 3. GA effectiveness on long chromosomes with crossover at gene boundaries

Figures 1-3 show that a small but definite improvement in the performance of the GA can be attained if genes are ordered optimally, i.e. if A and B are not separated by C. The effect is small, but the genome itself is small, so a large gene order effect is not to be expected.


4. Detecting optimal gene order by reconstructability analysis Assuming then that gene order matters, and that it might matter more dramatically for more complex genomes and fitness functions, the challenge is to find out what the optimum gene order is. In this section, we show that this determination is in fact achievable. Information on F(A, B, C) is generated by the GA, and this information can be analyzed to find the optimal gene order, even when the GA is initially implemented with a non-optimal order. We here show that this can be done using the methods of reconstructability analysis (RA). 4.1 Reconstructability analysis RA derives from Ashby (1964), and was developed by Broekstra, Cavallo, Cellier, Conant, Jones, Klir, Krippendorff, and others; an extensive bibliography is available in Klir (1986), and a compact summary of RA is available in Zwick (2001a). RA resembles log-linear (LL) methods (Bishop et al., 1978; Knoke and Burke, 1980), used widely in the social sciences, and where RA and LL methodologies overlap they are equivalent (Knoke and Burke, 1980; Krippendorff, 1986). In RA (Klir, 1985), a probability or frequency distribution or a set-theoretic relation is decomposed (compressed, simplified) into component distributions or relations. ABC might thus be decomposed into AB and BC projections, written as the structure, AB:BC. The two linked bivariate distributions (or relations) constitute a model of the data. RA can model problems both where “independent variables” (inputs) and “dependent variables” (outputs) are distinguished (directed systems) and where this distinction is not made (neutral systems). In the present case, we have a directed system, with independent variables (genes) A, B, and C, where the dependent variable is the fitness value, F. Consider “regression models” (Krippendorff, 1986) where there is no overlap between genes in the separate components of the model. These models are ABF : CF, ACF : BF, BCF : AF, and AF : BF : CF. 
Interaction between two variables (i.e. an epistatic link, an “interaction effect”) is indicated when the model places the two input variables next to one another along with output F. So ABF : CF indicates that A and B together contribute to the fitness F separately from the contribution made by C to F. It is also useful to look at “chain models” (Krippendorff, 1986), which feature overlapping of input variables (like ABF : BCF). Chain models do not yield disjoint subproblems, but they indicate particular orders of variables on the chromosome, e.g. ABF : BCF corresponds to order ABC. 4.2 RA calculations Calculations were made using the RA software programs developed at Portland State University, USA now integrated into the package OCCAM (for the principle of parsimony and as an acronym for “Organizational Complexity Computation And Modeling”). The earliest of these programs was developed by Zwick and Hosseini (Hosseini et al., 1986); a review of RA methodology is offered in Zwick (2001a); a list of recent RA papers is given in Zwick (2001b).
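The regression and chain models of Section 4.1 can be enumerated mechanically. In the sketch below, which is illustrative and not part of OCCAM, regression models are taken to be the partitions of the input genes into two or more disjoint groups, each joined with the output F, and the chain model for a gene order is built from each pair of adjacent genes joined with F. The function names are hypothetical.

```python
def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # first in its own block
        for i in range(len(part)):                  # or joined to a block
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def regression_models(inputs, output="F"):
    """Models whose components share no input variable."""
    models = []
    for part in set_partitions(list(inputs)):
        if len(part) == 1:
            continue  # a single block is the saturated model, e.g. ABCF
        components = sorted("".join(sorted(block)) + output for block in part)
        models.append(" : ".join(components))
    return sorted(models)


def chain_model(order, output="F"):
    """Model implied by a gene order: adjacent genes share a component."""
    components = sorted("".join(sorted(order[i:i + 2])) + output
                        for i in range(len(order) - 1))
    return " : ".join(components)


print(regression_models("ABC"))
for order in ("ABC", "CBA", "BAC", "ACB"):
    print(order, "->", chain_model(order))
```

A gene order and its reverse give the same chain model, so the six orders of three genes map onto three chain models.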

The RA was conducted on a dataset generated by multiple runs of the GA, using the two bad orders only. Results are shown in Table I. Using the same parameters as experiment 1, these runs first saved all members of the population, in excess of 10,000 records. Then, to select data associated with the most fit solutions, the highest scoring members of that population were extracted. The cutoff point was a fitness of at least 2.0, and a total of 7,800 records resulted. Values of A, B, C, and F were then discretized into five equally spaced bins, and the results were analyzed by the OCCAM software package. The RA results of Table I show that models corresponding to good gene orders are clearly superior to those with bad gene orders. Consider first the regression models, shown in the table in bold. These models assess the possible partitions of the problem into disjoint subproblems. Of the four regression models, ABF : CF is clearly the best, indicating that A and B are epistatically linked, while C makes an independent contribution to fitness. This suggests that A and B should be placed near to one another. Consider now the chain models, shown in the table in italics, which directly indicate how well different gene orders fit the data. ABF : BCF and ABF : ACF, corresponding to orders ABC (and CBA) and BAC (and CAB), respectively, are the best models, in agreement with the

Model

Optimal gene order

1059

I

1.000 0.999 ABF : BCF 0.999 ABF : ACF 0.994 ABF : CF 0.980 ACF : BCF 0.659 AF : BCF 0.655 ACF : BF 0.653 AF : BF : CF 0.639 BCF 0.631 BF : CF 0.616 ACF 0.610 AF : CF 0.591 CF 0.554 ABF 0.358 AF : BF 0.092 BF 0.067 AF 0.039 F 0.000 Notes: I is the information captured in the model, relative to 100 percent knowledge of F for the top model ABCF where the joint dependence of F on all genes is known, and 0 percent knowledge of F for the bottom model, where A, B, and C are all unknown. Regression model are in bold, and chain models are in italics. Other models are shown in smaller font. ABCF ABF : ACF : BCF

Table I. RA results

K 33,5/6

1060

implications of the regression models. These models as well support the idea that A and B should be adjacent. 5. Discussion For the simple test problem shown, where a part of the solution depends on the interaction of epistatic genes, the good orders (those that kept epistatic genes together) found better solutions faster than did orders that separated the epistatic genes. The gene order effect was small, but in more problems with more variables, it may become more substantial. The relative effectiveness of the two sets of orders changed throughout the experiments. At the beginning, roughly the first five generations, the bad orders performed about the same as the good ones. For the next 20 generations, the good orders performed better. In the end game, when both were approaching the solution asymptotically, the bad orders slowly caught up, but were still behind the performance of the good orders at the end. It was noted in Section 2 that the impact of separation on epistatic genes might be more complex than what is suggested by the main results of this paper. Specifically, one might expect that in the early phases of a GA run, epistatically linked genes should best be located far from one another. This is based on the supposition that at the beginning of the search it may be useful for all genes to be mixed as much as possible by the crossover operator. If two epistatic genes are side-by-side from the beginning, then crossover would have less chance of improving them, and the GA would have to depend upon a good initialization and fortunate mutations to create the best gene pair possible. If, on the other hand, two epistatic genes start out well separated, the crossover operation might more easily assemble a larger selection of allele patterns in the two genes. These expectations axe under continuing investigation, but so far we have not seen any clear evidence for them, i.e. for the better performance at the beginning of GA runs of orders which separate A and B. 
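The argument about adjacency and crossover can be made concrete with a small calculation. The sketch below is illustrative only and rests on assumptions that are not stated in this section: each gene is taken to occupy a contiguous block of `bits_per_gene` bits, and crossover is taken to be single-point with the cut placed uniformly at random. The function name `pair_disruption_probability` is hypothetical. It returns the probability that one cut falls inside the span holding two chosen genes, so that the offspring cannot inherit both genes intact from the same parent.

```python
from fractions import Fraction


def pair_disruption_probability(order, gene_x, gene_y, bits_per_gene):
    """Probability that one uniformly placed single-point cut falls inside
    the span of the chromosome that holds gene_x and gene_y.

    Assumes (hypothetically) equal-length, contiguous gene encodings.
    """
    length = len(order) * bits_per_gene
    ix, iy = order.index(gene_x), order.index(gene_y)
    first, last = min(ix, iy), max(ix, iy)
    span_start = first * bits_per_gene      # bits before the span
    span_end = (last + 1) * bits_per_gene   # bits up to the end of the span
    # A cut "after bit k" (k = 1 .. length-1) disrupts the pair when it lies
    # strictly inside the span.
    disrupting = sum(1 for k in range(1, length) if span_start < k < span_end)
    return Fraction(disrupting, length - 1)


for order in ("ABC", "BAC", "ACB", "BCA"):
    p = pair_disruption_probability(order, "A", "B", bits_per_gene=8)
    print(order, p, float(p))
```

Under these assumptions, any order that places C between A and B exposes the A-B pair to every possible cut, whereas orders with A and B adjacent leave some cuts that pass the pair through intact. Other crossover operators or encodings would give different numbers.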
Our runs start out with good and bad orders nearly equivalent in performance. At some point, the separation of A and B by C in “bad” orders definitely becomes a handicap, and these orders fall behind the others. RA allows one to find the simplest models that retain high information about the data. The top model, ABCF, includes interactions among all variables, but Table I shows that ABF : CF has virtually complete information (98 percent), so solving the ABF and CF subproblems separately and merging the answers would probably give a good result. We suggest that this might be a way to solve Beasley et al.’s problem of a priori identification of subproblems for expansive coding. Using RA to decompose optimization problems into subproblems might of course also be useful for optimization methods other than the GA. The success of this experiment means that in principle we have a way to restructure the genome of a GA based on data that the GA itself generates.
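As a rough illustration of how an information value of the kind reported in Table I can be obtained, the sketch below filters records by fitness, discretizes each variable into five equally spaced bins, and computes the fraction of the uncertainty in F that a set of predictors removes, scaled so that the full set {A, B, C} gives 1 and no predictors gives 0. This is not the OCCAM implementation. The fitness function `2ab + c`, the sample size, and all function names are assumptions made only for this example, and only single-component models (ABF, CF and so on) are handled. Multi-component models such as ABF : CF generally need iterative fitting, which is not attempted here.

```python
import math
import random
from collections import Counter


def equal_width_bins(values, n_bins=5):
    """Map each value to a bin index 0..n_bins-1 using equally spaced edges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def conditional_entropy(rows, given, target="F"):
    """Empirical H(target | given) in bits; rows is a list of dicts."""
    joint = Counter((tuple(r[g] for g in given), r[target]) for r in rows)
    marginal = Counter(tuple(r[g] for g in given) for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / marginal[key])
                for (key, _), c in joint.items())


def information(rows, predictors, inputs=("A", "B", "C")):
    """Uncertainty in F removed by `predictors`, relative to all inputs."""
    h_bottom = conditional_entropy(rows, ())      # H(F): nothing known
    h_top = conditional_entropy(rows, inputs)     # H(F | A, B, C)
    h_model = conditional_entropy(rows, predictors)
    return (h_bottom - h_model) / (h_bottom - h_top)


def make_rows(n_samples=20000, cutoff=2.0, seed=0):
    """Synthetic stand-in for the saved GA population (hypothetical fitness)."""
    rng = random.Random(seed)
    raw = []
    for _ in range(n_samples):
        a, b, c = rng.random(), rng.random(), rng.random()
        raw.append((a, b, c, 2.0 * a * b + c))  # A and B interact; C adds
    kept = [r for r in raw if r[3] >= cutoff]   # keep high-fitness records
    columns = [equal_width_bins([r[i] for r in kept]) for i in range(4)]
    return [dict(zip("ABCF", vals)) for vals in zip(*columns)]


rows = make_rows()
for predictors in [("A", "B", "C"), ("A", "B"), ("C",), ("A",), ()]:
    label = "".join(predictors) + "F"
    print(f"{label:5s} I = {information(rows, predictors):.3f}")
```

The numbers printed depend entirely on the synthetic fitness function and are not comparable with Table I. The point is only the normalization: the top model scores 1, the bottom model scores 0, and adding a predictor can never lower the score.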

This might speed up processing in a particular GA run, if the optimum order can be detected early enough for the GA to gain an advantage from gene reordering. It could also offer a way to build a GA that is optimized for a specific type of problem. To use a real-world example, one of the authors recently studied the use of a GA to solve an inventory and distribution problem (Shervais, 2000a, b). One would expect that any set of such problems with the identical number and structure of nodes and stocks could use information generated by the first specific problem to be addressed. Alternatively, we may be able to apply RA to binned data to prestructure the genome for optimizing the fitness of the original unbinned variables. This may become useful as we search for more complex problems to test this approach on.

References
Ashby, W.R. (1964), “Constraint analysis of many-dimensional relations”, General Systems Yearbook, Vol. 9, pp. 99-105.
Beasley, D., Bull, D. and Martin, R. (1993), “Reducing epistasis in combinatorial problems by expansive coding”, Proceedings of the Fifth International Conference on Genetic Algorithms, Morgan Kaufmann Publishers, San Mateo, pp. 400-7.
Bishop, Y., Fienberg, S. and Holland, P. (1978), Discrete Multivariate Analysis, MIT Press, Cambridge, MA.
Goldberg, D., Deb, K., Kargupta, H. and Harik, G. (1993), “Rapid, accurate optimization of difficult problems using fast messy genetic algorithms”, IlliGAL Report No. 93004, University of Illinois, IL.
Holland, J. (1975), Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI.
Hosseini, J.C., Harmon, R.R. and Zwick, M. (1986), “Segment congruence analysis via information theory”, Proceedings, International Society for General Systems Research, May 1986, Philadelphia, PA, pp. G62-G77.
Klir, G. (1985), The Architecture of Systems Problem Solving, Plenum Press, New York, NY.
Klir, G. (1986), “Reconstructability analysis: an offspring of Ashby’s constraint theory”, Systems Research, Vol. 3 No. 4, pp. 267-71.
Knoke, D. and Burke, P.J. (1980), Log-Linear Models, Quantitative Applications in the Social Sciences Monograph #20, Sage, Beverly Hills, CA.
Krippendorff, K. (1986), “Information theory: structural models for qualitative data”, Quantitative Applications in the Social Sciences #62, Sage, Beverly Hills, CA.
McClintock, B. (1987), The Discovery and Characterization of Transposable Elements: The Collected Papers of Barbara McClintock, Garland, New York, NY.
Mitchell, M. (1996), An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
Shervais, S. (2000a), “Developing improved inventory and transportation policies for distribution systems using genetic algorithm and neural network methods”, Proceedings of the World Conference on the Systems Sciences, Toronto, Canada, pp. 200059-1-200059-17.
Shervais, S. (2000b), “Adaptive critic design of control policies for a multi-echelon inventory system”, PhD dissertation, Portland State University, available at: www.cbpa.ewu.edu/~sshervais/Personal_Info/Papers/Dissertation.ps.zip


Simoes, A. and Costa, E. (1999), “Transposition: a biologically inspired mechanism to use with genetic algorithms”, Proceedings of the Fourth International Conference on Neural Networks and Genetic Algorithms (ICANNGA99), Portoroz, Slovenia, Springer-Verlag, Berlin, pp. 178-86.
Thornton, C. (1997), “The building block fallacy”, Complexity International, Vol. 4, available at: www.csu.edu.au/ci/vol04/thornton/building.htm
Zwick, M. (2001a), “Wholes and parts in general systems methodology”, in Wagner, G. (Ed.), The Character Concept in Evolutionary Biology, Academic Press, New York, NY, pp. 237-56.
Zwick, M. (2001b), Discrete Multivariate Modeling, available at: www.sysc.pdx.edu/res_struct.html

Further reading
Stadler, P., Seitz, R. and Wagner, G.P. (2000), “Population dependent Fourier decomposition of fitness landscapes over recombination spaces: evolvability of complex characters”, Bulletin of Mathematical Biology, Vol. 62 No. 3, pp. 399-428.

Book reviews

Handbook of Fingerprint Recognition
Davide Maltoni, Dario Maio, Anil K. Jain and Salil Prabhakar
Springer New York 2003 xii + 348 pp. ISBN 0-387-95431-7 hardback, £46.00 with DVD-ROM
Review DOI 10.1108/03684920410534128
Keywords Crime, Biometrics, Forensic science

This is a comprehensive review of its topic, whose importance in crime investigation has been recognised for many years. There is now reason for interest from another viewpoint since fingerprint recognition is one of the biometric techniques coming into use for personal identification in many areas (welfare disbursement, entry to secure premises, automatic teller machines, driving licences, and so on). Various possibilities for such biometric methods are compared and those most commonly used are listed as fingerprints, face, iris, speech and hand geometry. It is acknowledged that fingerprint recognition is a complex problem, and although it has received a great deal of attention over nearly 50 years in the forensic context, it is by no means a fully-solved problem, despite a popular misconception that it is. It is still a challenging and important recognition problem, especially where poor-quality images must be processed. The authors’ aim in writing this book is as follows.
• Introduce the readers to automatic techniques for fingerprint recognition. Introductory material is provided on all components/modules of a fingerprint recognition system.
• Provide an in-depth survey of the state-of-the-art in fingerprint recognition.
• Present in detail recent advances in fingerprint recognition, including sensing, feature extraction, matching and classification techniques, synthetic fingerprint generation, multimodal biometric systems, fingerprint individuality, and design of secure fingerprint systems.
• Serve as the first complete reference book on fingerprint recognition, including an exhaustive bibliography.


Kybernetes Vol. 33 No. 5/6, 2004 pp. 1063-1068 © Emerald Group Publishing Limited 0368-492X


All of these aims are well met in the nine chapters and 40-page bibliography. The intended audience includes researchers, practising engineers, and students who wish to understand and/or develop fingerprint-based recognition systems. The book is suggested as a reference book for a graduate course on biometrics, and the thoroughness of its treatment of this topic should be stressed because it is not obvious from the title. The material is clearly presented, with only a light sprinkling of mathematics, but with a great deal of detail in the illustrations, graphs and tables. This will certainly be a standard reference work in its field. The included DVD contains four fingerprint databases used in a 2002 Fingerprint Verification Competition, and another four that were used in a similar event in 2000. It also includes a demonstration version of software that can be used to generate synthetic fingerprint images. The DVD has to be used with a suitably-equipped computer, and it is perhaps hardly necessary to mention (though I was not sure till I tried it) that nothing of its content can be viewed on a DVD player attached to a television set. Alex M. Andrew

Introduction to Evolutionary Computing
A.E. Eiben and J.E. Smith
Springer Berlin 2003 xv+299 pp. ISBN 3-540-40184-9 hardback, £30.00
Natural Computing Series
Keywords Computing, Evolution, Genetics

This is intended primarily as a textbook for lecturers and graduate and undergraduate students, but will certainly attract a wider readership. The authors explain that each of them has many years of teaching experience, and has given instruction on evolutionary computing (EC) both in his home university and elsewhere, and they realised the need for a suitable textbook and decided to write this one. They have provided examples of practical applications in most chapters, as well as exercises for the student and suggestions for further reading. There is also a Web site from which teaching material including a PowerPoint presentation can be downloaded and used freely. The basic ideas of neo-Darwinian evolution are briefly reviewed since they provide the inspiration for EC methods, but the emphasis is on practical computing and special stratagems are introduced as needed, irrespective of biological parallels. EC methods are characterised by being population based,

so that a whole collection of candidate solutions are held and processed simultaneously, with some form of recombination to mix the solutions, and stochastic features expressed as random recombination and mutation. The term EC is used to cover four “dialects” denoted by genetic algorithms (GA), evolution strategies (ES), evolutionary programming (EP) and genetic programming (GP). These operate on different kinds of population units, or representations. GA operates on strings of symbols from a finite alphabet, analogous to the genetic code, whereas ES operates on real-value vectors, EP on finite state machines, and GP on tree structures. The power of the methods is demonstrated by the examples of applications described. Some of these may seem esoteric, referring to optimisation of specially-devised test functions, or the colouring of graphs, or the problem of placing eight queens on a chessboard so that none puts another into check, or learning to play checkers (draughts). Others, however, such as the “knapsack problem”, can be argued to model a class of practical problems, and there are others that are thoroughly practical, including scheduling and timetabling and modelling financial markets. A particularly convincing example is mentioned in the first chapter, where an irregular lattice structure that was designed using a GA algorithm is shown. Although it looks as though it was the result of being run over by a car, it is vastly superior to a regular structure in an application where vibrations could be troublesome. Besides serving as an introduction the book is a guide to the state-of-the-art. One advanced topic is self-adaptation, whereby the adaptive process itself is improved adaptively. This is developed in connection with ES, which deals with continuous variables, and the adaptive improvements include the adjustment of step sizes and reorientation of axes in the test hyperspace. 
This has at least superficial similarity to considerations explored by Rosenbrock (1960) on optimisation using a single operating point. It is also acknowledged that evolution, although epitomised as “survival of the fittest”, can involve co-operation as well as competition, and results on the “prisoners’ dilemma” and evolution of altruism are mentioned. Surprisingly, the Gaia hypothesis of Lovelock (1979) is not mentioned although it provides strong evidence for co-operation in evolution. All the methods depend on the specification of a fitness function to govern survival. The possibility of interactive operation where a human operator evaluates fitness is treated, with examples of evolutionary art so produced. This is a well-produced and very useful book.
Alex M. Andrew

References
Lovelock, J.E. (1979), Gaia: A New Look at Life on Earth, Oxford University Press, Oxford.
Rosenbrock, H.H. (1960), “An automatic method for finding the greatest or least value of a function”, Computer J., Vol. 3, pp. 175-184.


Embedded Robotics: Mobile Robot Design and Applications with Embedded Systems
Thomas Bräunl
Springer Berlin 2003 xiii + 434 pp. ISBN 3-540-03436-6 hardback, £54.00
Review DOI 10.1108/03684920410534137

This is a detailed account of a very comprehensive and successful study of robotics, conducted mainly in the University of Western Australia, but with acknowledgement of the participation of a number of other centres, in Canada, Germany, New Zealand and USA. The assistance of colleagues in various locations in writing the book is acknowledged. The main development is a remarkably powerful controller, termed “EyeCon”, that has been produced in a suitable form to be installed in quite small robots. When connected to a digital camera it is sufficiently powerful to allow on-board image processing, and it has the means of interacting with other sensors and actuators with high precision. It can run programs in C and C++, and so complex robot behaviour is readily achievable. The EyeCon controller itself, as well as associated hardware, and a variety of complete robots termed the EyeBot family, are available commercially from a company called Joker Robotics, with Web site: http://joker-robotics.com. The EyeBot family includes vehicles of various kinds as well as six-legged walkers, biped android walkers, and a flying robot. All the software mentioned in the book is freely available at: http://robotics.ee.uwa.au/eyebot/. This includes the operating system used in the controller, as well as C and C++ compilers to run under Windows or Linux, image processing tools, a simulation system and a large collection of sample programs. A large amount of teaching material including PowerPoint slides is available from the same source. The title of the book is slightly ambiguous since it is not immediately obvious what is meant to be embedded in what. The intention is that a versatile controller is embedded in many different robots.
The various considerations are treated in remarkable detail. In the first part of the book there are descriptions of various programming tools, and details of the inner working of the operating system, including its means of achieving multitasking with appropriate synchronisation depending on Dijkstra “semaphores”. Sensors and actuators are treated, as well as the means of real-time image processing and a facility for wireless communication.

In the second part, various aspects of the design of mobile robots are discussed, including the various means that can be used to propel vehicles controllably. One of these is an omni-directional drive. Balancing robots, walking robots and flying robots are also treated, as well as a simulator program. The most remarkable part of the book is the third part in which successful applications are described. They include maze exploration, discussed in the context of a “Micro-Mouse” international competition, and map generation, in which the robot has to explore and record a previously unknown environment which, unlike the test mazes that have been used, does not in general have all obstacles in a rectangular grid. Still more dramatic is the application of the principles to robot football, which is organised internationally in a number of different leagues. The most advanced league of all is that in which it is planned that humanoid legged robots should compete, but this is ahead of current technology. EyeBots have performed well in the next most advanced option where each wheeled player uses only data from its own sensors. In preparation for the two later chapters this part has reviews of necessary aspects of artificial neural nets, genetic algorithms, and genetic programming. One of the later chapters then treats behaviour-based systems, in which sensors are linked to actuators more directly than through a model of the environment (Brooks, 1999). A robot using a neural net is described, and is able to learn to locate a ball and to drive towards it. The succeeding chapter presents really impressive results in achieving legged locomotion, including the biped version. The control methods were evolved using genetic programming, once a suitable framework had been set-up using splines to compute smooth trajectories. 
The number of trials needed for the evolution was such that they were performed using simulation rather than the physical robot, but effective control seems to have been achieved surprisingly readily, considering the usually-assumed difficulty of the task. Everything is presented in a pleasant chatty style, and with a remarkable amount of detail, so that, for example, the theory of splines is explained and there are even program listings to show how they were computed. The quality of production is high, with numerous illustrations, some in colour. There is a very great deal here that anyone concerned with autonomous robots will certainly want to peruse, irrespective of whether it is EyeCons or controllers of some other kind that are embedded.
Alex M. Andrew

Reference
Brooks, R.A. (1999), Cambrian Intelligence: The Early History of the New AI, MIT Press, Cambridge, MA (reviewed in Kybernetes, Vol. 29 No. 4, p. 529, 2000).


Biometrics: Identity Verification in a Networked World
Samir Nanavati
John Wiley & Sons 2003 ISBN 0-471-099-457 £24.50
Review DOI 10.1108/03684920410534146
Keywords Biometrics, Systems science, Security

This book is described by the publishers Wiley as a “Wiley Tech Brief”, and as such it just fulfills its aims. Cyberneticians and Systemists need to know, however, that the text does not delve very deeply into any of the technology it presents. It need not, however, dwell on the detail in a technology that changes as we write this review. Both methodology and application have to change as new research is harnessed to provide new techniques and those who are dedicated to defeating new security systems become ever more successful. The various techniques employed have been discussed in the sections of this journal as they receive prominence, but Biometrics has to be a changing field. Currently, this book highlights such devices as face, fingerprint, eye or speech analysis as current “state-of-the-art” technology; tomorrow’s effective methods may be based on DNA or implanted body-chips. This book, however, concentrates on the “Networked World” and in consequence deals with IT security primarily. It does this by giving relevant explanations and illustrations of the current favoured technologies, describing them competently and in an undemanding manner. The chapters that discuss standards and privacy are well presented but lack any real depth. But then it is a question of for whom the book was designed, and whether the readership is expected to have some background knowledge of IT systems and to what level. Bearing this in mind it does function as a useful brief for most IT practitioners, but is not of any real use to those involved in applying biometric techniques to a variety of systems situations.
D.M. Hutton

Book reports


Biology, Mathematics, Biocybernetics and Systems An interest in biocybernetics or the application of cybernetics to biology implies the use of mathematics. A recent report in Mathematics Today (December 2003) published by the Institute of Mathematics and its Applications gives details of three books, written between 1999 and 2000, that contribute to this exciting and growing field. The details of the three books are given below.


Self-organised Biological Dynamics and Non-linear Control: Toward Understanding Complexity, Chaos and Emergent Function in Living Systems

J. Walleczek (Ed.)
Cambridge University Press 2000 428 pp. ISBN 0-521-62436-3 £65.00

Mathematical Models for Biological Pattern Formation: Frontiers in Applications of Mathematics
IMA Volumes in Mathematics and its Applications 121
P.K. Maini and H.G. Othmer (Eds)
Springer-Verlag London, Berlin, Heidelberg 2001 317 pp. ISBN 0-387-95103-2 £96.00

On Growth and Form: Spatio-Temporal Pattern Formation in Biology
M.A.J. Chaplain, D. Sing and J.C. McLachlan (Eds)
John Wiley and Sons 1999 413 pp. ISBN 0-471-98451-5 £80.00

These three books were compiled as a result of conferences in this field and in consequence provide a reasonable state-of-the-art account of recent developments, the second book on mathematical models being published by the IMA. The first two books listed have similar subjects concerning pattern formations whilst the third, we are told, “loosely focuses on chemical and electrical oscillations in biological systems and how these are affected by external stimuli”. The report does suggest that “none of the volumes is perfect” but that readers in this field would find useful and enjoyable chapters in any of these books. The recommendation from this report by Dr J.S. Amima (Imperial College, London, UK) is that anyone interested in biomathematics should look at all three, but that his personal choice is the IMA Volume by Maini and Othmer (Editors).

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1069-1072 © Emerald Group Publishing Limited 0368-492X

Guide to Applying the UML
Sinan Si Alhir
Springer-Verlag London, Berlin, Heidelberg 2002 410 pp. ISBN 0-387-95209-8 £35.00/€49.95 (hardbound)

Cyberneticians and systemists with an interest in Management Cybernetics or Systems Science will be aware that UML is very widely used by systems practitioners and the leaders and managers of programmes and projects. It is an established industry standard for systems analysis and design. Interest in UML has revived because of its increased importance, particularly on the Internet. Readers who need a guide to UML are offered a practical guide to the language and its use as a versatile tool. The book’s content is extensive and is organised into the following 11 chapters.
• Chapter 1 gives an introduction to UML with a discussion of its applications.
• Chapters 2 and 3 cover information about modelling/system development and activity, as well as considering the principles of object orientation.
• Chapters 4-7 cover UML models and their uses as well as describing the basic models introduced earlier.
• Chapters 8 and 9 look at implementation modelling (the component diagram) and the implementation of the system.
• Finally, Chapters 10 and 11 discuss the language extensions and the Object Constraint Language (OCL).

The guide is reported to provide a good practical introduction to UML and its applications. In addition it provides some important lists of references.

Web Intelligence
Ning Zhong, Jiming Liu and Yiyu Yao (Eds)
Springer-Verlag New York 2003 450 pp. ISBN 3-540-44384-3 $69.95 (hardcover)

The authors aimed to explore the fundamental roles as well as the practical impacts of artificial intelligence and advanced information technology for the next generation of Web-empowered systems, services and environments. We all know of the problems in dealing with the concepts involved in a study of “intelligence”. Readers will have the opportunity of discovering what the editors of this text mean by it. What is described as “Web Intelligence” is of much interest to researchers and a text that deals with the subject will be welcome. What is now called the “Wisdom Web”, which also includes the “Semantic Web”, is a key area of research and development. The publishers advertise the book by stating:

The coherently written multi-author monograph provides a thorough introduction and a systematic overview of this new field.

It is a new field, but quite obviously built upon the not insignificant research and developments of past decades. It does attempt, however, to discuss the current state of research and development and provides some much needed consideration of the areas of application and endeavour.

Emergence in Complex, Cognitive, Social and Biological Systems
Gianfranco Minati and Eliano Pessa
Kluwer Academic/Plenum New York 2002 i-xiv, 394 pp. ISBN 0-306-47358-5 Price on application to publishers

The Italian Systems Society Conference 2001 is the basis of this volume’s contributions. Some 32 papers are included on the conference theme “Emergence”. This society is known for its timely contributions to systems and the volume is said to be an important and worthwhile collection of interesting studies on this theme.


Control in an Information Rich World
Richard M. Murray (Ed.)
Society for Industrial and Applied Mathematics (SIAM) 2003 112 pp. ISBN 0-89871-528-8 £26.50 (paperback)

This is a report of a panel that considered future directions in control, dynamics and systems.

Systematic Organisation of Information in Fuzzy Systems
Pedro Melo-Pinto, Horia-Nicolai Teodorescu and Toshio Fukuda (Eds)
IOS Press Amsterdam 2003 i-viii, 399 pp. ISBN 1-58603-295-X

Another volume based on an international conference (NATO Advanced Workshop, Vila Real, Portugal, 2001). Some 24 papers are included on the current developments in fuzzy systems and such applications that lie in knowledge and information studies.
C.J.H. Mann
Book Reviews and Reports Editor

Announcements

July 2004
PODC 2004 – Principles of Distributed Computing 2004, St Johns, Canada, 31 June-1 July
Contact: Soma Chaudhuri. Tel: 515-2948547; E-mail: [email protected]


ISSTA 2004 – International Symposium on Software Testing and Analysis, Boston, Massachusetts, USA, 31 June-1 July
Contact: George Avrunin. Tel: 413-545-0510; E-mail: [email protected]. edu

International Biometric Society Conference – IBC2004, Queensland, Australia, 11-16 July
Contact: IBC 2004 Secretariat. E-mail: [email protected]; Web site: www.ozaccom.com.au/ibc2004

DIS 2004: Designing Interactive Systems 2004, Cambridge, Massachusetts, USA, 18-21 July
Contact: Paul Moody. E-mail: [email protected]

DCC 2004 International Conference on Design Computing and Cognition, MIT, Cambridge, Massachusetts, USA, 19-21 July
Contact: Web site: http://www.arch.usyd.edu.au/kcdc/conference/dcc04

EVA 2004 – 3rd International Symposium on Extreme Value Analysis, Aveiro, Portugal, 19-23 July
Contact: E-mail: [email protected]; Web site: www.mat.ua.pt/eva2004/

August 2004
The 7th Australasian Conference on Mathematics and Computers in Sport, Massey University, Palmerston North, New Zealand, 30 August-1 September
Contact: E-mail: [email protected]

1st International Conference on Mobility Communications Systems and Applications – MCSA ’04, Bangkok, Thailand, 2-4 August
Contact: Ivan Boo. Tel: 65-6295-5790; E-mail: [email protected]

HT 2004 – 15th Conference on Hypertext and Hypermedia 2004, Santa Cruz, California, USA, 9-13 August
Contact: Tel: 831-459-1227; E-mail: [email protected]

2004 International Conference on Parallel Processing ICPP-2004, Montreal, Quebec, Canada, 15-18 August
Contact: Web site: www.ece.purdue.edu/icpp2004

Kybernetes Vol. 33 No. 5/6, 2004 pp. 1073-1074 © Emerald Group Publishing Limited 0368-492X


ISESE 2004: International Symposium on Empirical Software Engineering 2004, Redondo Beach, California, USA, 19-20 August
Contact: Marvin Zelkowitz. E-mail: [email protected]

International Conference on Knowledge Discovery and Data Mining – KDD 2004, Seattle, Washington, USA, 22-25 August
Contact: Won Kim. Tel: +1-512-349-9757; E-mail: [email protected]

WCC – IFIP World Computer Congress – WCC2004, Toulouse, France, 22-27 August
Contact: Web site: www.wcc2004.org

PPDP 2004 – Principles and Practice of Declarative Programming, Verona, Italy, 24-26 August
Contact: Eugenio Moggi. Tel: 39-010-3536629; E-mail: [email protected]

Special announcements

IFIP-WCC 2004 – World Computer Congress
Toulouse, France
22-27 August 2004

This is the 18th World Computer Congress. It offers:
• topical sessions or days for high-level surveys,
• conferences for accomplished results,
• workshops for ongoing research, and
• student forum for doctoral research.

Conferences include: TCS – Theoretical Computer Science; SEC – Information Security; CARDIS – Smart card research and advanced applications; DIPES; AIAI; HESSD; PRO-VE; 13E and HCE. For further information, visit the Web site: www.wcc2004.org

The IEEE TFCC Cluster Conference – Cluster2004
San Diego, California, USA
20-23 September 2004

Topics:
• Cluster middleware
• Cluster networking
• Cluster management and maintenance applications
• Performance analysis/evaluation
• Grid computing

For further information visit the Web site: http://grail.sdsc.edu/cluster2004
Pre-conference tutorials – 20 September 2004
Conference – 21-23 September 2004
