Computational Intelligence Paradigms in Advanced Pattern Classification (Studies in Computational Intelligence, 386) 9783642240485, 3642240488



Marek R. Ogiela and Lakhmi C. Jain (Eds.) Computational Intelligence Paradigms in Advanced Pattern Classification

Studies in Computational Intelligence, Volume 386

Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute, Polish Academy of Sciences
ul. Newelska 6, 01-447 Warsaw, Poland
E-mail: [email protected]

Further volumes of this series can be found on our homepage: springer.com

Vol. 363. Kishan G. Mehrotra, Chilukuri Mohan, Jae C. Oh, Pramod K. Varshney, and Moonis Ali (Eds.), Developing Concepts in Applied Intelligence, 2011. ISBN 978-3-642-21331-1
Vol. 364. Roger Lee (Ed.), Computer and Information Science, 2011. ISBN 978-3-642-21377-9
Vol. 365. Roger Lee (Ed.), Computers, Networks, Systems, and Industrial Engineering 2011, 2011. ISBN 978-3-642-21374-8
Vol. 366. Mario Köppen, Gerald Schaefer, and Ajith Abraham (Eds.), Intelligent Computational Optimization in Engineering, 2011. ISBN 978-3-642-21704-3
Vol. 367. Gabriel Luque and Enrique Alba, Parallel Genetic Algorithms, 2011. ISBN 978-3-642-22083-8
Vol. 368. Roger Lee (Ed.), Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing 2011, 2011. ISBN 978-3-642-22287-0
Vol. 369. Dominik Ryżko, Piotr Gawrysiak, Henryk Rybinski, and Marzena Kryszkiewicz (Eds.), Emerging Intelligent Technologies in Industry, 2011. ISBN 978-3-642-22731-8
Vol. 370. Alexander Mehler, Kai-Uwe Kühnberger, Henning Lobin, Harald Lüngen, Angelika Storrer, and Andreas Witt (Eds.), Modeling, Learning, and Processing of Text Technological Data Structures, 2011. ISBN 978-3-642-22612-0
Vol. 371. Leonid Perlovsky, Ross Deming, and Roman Ilin (Eds.), Emotional Cognitive Neural Algorithms with Engineering Applications, 2011. ISBN 978-3-642-22829-2
Vol. 372. António E. Ruano and Annamária R. Várkonyi-Kóczy (Eds.), New Advances in Intelligent Signal Processing, 2011. ISBN 978-3-642-11738-1
Vol. 373. Oleg Okun, Giorgio Valentini, and Matteo Re (Eds.), Ensembles in Machine Learning Applications, 2011. ISBN 978-3-642-22909-1
Vol. 374. Dimitri Plemenos and Georgios Miaoulis (Eds.), Intelligent Computer Graphics 2011, 2011. ISBN 978-3-642-22906-0

Vol. 375. Marenglen Biba and Fatos Xhafa (Eds.), Learning Structure and Schemas from Documents, 2011. ISBN 978-3-642-22912-1
Vol. 376. Toyohide Watanabe and Lakhmi C. Jain (Eds.), Innovations in Intelligent Machines – 2, 2011. ISBN 978-3-642-23189-6
Vol. 377. Roger Lee (Ed.), Software Engineering Research, Management and Applications 2011, 2011. ISBN 978-3-642-23201-5
Vol. 378. János Fodor, Ryszard Klempous, and Carmen Paz Suárez Araujo (Eds.), Recent Advances in Intelligent Engineering Systems, 2011. ISBN 978-3-642-23228-2
Vol. 379. Ferrante Neri, Carlos Cotta, and Pablo Moscato (Eds.), Handbook of Memetic Algorithms, 2011. ISBN 978-3-642-23246-6
Vol. 380. Anthony Brabazon, Michael O'Neill, and Dietmar Maringer (Eds.), Natural Computing in Computational Finance, 2011. ISBN 978-3-642-23335-7
Vol. 381. Radoslaw Katarzyniak, Tzu-Fu Chiu, Chao-Fu Hong, and Ngoc Thanh Nguyen (Eds.), Semantic Methods for Knowledge Management and Communication, 2011. ISBN 978-3-642-23417-0
Vol. 382. F.M.T. Brazier, Kees Nieuwenhuis, Gregor Pavlin, Martijn Warnier, and Costin Badica (Eds.), Intelligent Distributed Computing V, 2011. ISBN 978-3-642-24012-6
Vol. 383. Takayuki Ito, Minjie Zhang, Valentin Robu, Shaheen Fatima, and Tokuro Matsuo (Eds.), New Trends in Agent-based Complex Automated Negotiations, 2011. ISBN 978-3-642-24695-1
Vol. 384. Daphna Weinshall, Jörn Anemüller, and Luc van Gool (Eds.), Detection and Identification of Rare Audiovisual Cues, 2011. ISBN 978-3-642-24033-1
Vol. 385. Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks, 2012. ISBN 978-3-642-24796-5
Vol. 386. Marek R. Ogiela and Lakhmi C. Jain (Eds.), Computational Intelligence Paradigms in Advanced Pattern Classification, 2012. ISBN 978-3-642-24048-5

Marek R. Ogiela and Lakhmi C. Jain (Eds.)

Computational Intelligence Paradigms in Advanced Pattern Classification


Editors

Prof. Marek R. Ogiela
AGH University of Science and Technology
30 Mickiewicza Ave
30-059 Krakow, Poland
E-mail: [email protected]

Prof. Lakhmi C. Jain
University of South Australia, Adelaide
Mawson Lakes Campus
South Australia, Australia
E-mail: [email protected]

ISBN 978-3-642-24048-5

e-ISBN 978-3-642-24049-2

DOI 10.1007/978-3-642-24049-2

Studies in Computational Intelligence  ISSN 1860-949X

Library of Congress Control Number: 2011936648

© 2012 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Printed on acid-free paper.

springer.com

Preface

Recent advances in computational intelligence paradigms have contributed tremendously to modern pattern classification techniques. This book aims to provide a sample of state-of-the-art techniques in advanced pattern classification and its possible applications. In particular, this book includes nine chapters on the use of various computational intelligence paradigms in advanced pattern classification. Additionally, a number of applications and case studies are presented.

Chapter one presents an introduction to pattern classification techniques, including current trends in intelligent image analysis and semantic content description. Chapter two is on handwriting recognition using neural networks. The authors propose a novel neural network and demonstrate that the proposed technique offers a higher recognition rate than the other techniques reported in the literature. Chapter three is on moving object detection from mobile platforms. The authors demonstrate the applicability of their approach to detecting moving objects such as vehicles or pedestrians in different urban scenarios. Chapter four is on pattern classifications in cognitive informatics. The author demonstrates experimentally that the semantic technique can be used for cognitive data analysis problems in cognitive informatics. Chapter five is on optimal differential filters on the hexagonal lattice. The filters are compared with existing optimised filters to demonstrate the superiority of the technique. Chapter six is on graph image language techniques supporting advanced classification and computer interpretation of 3D CT coronary vessel visualizations. Chapter seven is on a graph matching approach to symmetry detection and analysis. The authors validate their approach using extensive experiments on two- and three-dimensional synthetic and real-life images. Chapter eight is on pattern classification methods used for the analysis of brain visualization and computer-aided diagnosis of perfusion CT maps. The final chapter is on the main methods of multi-class and multi-label classification, which can be applied to a large variety of applications and research fields that relate to human knowledge, cognition and behaviour.

We believe that scientists, application engineers, university professors, students, and all readers interested in this subject will find this book useful and interesting.


This book would not have existed without the excellent contributions by the authors. We remain grateful to the reviewers for their constructive comments. The excellent editorial assistance of Springer-Verlag is also acknowledged.

Marek R. Ogiela, Poland
Lakhmi C. Jain, Australia

Contents

Chapter 1: Recent Advances in Pattern Classification .......... 1
Marek R. Ogiela, Lakhmi C. Jain
  1 New Directions in Pattern Classification .......... 1
  References .......... 4

Chapter 2: Neural Networks for Handwriting Recognition .......... 5
Marcus Liwicki, Alex Graves, Horst Bunke
  1 Introduction .......... 5
    1.1 State-of-the-Art .......... 6
    1.2 Contribution .......... 7
  2 Data Processing .......... 8
    2.1 General Processing Steps .......... 9
    2.2 Our Online System .......... 10
    2.3 Our Offline System .......... 12
  3 Neural Network Based Recognition .......... 12
    3.1 Recurrent Neural Networks (RNNs) .......... 12
    3.2 Long Short-Term Memory (LSTM) .......... 13
    3.3 Bidirectional Recurrent Neural Networks .......... 16
    3.4 Connectionist Temporal Classification (CTC) .......... 16
    3.5 Multidimensional Recurrent Neural Networks .......... 17
    3.6 Hierarchical Subsampling Recurrent Neural Networks .......... 18
  4 Experiments .......... 18
    4.1 Comparison with HMMs on the IAM Databases .......... 18
    4.2 Recognition Performance of MDLSTM on Contest Data .......... 20
  5 Conclusion .......... 21
  References .......... 21

Chapter 3: Moving Object Detection from Mobile Platforms Using Stereo Data Registration .......... 25
Angel D. Sappa, David Gerónimo, Fadi Dornaika, Mohammad Rouhani, Antonio M. López
  1 Introduction .......... 25
  2 Related Work .......... 26
  3 Proposed Approach .......... 28
    3.1 System Setup .......... 29
    3.2 Feature Detection and Tracking .......... 29
    3.3 Robust Registration .......... 31
    3.4 Frame Subtraction .......... 31
  4 Experimental Results .......... 34
  5 Conclusions .......... 35
  References .......... 35

Chapter 4: Pattern Classifications in Cognitive Informatics .......... 39
Lidia Ogiela
  1 Introduction .......... 39
  2 Semantic Analysis Stages .......... 40
  3 Semantic Analysis vs. Cognitive Informatics .......... 43
  4 Example of a Cognitive UBIAS System .......... 45
  5 Conclusions .......... 52
  References .......... 52

Chapter 5: Optimal Differential Filter on Hexagonal Lattice .......... 59
Suguru Saito, Masayuki Nakajima, Tetsuo Shima
  1 Introduction .......... 59
  2 Preliminaries .......... 60
  3 Least Inconsistent Image .......... 60
  4 Point Spread Function .......... 64
  5 Condition for Gradient Filter .......... 67
  6 Numerical Optimization .......... 68
  7 Theoretical Evaluation .......... 70
    7.1 Signal-to-Noise Ratio .......... 70
    7.2 Localization .......... 73
  8 Experimental Evaluation .......... 74
    8.1 Construction of Artificial Images .......... 74
    8.2 Detection of Gradient Intensity and Orientation .......... 76
    8.3 Overington's Method of Orientation Detection .......... 76
    8.4 Relationship between Derived Filter and Staunton Filter .......... 78
    8.5 Experiment and Results .......... 80
  9 Discussion .......... 81
  10 Summary .......... 83
  References .......... 86

Chapter 6: Graph Image Language Techniques Supporting Advanced Classification and Cognitive Interpretation of CT Coronary Vessel Visualizations .......... 89
Mirosław Trzupek
  1 Introduction .......... 89
  2 The Classification Problem .......... 92
  3 Stages in the Analysis of CT Images under a Structural Approach Utilising Graph Techniques .......... 93
  4 Parsing Languages Generated by Graph Grammars .......... 95
  5 Picture Grammars in Classification and Semantic Interpretation of 3D Coronary Vessels Visualisations .......... 96
    5.1 Characteristics of the Image Data .......... 96
    5.2 Preliminary Analysis of 3D Coronary Vascularisation Reconstructions .......... 96
    5.3 Graph-Based Linguistic Formalisms in Spatial Modelling of Coronary Vessels .......... 98
    5.4 Detecting Lesions and Constructing the Syntactic Analyser .......... 103
    5.5 Selected Results .......... 104
  6 Conclusions Concerning the Advanced Classification and Cognitive Interpretation of CT Coronary Vessel Visualizations .......... 108
  References .......... 110

Chapter 7: A Graph Matching Approach to Symmetry Detection and Analysis .......... 113
Michael Chertok, Yosi Keller
  1 Introduction .......... 113
  2 Symmetries and Their Properties .......... 115
    2.1 Rotational Symmetry .......... 115
    2.2 Reflectional Symmetry .......... 116
    2.3 Interrelations between Rotational and Reflectional Symmetries .......... 117
    2.4 Discussion .......... 117
  3 Previous Work .......... 117
    3.1 Previous Work in Symmetry Detection and Analysis .......... 118
    3.2 Local Features .......... 121
    3.3 Spectral Matching of Sets of Points in R^n .......... 122
  4 Spectral Symmetry Analysis .......... 123
    4.1 Spectral Symmetry Analysis of Sets in R^n .......... 123
      4.1.1 Perfect Symmetry and Spectral Degeneracy .......... 124
    4.2 Spectral Symmetry Analysis of Images .......... 124
      4.2.1 Image Representation by Local Features .......... 125
      4.2.2 Symmetry Categorization and Pruning .......... 125
      4.2.3 Computing the Geometrical Properties of the Symmetry .......... 126
  5 Experimental Results .......... 127
    5.1 Symmetry Analysis of Images .......... 128
    5.2 Statistical Accuracy Analysis .......... 134
    5.3 Analysis of Three-Dimensional Symmetry .......... 136
    5.4 Implementation Issues .......... 137
    5.5 Additional Results .......... 140
  6 Conclusions .......... 140
  References .......... 142

Chapter 8: Pattern Classification Methods for Analysis and Visualization of Brain Perfusion CT Maps .......... 145
Tomasz Hachaj
  1 Introduction .......... 145
  2 Interpretation of Perfusion Maps – Long and Short Time Prognosis .......... 148
  3 Image Processing and Abnormality Detection .......... 150
  4 Image Registration .......... 153
    4.1 Affine Registration .......... 154
    4.2 FFD Registration .......... 154
    4.3 Thirion's Demons Algorithm .......... 154
    4.4 Comparison of Registration Algorithms .......... 155
  5 Classification of Detected Abnormalities .......... 158
  6 System Validation and Results .......... 160
  7 Data Visualization – Augmented Reality Environment .......... 162
    7.1 Augmented Reality Environment .......... 163
    7.2 Real Time Rendering of 3D Data .......... 164
    7.3 Augmented Desktop - System Performance Test .......... 164
  8 Summary .......... 167
  References .......... 168

Chapter 9: Inference of Co-occurring Classes: Multi-class and Multi-label Classification .......... 171
Tal Sobol-Shikler
  1 Introduction .......... 171
  2 Applications .......... 172
  3 The Classification Process .......... 173
  4 Data and Annotation .......... 175
  5 Classification Approaches .......... 177
    5.1 Binary Classification .......... 177
    5.2 Multi-class Classification .......... 177
    5.3 Multi-label Classification .......... 178
  6 Multi-class Classification .......... 179
    6.1 Multiple Binary Classifiers .......... 180
      6.1.1 One-Against-All Classification .......... 180
      6.1.2 One-Against-One (Pair-Wise) Classification .......... 180
      6.1.3 Combining Binary Classifiers .......... 181
    6.2 Direct Multi-class Classification .......... 181
    6.3 Associative Classification .......... 182
  7 Multi-label Classification .......... 182
    7.1 Semi-supervised (Annotation) Methods .......... 186
  8 Inference of Co-occurring Affective States from Non-verbal Speech .......... 186
  9 Summary .......... 193
  References .......... 193

Author Index .......... 199

Chapter 1

Recent Advances in Pattern Classification

Marek R. Ogiela¹ and Lakhmi C. Jain²

¹ AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Kraków, Poland
e-mail: [email protected]
² University of South Australia, School of Electrical and Information Engineering, Adelaide, Mawson Lakes Campus, South Australia SA 5095, Australia
e-mail: [email protected]

Abstract. This chapter describes some advances in modern pattern classification techniques, and new classes of information systems dedicated to image analysis, interpretation and semantic classification. In this book we present some new solutions for the development of modern pattern recognition techniques for the processing and analysis of several classes of visual patterns, as well as some theoretical foundations for modern pattern interpretation approaches. In particular, this monograph presents selected areas of application of pattern recognition and classification approaches, including handwriting recognition, medical image analysis and interpretation, the development of cognitive systems for computer image understanding, moving object detection, advanced image filtration, and intelligent multi-object labeling and classification.

1 New Directions in Pattern Classification

In the field of advanced pattern recognition and computational intelligence methods, a new direction has recently been distinguished: advanced visual pattern analysis, recognition and interpretation, strongly connected with computational cognitive science, or cognitive informatics. Computational cognitive science is a new branch of computer science and pattern classification originating mainly from neurobiology and psychology, but it is currently also developed by formal sciences (e.g. descriptive mathematics) and technical disciplines (informatics). In this science, models of the cognitive processes taking place in the human brain [2], which are studied by neurophysiologists (at the level of biological mechanisms), psychologists (at the level of analysing specific human behaviours) and philosophers (at the level of a general reflection on the nature of cognitive processes and their conditions), have become the basis for designing various types of intelligent computer systems.



The requirements of an advanced user are not limited to just collecting, processing and analysing information in computer systems. Today, users expect IT systems to offer capabilities of automatically penetrating the semantic layer as well, as this is the source for developing knowledge, not just collecting messages. This is particularly true of information systems and decision support systems. Consequently, IT systems based on cognition will certainly be developed intensively, as they meet the growing demands of the Information Society, in which the ability to reach the contents of information collected in computer systems will gain increasing importance.

In particular, owing to the development of systems which, apart from numerical data and text, also collect multimedia information, especially images and movies, there is a growing need to develop scientific cornerstones for designing IT systems that allow the user to easily find the requisite multimedia information. Such information conveys a specific meaning in its structure, but this requires the semantic contents of an image to be understood, and not just the objects visible in it to be analysed and possibly classified according to their form. Systems capable of not only analysing but also interpreting the meaning of the data they process (scenes, real-life contexts, movies etc.) can also play the role of advisory systems supporting human decision-making, and the effectiveness of this support can be significantly enhanced by the system automatically acquiring knowledge adequate to the problem in question.

Fig. 1 Taxonomy of issues explored by cognitive science

It is thus obvious that contemporary solutions should aim at the development of new classes of information systems, which can be assigned the new name of Cognitive Information Systems. We are talking about systems which can process data at a very high level of abstraction and make semantic evaluations of such data. Such systems should also have autonomous learning capabilities, which will allow them to improve along with the extension of the knowledge available to them, presented in the form of various patterns and data. Such systems are significantly more complex in terms of the functions they perform than solutions currently employed in practice, so they have to be designed with the use of advanced achievements of computer technologies. What is more, such systems do not fit the theoretical frameworks of today's information collection and searching systems, so when undertaking the development and practical implementation of Cognitive Information Systems, the first task is to find, develop and research new theoretical formalisms adequate for the jobs given to these systems. They will use the theoretical basis and conceptual formalisms developed for cognitive science by physiology, psychology and philosophy (see Fig. 1), but these have to be adjusted to the new situation, namely the intentional initiation of cognitive processes in technological systems. Informatics has already attempted to create formalisms for simpler information systems on this basis [2, 5]. In addition, elements of a cognitive approach are increasingly frequently cropping up in the structure of new-generation pattern classification systems [3, 6], although the adequate terminology is not always used.

On the other hand, some researchers believe that the cognitive domain can be conquered by IT systems just as the researchers of simple perception and classification mechanisms have managed to transplant selected biological observations into the technological domain, namely into artificial neural networks [4]. However, the authors have major doubts whether this route will be productive and efficient, as there is a huge difference in scale between the neurobiological processes which are mapped by neural networks and the mental processes which should be deployed in cognitive information systems or cognitive pattern recognition approaches. The reason is that whereas neural networks are based on the action of neurons numbering from several to several thousand (at the most), mental processes involve hundreds of millions of neurons in the brain, which is a significant hindrance in any attempt to imitate them with computers. This is why it seems appropriate and right to try to base the design of future Cognitive Information Systems on attempts at the behavioural modelling of psychological phenomena, and not on the structural imitation of neurophysiological processes.

The general foundations for the design of such systems have been the subject of earlier publications [6, 7, 8]. However, it must be said that the methodology of designing universal systems of cognitive interpretation has yet to be developed fully. This applies in particular to systems oriented towards the cognitive analysis of multimedia information. Overcoming the barrier between the form of multimedia information (e.g. the shape of objects in the picture or the tones of sounds) and the sense implicitly contained in this information requires more research initially oriented towards detailed goals. Possibly, after some time, it will be possible to aggregate the experience gained while executing these individual, detailed jobs into a comprehensive, consistent methodology. However, for the time being, we have to satisfy ourselves with achieving individual goals one after another. These goals are mainly about moving away from the analysis of data describing single objects to a more general and semantically deepened analysis of data presenting or describing various components of images, or different images from the same video sequence. Some good examples of such visual data analysis will be presented in the following chapters.


References

1. Bichindaritz, I., Vaidya, S., Jain, A., Jain, L.C. (eds.): Computational Intelligence in Healthcare 4. SCI, vol. 309, pp. 347–369. Springer, Heidelberg (2010)
2. Branquinho, J. (ed.): The Foundations of Cognitive Science. Clarendon Press, Oxford (2001)
3. Davis, L.S. (ed.): Foundations of Image Understanding. Kluwer Academic Publishers (2001)
4. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, John Wiley & Sons, Inc. (2001)
5. Meystel, A.M., Albus, J.S.: Intelligent Systems: Architecture, Design, and Control. Wiley-Interscience, John Wiley & Sons, Inc., Canada (2002)
6. Ogiela, L., Ogiela, M.R.: Cognitive Techniques in Visual Data Interpretation. Springer, Heidelberg (2009)
7. Ogiela, M.R., Tadeusiewicz, R.: Modern Computational Intelligence Methods for the Interpretation of Medical Images. Springer, Heidelberg (2008)
8. Ogiela, M.R., Ogiela, L.: Cognitive Informatics in Medical Image Semantic Content Understanding. In: Kim, T.-H., Stoica, A., Chang, R.-S. (eds.) Security-Enriched Urban Computing and Smart Grid. CCIS, vol. 78, pp. 131–138. Springer, Heidelberg (2010)
9. Tolk, A., Jain, L.C.: Intelligence-Based Systems Engineering. Intelligent Systems Reference Library, vol. 10. Springer (2011)
10. Vernon, D., Metta, G., Sandini, G.: A survey of artificial cognitive systems: Implications for the autonomous development of mental capabilities in computational agents. IEEE Transactions on Evolutionary Computation 11(2), 151–180 (2007)

Chapter 2

Neural Networks for Handwriting Recognition

Marcus Liwicki¹, Alex Graves², and Horst Bunke³

¹ German Research Center for Artificial Intelligence, Trippstadter Str. 122, 67663 Kaiserslautern, Germany
e-mail: [email protected]
² Institute for Informatics 6, Technical University of Munich, Boltzmannstr. 3, 85748 Garching bei München, Germany
e-mail: [email protected]
³ Institute for Computer Science and Applied Mathematics, Neubrückstr. 10, 3012 Bern, Switzerland
e-mail: [email protected]

Abstract. In this chapter a novel kind of Recurrent Neural Network (RNN) is described. Bi- and multidimensional RNNs combined with Connectionist Temporal Classification allow for the direct recognition of raw stroke data or raw pixel data. In general, recognizing lines of unconstrained handwritten text is a challenging task. The difficulty of segmenting cursive or overlapping characters, combined with the need to assimilate context information, has led to low recognition rates for even the best current recognizers. Most recent progress in the field has been made either through improved preprocessing or through advances in language modeling. Relatively little work has been done on the basic recognition algorithms. Indeed, most systems rely on the same hidden Markov models that have been used for decades in speech and handwriting recognition, despite their well-known shortcomings. This chapter describes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range, bidirectional or multidirectional interdependencies. In experiments on two unconstrained handwriting databases, the new approach achieves word recognition accuracies of 79.7% on online data and 74.1% on offline data, significantly outperforming a state-of-the-art HMM-based system. Promising experimental results on various other datasets from different countries are also presented. A toolkit implementing the networks is freely available to the public.

1 Introduction

Handwriting recognition is traditionally divided into online and offline recognition. In online recognition a time-ordered sequence of coordinates, representing the movement of the pen-tip, is captured, while in the offline case only an image of the text is available. Because of the greater ease of extracting relevant features, online recognition generally yields better results [1]. Another crucial division is that between the recognition of isolated characters or words, and the recognition of whole lines of text. Unsurprisingly, the latter is substantially harder, and the excellent results that have been obtained for, e.g., digit and character recognition [2], [3] have never been matched for complete lines. Lastly, handwriting recognition can be split into cases where the writing style is constrained in some way, for example, only hand-printed characters are allowed, and the more challenging scenario where it is unconstrained. Despite more than 40 years of handwriting recognition research [2], [3], [4], [5], developing a reliable, general-purpose system for unconstrained text line recognition remains an open problem.

1.1 State-of-the-Art

A well-known testbed for isolated handwritten character recognition is the UNIPEN database [6]. Systems that have been found to perform well on UNIPEN include: a writer-independent approach based on hidden Markov models [7]; a hybrid technique called cluster generative statistical dynamic time warping (CSDTW) [8], which combines dynamic time warping with HMMs and embeds clustering and statistical sequence modeling in a single feature space; and a support vector machine with a novel Gaussian dynamic time warping kernel [9]. Typical error rates on UNIPEN range from 3% for digit recognition to about 10% for lower-case character recognition.

Similar techniques can be used to classify isolated words, and this has given good results for small vocabularies (e.g., a writer-dependent word error rate of about 4.5% for 32 words [10]). However, an obvious drawback of whole-word classification is that it does not scale to large vocabularies. For large vocabulary recognition tasks, such as those considered in this chapter, the usual approach is to recognize individual characters and map them onto complete words using a dictionary. Naively, we could do this by presegmenting words into characters and classifying each segment. However, segmentation is difficult for cursive or unconstrained text, unless the words have already been recognized. This creates a circular dependency between segmentation and recognition that is often referred to as Sayre's paradox [11].

Nonetheless, approaches have been proposed where segmentation is carried out before recognition. Some techniques for character segmentation, based on unsupervised learning and data-driven methods, are given in [3]. Other strategies first segment the text into basic strokes, rather than characters. The stroke boundaries may be defined in various ways, such as the minima of the velocity, the minima of the y-coordinates, or the points of maximum curvature. For example, one online approach first segments the data at the minima of the y-coordinates and then applies self-organizing maps [12]. Another, offline, approach [13] uses the minima of the vertical histogram for an initial estimation of the character boundaries and then applies various heuristics to improve the segmentation.

Neural Networks for Handwriting Recognition

7

A more satisfactory solution to Sayre's paradox would be to segment and recognize at the same time. Hidden Markov models (HMMs) are able to do this, which is one reason for their popularity in unconstrained handwriting recognition [14], [15], [16], [17], [18], [19]. The idea of applying HMMs to handwriting recognition was originally motivated by their success in speech recognition [20], where a similar conflict exists between recognition and segmentation. Over the years, numerous refinements of the basic HMM approach have been proposed, such as the writer-independent system considered in [7], which combines point-oriented and stroke-oriented input features.

However, HMMs have several well-known drawbacks. One of these is that they assume the probability of each observation depends only on the current state, which makes contextual effects difficult to model. Another is that HMMs are generative, while discriminative models generally give better performance on labeling and classification tasks.

Recurrent neural networks (RNNs) do not suffer from these limitations, and would therefore seem a promising alternative to HMMs. However, the application of RNNs alone to handwriting recognition has so far been limited to isolated character recognition (e.g. [21]). One reason for this is that the traditional neural network objective functions require a separate training signal for every point in the input sequence, which in turn requires presegmented data. A more successful use of neural networks for handwriting recognition has been to combine them with HMMs in the so-called hybrid approach [22], [23]. A variety of network architectures have been tried for hybrid handwriting recognition, including multilayer perceptrons [24], [25], time delay neural networks (TDNNs) [18], [26], [27], and RNNs [28], [29], [30]. However, although hybrid models alleviate the difficulty of introducing context to HMMs, they still suffer from many of the drawbacks of HMMs, and they do not realize the full potential of RNNs for sequence modeling.

1.2 Contribution

This chapter describes a recently introduced alternative approach, in which a single RNN is trained directly for sequence labeling. The network uses connectionist temporal classification (CTC) combined with the bidirectional Long Short-Term Memory (BLSTM) architecture, which provides access to long-range input context in both directions. A further enhancement, which allows the network to work in multiple dimensions, is also presented in this chapter. The so-called Multidimensional LSTM (MDLSTM) is very successful even on raw pixel data.

The rest of this chapter is organized as follows. Section 2 presents the handwritten data and the feature extraction techniques. Subsequently, Section 3 describes the novel neural network classifier. Experimental results are presented in Section 4. Finally, Section 5 concludes this chapter.


Fig. 1 Processing steps of the handwriting recognition system

2 Data Processing

As stated above, handwritten data can be acquired in two formats, online and offline. In this section typical preprocessing and feature extraction techniques are presented. These techniques have been applied in our experiments.


The online and offline databases used are the IAM-OnDB [31] and the IAM-DB [32], respectively. Note that these do not correspond to the same handwriting samples: the IAM-OnDB was acquired from a whiteboard, while the IAM-DB consists of scanned images of handwritten forms.¹

¹ The databases and benchmark tasks are available at http://www.iam.unibe.ch/fki/databases

2.1 General Processing Steps

A recognition system for unconstrained Roman script is usually divided into consecutive units which iteratively process the handwritten input data to finally obtain the transcription. The main units are illustrated in Fig. 1 and summarized in this section. Certainly, there are differences between offline and online processing, but the principles are the same; only the methodology for performing the individual steps differs.

First, preprocessing steps are applied to reduce noise in the raw data. The input is raw handwritten data and the output usually consists of extracted text lines. The amount of effort that needs to be invested in the preprocessing depends on the given data. If the data have been acquired from a system that does not produce any noise and only single words have been recorded, there is nothing to do in this step. But usually the data contain noise which needs to be removed to improve the quality of the handwriting, e.g., by means of image enhancement. The offline images are furthermore binarized, and the online data, which usually contain noisy points and gaps within strokes, are processed with some heuristics to recover from these artifacts. These operations are described in Ref. [33]. The cleaned text data is then automatically divided into lines using some simple heuristics.

Next, the data is normalized, i.e., an attempt is made to remove writer-specific characteristics of the handwriting so that writings from different authors look more similar to each other. This is a very important step in any handwriting recognition system, because the writing styles of the writers differ with respect to skew, slant, height, and width of the characters. In the literature there is no standard way of normalizing the data, but many systems use similar techniques. First, the text line is corrected with regard to its skew, i.e., it is rotated so that the baseline is parallel to the x-axis. Then, slant correction is performed so that the writing becomes upright. The next important step is the computation of the baseline and the corpus line. These two lines divide the text into three areas: the upper area, which mainly contains the ascenders of the letters; the middle area, where the corpus of the letters is present; and the lower area, with the descenders of some letters. These three areas are normalized to predefined heights. Often, some additional normalization steps are performed, depending on the domain. In offline recognition, thinning and binarization may be applied. In online recognition the delayed strokes, e.g., the crossing of a "t" or the dot of an "i", are usually removed, and equidistant resampling is applied.
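The skew-correction step just described can be illustrated with a short sketch. This is not the authors' implementation: it assumes strokes are given as NumPy arrays of (x, y) points and estimates the baseline with a least-squares fit through the lowest point of each stroke, which is only one of several possible estimators.

```python
import numpy as np

def correct_skew(strokes):
    """Rotate a text line so its estimated baseline is parallel to the x-axis.

    strokes: list of (n_i, 2) float arrays of (x, y) points, y growing upwards.
    The baseline is approximated by a least-squares line through the lowest
    point of each stroke (an assumed estimator, for illustration only).
    """
    # One crude baseline sample per stroke: its lowest point.
    anchors = np.array([s[np.argmin(s[:, 1])] for s in strokes])
    slope, _ = np.polyfit(anchors[:, 0], anchors[:, 1], 1)
    angle = np.arctan(slope)
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])          # rotation by -angle
    return [pts @ R.T for pts in strokes]
```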


Subsequently, features are extracted from the normalized data. This particular step is needed because the recognizers need numerical data as their input. However, no standard method for computing the features exists in the literature. One common method in offline recognition of handwritten text lines is the use of a sliding window moving in the writing direction over the text. Features are extracted at every window position, resulting in a sequence of feature vectors. In the case of online recognition the points are already available in a time-ordered sequence, which makes it easier to get a sequence of feature vectors in writing order. If there is a fixed size of the input pattern, such as in character or word recognition, one feature vector of a constant size can be extracted for each pattern.

Fig. 2 Features of the vicinity

2.2 Our Online System

In the system described in this chapter, state-of-the-art feature extraction methods are applied to extract the features from the preprocessed data. The feature set input to the online recognizer consists of 25 features which utilize information from both the real online data, stored in XML format, and pseudo-offline information automatically generated from the online data. For each (x, y)-coordinate recorded by the acquisition device a set of 25 features is extracted, resulting in a sequence of 25-dimensional vectors for each given text line. These features can be divided into two classes. The first class consists of features extracted for each point by considering the neighbors with respect to time. The second class takes the offline matrix representation into account, i.e., it is based on spatial information. The features of the first class are the following:

• pen-up/pen-down: a boolean variable indicating whether the pen-tip touches the board or not. Consecutive strokes are connected with straight lines, for which this feature has the value false.


• hat-feature: this binary feature indicates whether a delayed stroke has been removed at the same horizontal position as the considered point.
• speed: the velocity is computed before resampling and then interpolated.
• x-coordinate: the x-position is taken after high-pass filtering, i.e., after subtracting a moving average from the real horizontal position.
• y-coordinate: this feature represents the vertical position of the point after normalization.
• writing direction: here we have a pair of features, given by the cosine and sine of the angle between the line segment starting at the point and the x-axis.
• curvature: similarly to the writing direction, this is a pair of features, given by the cosine and sine of the angle between the lines to the previous and the next point.
• vicinity aspect: this feature is equal to the aspect of the trajectory in the vicinity of the point (see Fig. 2).
• vicinity slope: this pair of features is given by the cosine and sine of the angle of the straight line from the first to the last vicinity point (see Fig. 2).
• vicinity curliness: this feature is defined as the length of the trajectory in the vicinity divided by max(Δx(t), Δy(t)) (see Fig. 2).
• vicinity linearity: here we use the average squared distance d² of each point in the vicinity to the straight line from the first to the last vicinity point (see Fig. 2).
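As an illustration of the point-based features listed above, the following sketch computes the writing-direction and curvature pairs for a resampled point sequence (a NumPy array of (x, y) rows). It follows the definitions given in the text; the authors' exact implementation is not reproduced here.

```python
import numpy as np

def writing_direction(points):
    """Cosine and sine of the angle between each line segment and the x-axis."""
    d = np.diff(points.astype(float), axis=0)        # segment vectors
    d /= np.maximum(np.linalg.norm(d, axis=1, keepdims=True), 1e-12)
    return d[:, 0], d[:, 1]                          # (cos, sin) per segment

def curvature(points):
    """Cosine and sine of the angle between the segments to the previous
    and to the next point, via the angle-difference identities."""
    cos_a, sin_a = writing_direction(points)
    cos_c = cos_a[1:] * cos_a[:-1] + sin_a[1:] * sin_a[:-1]
    sin_c = sin_a[1:] * cos_a[:-1] - cos_a[1:] * sin_a[:-1]
    return cos_c, sin_c
```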

Fig. 3 Pseudo offline features

The features of the second class are all computed using a two-dimensional matrix B representing the offline version of the data. For each position the number of points on the trajectory of the strokes is stored. This can be seen as a low-resolution image of the handwritten data. The following features are used:

• ascenders/descenders: these two features count the number of points above the corpus line (ascenders) and below the baseline (descenders). Only points which
have an x-coordinate in the vicinity of the current point are considered. Additionally, the points must have a minimal distance to the lines to be considered as part of an ascender or descender. The corresponding distances are set to a predefined fraction of the corpus height.
• context map: the two-dimensional vicinity of the current point is transformed to a 3×3 map. The number of black points in each region is taken as a feature value. So we obtain altogether nine features of this type.
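A sketch of the context-map feature follows: it bins the low-resolution offline matrix B around the current point into a 3×3 grid and counts the trajectory points per region. The vicinity radius `half` is an assumed parameter; the chapter does not state its value.

```python
import numpy as np

def context_map(B, x, y, half=12):
    """9 context-map features around point (x, y) in the offline matrix B.

    B[row, col] stores the number of trajectory points at that position.
    `half` is the assumed vicinity radius; 2*half must be divisible by 3.
    """
    h, w = B.shape
    patch = np.zeros((2 * half, 2 * half))
    y0, y1 = max(y - half, 0), min(y + half, h)
    x0, x1 = max(x - half, 0), min(x + half, w)
    # Copy the in-bounds part of the vicinity into the padded patch.
    patch[y0 - (y - half):y1 - (y - half),
          x0 - (x - half):x1 - (x - half)] = B[y0:y1, x0:x1]
    step = (2 * half) // 3
    return np.array([patch[i*step:(i+1)*step, j*step:(j+1)*step].sum()
                     for i in range(3) for j in range(3)])
```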

2.3 Our Offline System

To extract the feature vectors from the offline images, a sliding window approach is used. The width of the window is one pixel, and nine geometrical features are computed at each window position. Each text line image is therefore converted to a sequence of 9-dimensional vectors. The nine features are as follows:

• The mean gray value of the pixels
• The center of gravity of the pixels
• The second order vertical moment of the center of gravity
• The positions of the uppermost and lowermost black pixels
• The rate of change of these positions (with respect to the neighboring windows)
• The number of black-white transitions between the uppermost and lowermost pixels
• The proportion of black pixels between the uppermost and lowermost pixels

For a more detailed description of the offline features, see [17]. In the next phase indicated in Fig. 1, a classification system is applied which generates a list of candidates or even a recognition lattice. This step and the last step, the postprocessing, are described in the next section.
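Before moving on, a minimal sketch of the sliding-window extraction for a few of these features (mean gray value, center of gravity, uppermost/lowermost black pixel) is given below. It assumes a binarized line image with ink pixels equal to 1 and is only an approximation of the nine-feature extractor described above.

```python
import numpy as np

def window_features(img):
    """Per-window features of a binarized text line image (ink = 1).

    The window is one pixel wide, so each column yields one vector with:
    mean gray value, vertical center of gravity, and the positions of the
    uppermost and lowermost black pixels (four of the nine features).
    """
    h, _ = img.shape
    ys = np.arange(h)
    feats = []
    for col in img.T:                                # sliding window
        ink = np.flatnonzero(col)
        cog = (col * ys).sum() / max(col.sum(), 1)   # center of gravity
        top = ink[0] if ink.size else 0
        bot = ink[-1] if ink.size else h - 1
        feats.append([col.mean(), cog, top, bot])
    return np.array(feats)
```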

3 Neural Network Based Recognition

The main focus of this chapter is the recently introduced neural network classifier based on CTC combined with bidirectional or multidimensional LSTM. This section describes the different aspects of the architecture and gives brief insights into the algorithms behind them.

3.1 Recurrent Neural Networks (RNNs)

Recurrent neural networks (RNNs) are a connectionist model containing a self-connected hidden layer. One benefit of the recurrent connection is that a 'memory' of previous inputs remains in the network's internal state, allowing it to make use
of past context. Context plays an important role in handwriting recognition, as illustrated in Figure 4. Another important advantage of recurrency is that the rate of change of the internal state can be finely modulated by the recurrent weights, which builds in robustness to localized distortions of the input data.

Fig. 4 Importance of context. The characters “ur” would be hard to recognize without the context of the word “entourage”.

3.2 Long Short-Term Memory (LSTM)

Unfortunately, the range of contextual information that standard RNNs can access is quite limited. The problem is that the influence of a given input on the hidden layer, and therefore on the network output, either decays or blows up exponentially as it cycles around the network's recurrent connections and is repeatedly scaled by the connection weights. In practice this shortcoming (referred to in the literature as the vanishing gradient problem) makes it hard for an RNN to bridge gaps of more than about 10 time steps between relevant input and target events.

Long Short-Term Memory (LSTM) is an RNN architecture specifically designed to address the vanishing gradient problem. An LSTM hidden layer consists of multiple recurrently connected subnets, known as memory blocks. Each block contains a set of internal units, or cells, whose activation is controlled by three multiplicative units: the input gate, forget gate and output gate. Figure 5 provides a detailed illustration of an LSTM memory block with a single cell. The effect of the gates is to allow the cells to store and access information over long periods of time. For example, as long as the input gate remains closed (i.e., has an activation close to 0), the activation of the cell will not be overwritten by the new inputs arriving in the network. Similarly, the cell activation is only available to the rest of the network when the output gate is open, and the cell's recurrent connection is switched on and off by the forget gate.


Fig. 5 LSTM memory block with one cell

The mathematical background of LSTM is described in depth in [40,41,34]. A short description follows. A conventional recurrent multi-layer perceptron network (MLP) contains a hidden layer where all neurons of the hidden layer are fully connected with all neurons of the same layer (the recurrent connections). The activation of a single cell at timestamp $t$ is a weighted sum of the inputs $x_i^t$ plus the weighted sum of the outputs of the previous timestamp, $b_h^{t-1}$. This can be expressed as follows (or, in matrix form):

$$a^t = \sum_i w_i x_i^t + \sum_h w_h b_h^{t-1} = W_i \cdot X^t + W_h \cdot B^{t-1}, \qquad b^t = f(a^t), \qquad B^t = f(A^t)$$

Since the outputs of the previous timestamp are just calculated by the squashing function of the corresponding cell activations, the influence of the network input at the previous timestamp can be considered as smaller, since it has been weighted already a second time. Thus the overall network activation can be roughly rewritten as:

$$A^t = g\left(X^t,\; W_h X^{t-1},\; W_h^2 X^{t-2},\; \ldots,\; W_h^{t-1} X^1\right)$$

where $X^t$ is the overall net input at timestamp $t$ and $W_h$ is the weight matrix of the hidden layer. Note that for clarity reasons we use this abbreviated form of the complex formula, where the input weights do not directly appear (all is hidden in the function $g(\cdot)$). This formula reveals that the influence of earlier timestamps $t-n$ vanishes rapidly, as the time difference $n$ appears in the exponent of the weight matrix. Since all values of the weight matrix $W_h$ are smaller than 1, the $n$-th power of $W_h$ is close to zero. Introducing the LSTM cell brings in three new units, which all get the weighted sum of the outputs of the hidden layer at the previous timestamp as an input; i.e., for the input gate:

$$a_\iota^t = W_{i,\iota} \cdot X^t + W_{h,\iota} \cdot B^{t-1} + w_{c,\iota}\, s_c^{t-1}$$

where $s_c^{t-1}$ is the cell state of the previous timestamp and $W_{i,\iota}$ and $W_{h,\iota}$ are the weights for the current net input and the hidden layer output of the previous timestamp, respectively. The activation of the forget gate is:

$$a_\theta^t = W_{i,\theta} \cdot X^t + W_{h,\theta} \cdot B^{t-1} + w_{c,\theta}\, s_c^{t-1}$$

which is the same formula, just with other weights (those trained for the forget gate). The cell activation is usually calculated by:

$$a_c^t = W_{i,c} \cdot X^t + W_{h,c} \cdot B^{t-1}$$

However, the cell state is then weighted with the outputs of the two gate cells:

$$s_c^t = \sigma(a_\iota^t)\, g(a_c^t) + \sigma(a_\theta^t)\, s_c^{t-1}$$

where $\sigma$ indicates that the sigmoid function is used as a squashing function for the gates and $g(\cdot)$ is the cell's activation function. As the sigmoid function often returns a value close to zero or one, the formula can be interpreted as:

$$s_c^t = [0\ \text{or}\ 1]\, g(a_c^t) + [0\ \text{or}\ 1]\, s_c^{t-1}$$

or in words: the cell state depends either on the input activation (if the input gate opens, i.e., the first factor is close to 1) or on the previous cell state (if the forget gate opens, i.e., the second factor is close to one). This particular property enables the LSTM cell to bridge over long time periods. The value of the output gate is calculated similarly to the other gates, i.e.:

$$a_\omega^t = W_{i,\omega} \cdot X^t + W_{h,\omega} \cdot B^{t-1} + w_{c,\omega}\, s_c^t$$

and the final cell output is:

$$b_c^t = \sigma(a_\omega^t)\, h(s_c^t)$$

which again is either close to zero or the usual output of the cell, $h(s_c^t)$.
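Putting the gate equations together, a single LSTM step can be sketched as follows. This is a minimal NumPy rendering of the formulas above, not the authors' implementation: bias terms are omitted, and the weight containers and the tanh choices for $g$ and $h$ are our assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(X_t, B_prev, s_prev, W, w_c, g=np.tanh, h=np.tanh):
    # W: dict of input/hidden weight matrices; w_c: peephole weights
    # from the cell state (the w_{c,.} terms in the equations above).
    a_in  = W['x_in']  @ X_t + W['h_in']  @ B_prev + w_c['in']  * s_prev  # input gate
    a_fg  = W['x_fg']  @ X_t + W['h_fg']  @ B_prev + w_c['fg']  * s_prev  # forget gate
    a_c   = W['x_c']   @ X_t + W['h_c']   @ B_prev                        # cell activation
    s_t   = sigmoid(a_in) * g(a_c) + sigmoid(a_fg) * s_prev               # new cell state
    a_out = W['x_out'] @ X_t + W['h_out'] @ B_prev + w_c['out'] * s_t     # output gate
    b_t   = sigmoid(a_out) * h(s_t)                                       # final cell output
    return b_t, s_t
```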


3.3 Bidirectional Recurrent Neural Networks

For many tasks it is useful to have access to future as well as past context. In handwriting recognition, for example, the identification of a given letter is helped by knowing the letters both to the right and left of it. Bidirectional Recurrent Neural Networks (BRNNs) [35] are able to access context in both directions along the input sequence. BRNNs contain two separate hidden layers, one of which processes the inputs forwards, while the other processes them backwards. Both hidden layers are connected to the output layer, which therefore has access to all past and future context of every point in the sequence. Combining BRNNs and LSTM gives bidirectional LSTM (BLSTM) [42].
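A bidirectional layer can be sketched as two independent passes over the sequence whose hidden vectors are presented jointly to the output layer. Here `step` stands for any recurrent update (e.g., an RNN or LSTM step as sketched above) with its weights fixed, and the concatenation is one plausible way of presenting both directions to the output layer.

```python
import numpy as np

def bidirectional_outputs(X_seq, step, B0):
    # Two separate hidden layers: one scans the input forwards, one backwards.
    fwd, B = [], B0
    for X_t in X_seq:
        B = step(X_t, B)
        fwd.append(B)
    bwd, B = [], B0
    for X_t in reversed(X_seq):
        B = step(X_t, B)
        bwd.append(B)
    bwd.reverse()
    # At every t the output layer now sees all past AND future context.
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```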

3.4 Connectionist Temporal Classification (CTC)

Standard RNN objective functions require a presegmented input sequence with a separate target for every segment. This has limited the applicability of RNNs in domains such as cursive handwriting recognition, where segmentation is difficult to determine. Moreover, because the outputs of a standard RNN are a series of independent, local classifications, some form of post-processing is required to transform them into the desired label sequence. Connectionist Temporal Classification (CTC) [36,34] is an RNN output layer specifically designed for sequence labeling tasks. It does not require the data to be presegmented, and it directly outputs a probability distribution over label sequences. CTC has been shown to outperform RNN-HMM hybrids in a speech recognition task [36]. A CTC output layer contains as many units as there are labels in the task, plus an additional 'blank' or 'no label' unit. The output activations are normalized (using the softmax function), so that they sum to 1 and are each in the range (0, 1):

$$y_k^t = \frac{e^{a_k^t}}{\sum_{k'} e^{a_{k'}^t}},$$

where $a_k^t$ is the unsquashed activation of output unit $k$ at time $t$, and $y_k^t$ is the activation of the same unit after the softmax function is applied. The above activations are used to estimate the conditional probabilities $p(k, t \mid x)$ of observing the label (or blank) with index $k$ at time $t$ in the input sequence $x$:

$$y_k^t = p(k, t \mid x)$$


The conditional probability $p(\pi \mid x)$ of observing a particular path $\pi$ through the lattice of label observations is then found by multiplying together the label and blank probabilities at every time step:

$$p(\pi \mid x) = \prod_{t=1}^{T} p(\pi_t, t \mid x) = \prod_{t=1}^{T} y_{\pi_t}^t,$$

where $\pi_t$ is the label observed at time $t$ along path $\pi$. Paths are mapped onto label sequences $l \in L^{\leq T}$, where $L^{\leq T}$ denotes the set of all strings on the alphabet $L$ of length at most $T$, by an operator $B$ that removes first the repeated labels, then the blanks. For example, both $B(a,-,a,b,-)$ and $B(-,a,a,-,-,a,b,b)$ yield the labelling $(a,a,b)$. Since the paths are mutually exclusive, the conditional probability of a given labelling $l \in L^{\leq T}$ is the sum of the probabilities of all the paths corresponding to it:

$$p(l \mid x) = \sum_{\pi \in B^{-1}(l)} p(\pi \mid x)$$

The above step is what allows the network to be trained with unsegmented data. The intuition is that, because we don't know where the labels within a particular transcription will occur, we sum over all the places where they could occur. In general, a large number of paths will correspond to the same label sequence, so a naïve calculation of the equation above is infeasible. However, it can be efficiently evaluated using a graph-based algorithm, similar to the forward-backward algorithm for HMMs. More details about the CTC forward-backward algorithm appear in [39].
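The path-to-labelling machinery can be illustrated with a toy sketch: the operator $B$ below collapses a path exactly as described (repeated labels first, then blanks), and the brute-force sum over all paths reproduces $p(l \mid x)$ for tiny $T$ only. It is this exponential enumeration that the forward-backward algorithm of [36,34] replaces.

```python
from itertools import product

def collapse(path, blank='-'):
    # The operator B: remove repeated labels first, then the blanks.
    out, prev = [], None
    for s in path:
        if s != prev:
            out.append(s)
        prev = s
    return tuple(s for s in out if s != blank)

def naive_label_prob(y, labels, target, blank='-'):
    # p(l|x) as a sum over every path pi with B(pi) = l.
    # y[t][k] = p(k, t | x); exponential in T, for checking toy cases only.
    total = 0.0
    for path in product(range(len(labels)), repeat=len(y)):
        if collapse(tuple(labels[k] for k in path), blank) == tuple(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= y[t][k]
            total += p
    return total

# Both examples from the text collapse to (a, a, b):
assert collapse(('a', '-', 'a', 'b', '-')) == ('a', 'a', 'b')
assert collapse(('-', 'a', 'a', '-', '-', 'a', 'b', 'b')) == ('a', 'a', 'b')
```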

3.5 Multidimensional Recurrent Neural Networks

Ordinary RNNs are designed for time series and other data with a single spatio-temporal dimension. However, the benefits of RNNs (such as robustness to input distortion, and flexible use of surrounding context) are also advantageous for multidimensional data, such as images and video sequences. Multidimensional recurrent neural networks (MDRNNs) [43,34], a special case of Directed Acyclic Graph RNNs [44], generalize the basic structure of RNNs to multidimensional data. Rather than having a single recurrent connection, MDRNNs have as many recurrent connections as there are spatio-temporal dimensions in the data. This allows them to access previous context information along all input directions. Multidirectional MDRNNs are the generalization of bidirectional RNNs to multiple dimensions.


For an $n$-dimensional data sequence, $2^n$ different hidden layers are used to scan through the data in all directions. As with bidirectional RNNs, all the layers are connected to a single output layer, which therefore has access to context information in both directions along all dimensions. Multidimensional LSTM (MDLSTM) is the generalization of bidirectional LSTM to multidimensional data.
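One scan direction of a 2D RNN can be sketched as below: the hidden vector at position (i, j) receives recurrent input from both predecessors, (i-1, j) and (i, j-1). A multidirectional layer would run four such scans (one per corner) and feed all of them to the output layer; the plain-RNN cell used here is a simplification of the MDLSTM cell actually employed.

```python
import numpy as np

def mdrnn_scan(image, W_i, W_h1, W_h2, f=np.tanh):
    # image: H x W x d array of input feature vectors; one of the 2^2 scans.
    H, Wd, _ = image.shape
    n = W_h1.shape[0]
    hidden = np.zeros((H, Wd, n))
    for i in range(H):
        for j in range(Wd):
            up   = hidden[i - 1, j] if i > 0 else np.zeros(n)
            left = hidden[i, j - 1] if j > 0 else np.zeros(n)
            # two recurrent connections, one per spatio-temporal dimension
            hidden[i, j] = f(W_i @ image[i, j] + W_h1 @ up + W_h2 @ left)
    return hidden
```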

3.6 Hierarchical Subsampling Recurrent Neural Networks

Hierarchical subsampling is a common technique in computer vision [45] and other domains with large input spaces. The basic principle is to iteratively re-represent the data at progressively lower resolutions, using a hierarchy of feature extractors. The features extracted at each level are subsampled and used as input to the next level. The number and complexity of the features typically increases as one climbs the hierarchy. This is much more efficient for high-resolution data than a single 'flat' feature extractor, since most of the computations are carried out on low-resolution feature maps, rather than, for example, raw pixels. A well-known connectionist hierarchical subsampling architecture is the Convolutional Neural Network [46]. Hierarchical subsampling is also possible with RNNs, and hierarchies of MDLSTM layers have been applied to offline handwriting recognition [47]. Hierarchical subsampling with LSTM is equally useful for long 1D sequences, such as raw speech data or online handwriting trajectories with a high sampling rate. From the point of view of handwriting recognition, the most interesting aspect of hierarchical subsampling RNNs is that they can be applied directly to the raw input data (offline images or online point-sequences) without any normalization or feature extraction.
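The re-representation step between levels can be sketched as a simple block subsampling: non-overlapping blocks of the lower-level feature map are gathered into single vectors for the next level. The block-concatenation scheme below is one common choice, used here purely for illustration.

```python
def subsample_blocks(hidden, bh, bw):
    # hidden: H x W x n NumPy feature map produced by one (MD)LSTM level.
    # Each non-overlapping bh x bw block becomes one (bh*bw*n)-dim vector,
    # so the next level operates on a lower-resolution map.
    H, W, n = hidden.shape
    H2, W2 = H // bh, W // bw
    out = hidden[:H2 * bh, :W2 * bw].reshape(H2, bh, W2, bw, n)
    return out.transpose(0, 2, 1, 3, 4).reshape(H2, W2, bh * bw * n)
```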

4 Experiments

The experiments have been performed with the freely available RNNLIB tool by Alex Graves (http://sourceforge.net/projects/rnnl/). This tool implements the network architecture and furthermore provides examples for the recognition of several scripts.

4.1 Comparison with HMMs on the IAM Databases

The aim of the first experiments was to evaluate the performance of the complete RNN handwriting recognition system, illustrated in Figure 6, for both online and offline handwriting. In particular we wanted to see how it compared to an HMM-based system. The online and offline databases used were the IAM-OnDB and the IAM-DB respectively (see above). Note that these do not correspond to the same handwriting samples: the IAM-OnDB was acquired from a whiteboard, while the IAM-DB consists of scanned images of handwritten forms.



Fig. 6 Complete RNN handwriting recognition system (here applied to offline Arabic data)

To make the comparisons fair, the same online and offline preprocessing was used for both the HMM and RNN systems. In addition, the same dictionaries and language models were used for the two systems. For all the experiments, the task was to transcribe the text lines in the test set, using the words in the dictionary. The basic performance measure was the word accuracy:

$$100 \cdot \left(1 - \frac{\text{insertions} + \text{substitutions} + \text{deletions}}{\text{number of words in transcription}}\right)$$

where the number of word insertions, substitutions and deletions is summed over the whole test set. For the RNN system, we also recorded the character accuracy, defined as above except with characters instead of words.
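The insertion, substitution and deletion counts come from a minimum-edit-distance alignment between the recognized and the reference word sequences; a sketch of the computation (standard Levenshtein alignment, not the authors' evaluation script) follows.

```python
def edit_ops(ref, hyp):
    # Levenshtein alignment between reference and recognized word sequences;
    # returns (insertions, substitutions, deletions).
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                      # deletion
                          d[i][j - 1] + 1,                      # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))
    # backtrack to count the individual operation types
    i, j, ins, sub, dele = m, n, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            sub += ref[i - 1] != hyp[j - 1]; i -= 1; j -= 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ins += 1; j -= 1
        else:
            dele += 1; i -= 1
    return ins, sub, dele

def word_accuracy(pairs):
    # 100 * (1 - (ins + sub + del) / #reference words), summed over a test set
    # of (reference, hypothesis) word-sequence pairs.
    ins = sub = dele = words = 0
    for ref, hyp in pairs:
        i, s, d = edit_ops(ref, hyp)
        ins, sub, dele, words = ins + i, sub + s, dele + d, words + len(ref)
    return 100.0 * (1.0 - (ins + sub + dele) / words)
```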


Table 1 Main results for online data

System        Word Accuracy   Character Accuracy
HMM           65.0%           –
CTC (BLSTM)   79.7%           88.5%

Table 2 Main results for offline data

System        Word Accuracy   Character Accuracy
HMM           64.5%           –
CTC (BLSTM)   74.1%           81.8%

As can be seen from Tables 1 and 2, the RNN substantially outperformed the HMM on both databases. To put these results in perspective, the Microsoft tablet PC handwriting recognizer [37] gave a word accuracy score of 71.32% on the online test set. This result is not directly comparable to our own, since the Microsoft system was trained on a different training set, and uses considerably more sophisticated language modeling than the HMM and RNN systems we implemented. However, it indicates that the RNN-based recognizer is competitive with the best commercial systems for unconstrained handwriting.

4.2 Recognition Performance of MDLSTM on Contest Data

The MDLSTM system participated in three handwriting recognition contests at ICDAR 2009 (see the proceedings in [38]). The recognition tasks were based on different scripts. In all cases, the systems had to recognize handwriting from unknown writers.

Table 3 Summarized results from the online Arabic handwriting recognition competition

System           Word Accuracy   Time/Image
REGIM HMM        52.67%          6402.24 ms
Vision Objects   98.99%          69.41 ms
CTC (BLSTM)      95.70%          1377.22 ms

Table 4 Summarized results from the offline Arabic handwriting recognition competition

System             Word Accuracy   Time/Image
Arab-Reader HMM    76.66%          2583.64 ms
Multi-Stream HMM   74.51%          143269.81 ms
CTC (MDLSTM)       81.06%          371.61 ms


Table 5 Summarized results from the offline French handwriting recognition competition

System                 Word Accuracy
HMM+MLP Combination    83.17%
Non-Symmetric HMM      83.17%
CTC (MDLSTM)           93.17%

A summary of the results appears in Tables 3-5. As can be seen, the approach described in this chapter always outperformed the other systems in the offline case. This observation is very promising, because the system just uses the 2-dimensional raw pixel data as an input. For the online competition (Table 3) a commercial recognizer performed better than the CTC approach. However, if the CTC system were combined with state-of-the-art preprocessing and feature extraction methods, it would probably reach a higher performance. This observation has been made in [39], where experiments extending those in Section 4.1 have been performed. A look at the calculation time (milliseconds per text line) also reveals very promising results. The MDLSTM combined with CTC was among the fastest recognizers in the competitions. Using some pruning strategies could further increase the recognition speed.

5 Conclusion

This chapter described a novel approach for recognizing unconstrained handwritten text, using a recurrent neural network. The key features of the network are the bidirectional Long Short-Term Memory architecture, which provides access to long-range, bidirectional contextual information, and the Connectionist Temporal Classification output layer, which allows the network to be trained on unsegmented sequence data. In experiments on online and offline handwriting data, the new approach outperformed state-of-the-art HMM-based classifiers and several other recognizers. We conclude that this system represents a significant advance in the field of unconstrained handwriting recognition, and merits further research. A toolkit implementing the presented architecture is freely available to the public.

References

[1] Seiler, R., Schenkel, M., Eggimann, F.: Off-line cursive handwriting recognition compared with on-line recognition. In: ICPR 1996: Proceedings of the International Conference on Pattern Recognition (ICPR 1996), vol. IV-7472, p. 505. IEEE Computer Society, Washington, DC, USA (1996)
[2] Tappert, C., Suen, C., Wakahara, T.: The state of the art in online handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 12(8), 787–808 (1990)


[3] Plamondon, R., Srihari, S.N.: On-line and off-line handwriting recognition: A comprehensive survey. IEEE Trans. Pattern Anal. Mach. Intell. 22(1), 63–84 (2000) [4] Vinciarelli, A.: A survey on off-line cursive script recognition. Pattern Recognition 35(7), 1433–1446 (2002) [5] Bunke, H.: Recognition of cursive roman handwriting - past present and future. In: Proc. 7th Int. Conf. on Document Analysis and Recognition, vol. 1, pp. 448–459 (2003) [6] Guyon, I., Schomaker, L., Plamondon, R., Liberman, M., Janet, S.: Unipen project of on-line data exchange and recognizer benchmarks. In: Proc. 12th Int. Conf. on Pattern Recognition, pp. 29–33 (1994) [7] Hu, J., Lim, S., Brown, M.: Writer independent on-line handwriting recognition using an HMM approach. Pattern Recognition 33(1), 133–147 (2000) [8] Bahlmann, C., Burkhardt, H.: The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping. IEEE Trans. Pattern Anal. and Mach. Intell. 26(3), 299–310 (2004) [9] Bahlmann, C., Haasdonk, B., Burkhardt, H.: Online handwriting recognition with support vector machines - a kernel approach. In: Proc. 8th Int. Workshop on Frontiers in Handwriting Recognition, pp. 49–54 (2002) [10] Wilfong, G., Sinden, F., Ruedisueli, L.: On-line recognition of handwritten symbols. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(9), 935–940 (1996) [11] Sayre, K.M.: Machine recognition of handwritten words: A project report. Pattern Recognition 5(3), 213–228 (1973) [12] Schomaker, L.: Using stroke- or character-based self-organizing maps in the recognition of on-line, connected cursive script. Pattern Recognition 26(3), 443–450 (1993) [13] Kavallieratou, E., Fakotakis, N., Kokkinakis, G.: An unconstrained handwriting recognition system. Int. Journal on Document Analysis and Recognition 4(4), 226–242 (2002) [14] Bercu, S., Lorette, G.: On-line handwritten word recognition: An approach based on hidden Markov models. In: Proc. 3rd Int. Workshop on Frontiers in Handwriting Recognition, pp. 385–390 (1993) [15] Starner, T., Makhoul, J., Schwartz, R., Chou, G.: Online cursive handwriting recognition using speech recognition techniques. In: Int. Conf. on Acoustics, Speech and Signal Processing, vol. 5, pp. 125–128 (1994) [16] Hu, J., Brown, M., Turin, W.: HMM based online handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 18(10), 1039–1045 (1996) [17] Marti, U.-V., Bunke, H.: Using a statistical language model to improve the performance of an HMM-based cursive handwriting recognition system. Int. Journal of Pattern Recognition and Artificial Intelligence 15, 65–90 (2001) [18] Schenkel, M., Guyon, I., Henderson, D.: On-line cursive script recognition using time delay neural networks and hidden Markov models. Machine Vision and Applications 8, 215–223 (1995) [19] El-Yacoubi, A., Gilloux, M., Sabourin, R., Suen, C.: An HMM-based approach for off-line unconstrained handwritten word modeling and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(8), 752–760 (1999) [20] Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE 77(2), 257–286 (1989)


[21] Bourbakis, N.G.: Handwriting recognition using a reduced character method and neural nets. In: Proc. SPIE Nonlinear Image Processing VI, vol. 2424, pp. 592–601 (1995) [22] Bourlard, H., Morgan, N.: Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers (1994) [23] Bengio, Y.: Markovian models for sequential data. Neural Computing Surveys 2, 129–162 (1999) [24] Brakensiek, A., Kosmala, A., Willett, D., Wang, W., Rigoll, G.: Performance evaluation of a new hybrid modeling technique for handwriting recognition using identical on-line and off-line data. In: Proc. 5th Int. Conf. on Document Analysis and Recognition, Bangalore, India, pp. 446–449 (1999) [25] Marukatat, S., Artières, T., Dorizzi, B., Gallinari, P.: Sentence recognition through hybrid neuro-markovian modelling. In: Proc. 6th Int. Conf. on Document Analysis and Recognition, pp. 731–735 (2001) [26] Jaeger, S., Manke, S., Reichert, J., Waibel, A.: Online handwriting recognition: the NPen++ recognizer. Int. Journal on Document Analysis and Recognition 3(3), 169–180 (2001) [27] Caillault, E., Viard-Gaudin, C., Ahmad, A.R.: MS-TDNN with global discriminant trainings. In: Proc. 8th Int. Conf. on Document Analysis and Recognition, pp. 856–861 (2005) [28] Senior, A.W., Fallside, F.: An off-line cursive script recognition system using recurrent error propagation networks. In: International Workshop on Frontiers in Handwriting Recognition, Buffalo, NY, USA, pp. 132–141 (1993) [29] Senior, A.W., Robinson, A.J.: An off-line cursive handwriting recognition system. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(3), 309–321 (1998) [30] Schenk, J., Rigoll, G.: Novel hybrid NN/HMM modelling techniques for on-line handwriting recognition. In: Proc. 10th Int. Workshop on Frontiers in Handwriting Recognition, pp. 619–623 (2006) [31] Liwicki, M., Bunke, H.: IAM-OnDB – an on-line English sentence database acquired from handwritten text on a whiteboard. In: Proc. 8th Int. Conf. on Document Analysis and Recognition, pp. 956–961 (2005) [32] Marti, U.-V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. Journal on Document Analysis and Recognition 5, 39–46 (2002) [33] Liwicki, M., Bunke, H.: Handwriting recognition of whiteboard notes – studying the influence of training set size and type. Int. Journal of Pattern Recognition and Artificial Intelligence 21(1), 83–98 (2007) [34] Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Ph.D. thesis, Technical University of Munich (2008) [35] Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Processing 45, 2673–2681 (1997) [36] Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labeling unsegmented sequence data with recurrent neural networks. In: Proc. Int. Conf. on Machine Learning, pp. 369–376 (2006) [37] Pitman, J.A.: Handwriting recognition: Tablet PC text input. Computer 40(9), 49–54 (2007) [38] Proc. 10th Int. Conf. on Document Analysis and Recognition (2009)


[39] Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., Schmidhuber, J.: A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(5), 855–868 (2009) [40] Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Computation 9(8), 1735–1780 (1997) [41] Gers, F.: Long Short-Term Memory in Recurrent Neural Networks. Ph.D. thesis, EPFL (2001) [42] Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18(5-6), 602–610 (2005) [43] Graves, A., Fernández, S., Schmidhuber, J.: Multidimensional recurrent neural networks. In: Proc. Int. Conf. on Artificial Neural Networks (2007) [44] Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J. Mach. Learn. Res. 4, 575–602 (2003) [45] Riesenhuber, M., Poggio, T.: Hierarchical models of object recognition in cortex. Nature Neuroscience 2(11), 1019–1025 (1999) [46] LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278–2324 (1998) [47] Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. Advances in Neural Information Processing Systems 21, 545–552 (2009)

Chapter 3

Moving Object Detection from Mobile Platforms Using Stereo Data Registration

Angel D. Sappa¹, David Gerónimo¹,², Fadi Dornaika³,⁴, Mohammad Rouhani¹, and Antonio M. López¹,²

¹ Computer Vision Center, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
² Computer Science Department, Universitat Autònoma de Barcelona, 08193 Bellaterra, Barcelona, Spain
³ University of the Basque Country, San Sebastian, Spain
⁴ IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
{asappa,dgeronimo,rouhani,antonio}@cvc.uab.es, fadi [email protected]

Abstract. This chapter describes a robust approach for detecting moving objects from on-board stereo vision systems. It relies on a feature point quaternion-based registration, which avoids common problems that appear when computationally expensive iterative-based algorithms are used on dynamic environments. The proposed approach consists of three main stages. Initially, feature points are extracted and tracked through consecutive 2D frames. Then, a RANSAC based approach is used for registering two point sets, with known correspondences in the 3D space. The computed 3D rigid displacement is used to map two consecutive 3D point clouds into the same coordinate system by means of the quaternion method. Finally, moving objects correspond to those areas with large 3D registration errors. Experimental results show the viability of the proposed approach to detect moving objects like vehicles or pedestrians in different urban scenarios.

1 Introduction

The detection of moving objects in dynamic environments is generally tackled by first modelling the background. Then, foreground objects are directly obtained by performing an image subtraction (e.g., [14], [15], [32]). An extensive survey on motion detection algorithms can be found in [21]. In general, most of the approaches assume stationary cameras, which means all frames are registered in the same coordinate system. However, when the camera moves, the problem becomes intricate since it is unfeasible to have a unique background model. In such a case, moving object detection is generally tackled by compensating the camera motion so that all frames from a given video sequence, obtained from a moving camera/platform, are referred to the same reference system (e.g., [7], [27]). Moving object detection from a moving camera is a challenging problem in computer vision, having a number of applications in different domains: mobile robots


[26]; aerial surveillance [35], [34]; video segmentation [1]; vehicles and driver assistance [15], [24]; just to mention a few. As mentioned above, the underlying strategy in the solutions proposed in the literature essentially relies on the compensation of the camera motion. The difference between them lies in the sensor (i.e., monocular/stereoscopic) or in the use of prior knowledge of the scene together with visual cues. For instance, [26] uses a stereo system and predicts the depth image for the current time by using ego-motion information and the depth image obtained at the previous time. Then, moving objects are easily detected by comparing the predicted depth image with the one obtained at the current time. Prior knowledge of the scene is also used in [35] and [34]. In these cases the authors assume that the scene is far from the camera (monocular) and the depth variation of the objects of interest is small compared to the distance (e.g., airborne image sequences). In this context camera motion can be approximately compensated by a 2D parametric transformation (a 3×3 homography). Hence, motion compensation is achieved by warping a sequence of frames to a reference frame, where moving objects are easily detected by image subtraction like in the stationary camera cases. A more general approach has been proposed in [1] for segmenting videos captured with a freely moving camera, which is based on recording complex background and large moving non-rigid foreground objects. The authors propose a region-based motion compensation. It estimates the motion of the camera by finding the correspondence of a set of salient regions obtained by segmenting successive frames. In the vehicle on-board vision systems and driver assistance fields, the compensation of camera motion has also attracted researchers' attention in recent years. For instance, in [15] the authors present a simple but effective approach based on the use of GPS information to roughly align frames from video sequences. A local appearance comparison between the aligned frames is used to detect objects. In the driver assistance context, but by using an on-board stereo rig, [24] introduces a 3D data registration based approach to compensate camera motion from two consecutive frames. In that work, consecutive stereo frames are aligned into the same coordinate system; then moving objects are obtained from a 3D frame subtraction, similar to [26]. The current chapter proposes an extension of [24], by detecting misregistration regions according to an adaptive threshold from the depth information.

The remainder of this chapter is organized as follows. Section 2 introduces related work in the 3D data registration problem. Then, Section 3 presents the proposed approach for moving object detection. It consists of three stages: i) 2D feature point detection and tracking; ii) robust 3D data registration; and iii) moving object detection through consecutive stereo frame subtraction. Experimental results in real environments are presented in Section 4. Finally, conclusions and future works are given in Section 5.

2 Related Work

A large number of approaches have been proposed in the computer vision community for 3D point registration during the last two decades (e.g., [3], [4], [22]).


3D data point registration aims at finding the best transformation that places both the given data set and the corresponding model set into the same reference system. The different approaches proposed in the literature can be broadly classified into two categories, depending on whether initial information is required (fine registration) or not (coarse registration); a comprehensive survey of registration methods can be found in [23]. The approach followed in the current work for moving object detection lies within the fine rigid registration category. Typically, the fine registration process consists of iterating the following two stages. Firstly, the correspondence between every point from the current data set and the model set must be found. These correspondences are used to define the residual of the registration. Secondly, the best set of parameters that minimizes the accumulated residual must be computed. These two stages are iteratively applied until convergence is reached. The Iterative Closest Point (ICP)—originally introduced by [3] and [4]—is one of the most widely used registration techniques using this two-stage scheme. Since then, several variations and improvements have been proposed in order to increase the efficiency and robustness (e.g., [25], [8], [5]). In order to avoid the point-wise nature of ICP, which makes the problem discrete and non-smooth, different techniques have been proposed: i) probabilistic representations are used to describe both the data and model set (e.g., [31], [13]); ii) in [8] the point-wise problem is avoided by using a distance field of the model set; iii) an implicit polynomial (IP) is used in [36] to fit the distance field, which later defines a gradient field leading the data points towards that model set; iv) implicit polynomials have also been used in [28] to represent both the data set and model set. In this case, an accurate pose estimation is computed based on the information from the polynomial coefficients. Probabilistic-based approaches avoid the point-wise correspondence problem by representing each set by a mixture of Gaussians (e.g., [13], [6]); hence, registration becomes a problem of aligning two mixtures. In [13] a closed-form expression for the L2 distance between two Gaussian mixtures is proposed. Instead of Gaussian mixture models, [31] proposes an approach based on multivariate t-distributions, which is robust to a large number of missing values. Both approaches, as all mixture models, are highly dependent on the number of mixtures used for modelling the sets. This problem is generally solved by assuming a user-defined number of mixtures or as many as the number of points. The former needs the points to be clustered, while the latter results in a very expensive optimization problem that cannot handle large data sets or could get trapped in local minima when complex sets are considered. The non-differentiable nature of ICP is overcome by using a derivable distance transform—the Chamfer distance—in [8]. A non-linear minimization (Levenberg-Marquardt algorithm) of the error function, based on that distance transform, is used for finding the optimal registration parameters. The main disadvantage of [8] is the precision dependency on the grid resolution, where the Chamfer distance transform and discrete derivatives are evaluated. Hence, this technique cannot be directly applied when the point set is sparse or unorganized.


In contrast to the previous approaches, [36] proposes a fast registration method based on solving an energy minimization problem derived from an implicit polynomial fitted to the given model set [37]. This IP is used to define a gradient flow that drives the data set to the model set without using point-wise correspondences. The energy functional is minimized by means of a heuristic two-step process. Firstly, every point in the given data set moves freely along the gradient vectors defined by the IP. Secondly, the outcome of the first step is used to define a single transformation that represents this movement in a rigid way. These two steps are repeated alternately until convergence is reached. The weak point of this approach is the first step of the minimization, which lets the points move independently in the proposed gradient flow. Furthermore, the proposed gradient flow is not smooth, especially close to the boundaries. Most of the algorithms presented above have been originally proposed for registering overlapped sets of points corresponding to the 3D surface of a single rigid object. Extensions to a more general framework, where the 3D surfaces to be registered correspond to different views of a given scene, have been presented in the robotics field (e.g., [30, 18]). Actually, in all these extensions, the registration is used for the simultaneous localization and mapping (SLAM) of the mobile platform (i.e., the robot). Although some approaches differentiate static and dynamic parts of the environment before registration (e.g., [30], [33]), most of them assume that the environment is static, containing only rigid, non-moving objects. Therefore, if moving objects are present in the scene, the least squares formulation of the problem will provide a rigid transformation biased by the motions in the scene. Independently of the kind of scenario to be tackled (partial view of a single object or whole scene), 3D registration algorithms are computationally expensive, which prevents their use in real-time applications. In the current work a robust strategy that reduces the CPU time by focusing only on feature points is proposed. It is intended to be used in ADAS (Advanced Driver Assistance Systems) applications, in which an on-board camera explores the current scene in real time. Usually, an exhaustive window scanning approach is adopted to extract regions of interest (ROIs), needed in pedestrian or vehicle detection systems. The concept of consecutive frame registration for moving object detection has been explored in [11], in which an active frame subtraction for pedestrian detection from images of moving cameras is proposed. In that work, consecutive frames were not registered by a vision-based approach but by estimating the relative camera motion using vehicle speed and a gyrosensor. A similar solution has been proposed in [15], but by using GPS information.

3 Proposed Approach

The proposed approach combines 2D detection of key points with 3D registration. The first stage consists in extracting a set of 2D feature points at a given frame and tracking them through the next frame; the 3D coordinates corresponding to each of these 2D feature points are later on used during the registration process, where the rigid displacement (six degrees of freedom) that maps the 3D scene associated with frame


(n) into the 3D scene associated with frame (n + 1) is computed (see Figure 1). This rigid transform represents the 3D motion of the camera between frame (n) and frame (n + 1). Finally, moving objects are detected by computing the difference between the 3D coordinates of points represented in the same coordinate system. Before going into the details of the stages of the proposed approach, a brief description of the stereo vision system used is given.

3.1 System Setup

A commercial stereo vision system (Bumblebee from Point Grey, www.ptgrey.com) is used to acquire the 3D information of the scene in front of the host vehicle. It consists of two Sony ICX084 Bayer pattern CCDs with 6mm focal length lenses. Bumblebee is a precalibrated system that does not require in-field calibration. The baseline of the stereo head is 12cm and it is connected to the computer by an IEEE-1394 interface. Right and left color images (Bayer pattern) were captured at a resolution of 640×480 pixels. After capturing each right-left pair of images, a dense cloud of 3D data points $P_n$ is computed by using a 3D reconstruction software at each frame $n$. The right intensity image $I_n$ is used during the feature point detection and tracking stage.

3.2 Feature Detection and Tracking

As previously mentioned, the proposed approach is intended to be used on on-board vision systems for driver assistance applications. Hence, due to the real-time constraint, it is clear that the whole cloud of points cannot be used to find the rigid transformation that maps two consecutive frames to the same reference system. In order to tackle this problem, an efficient approach that relies only on the use of a reduced set of feature points $f^n_{i(u,v)} \subset I_n$ from the given image $I_n$ is proposed. Feature points far away from the camera position ($P^n_{i(x,y,z)} > \delta$) are discarded in order to increase registration accuracy, since stereo head data uncertainty grows quadratically with depth [19] ($\delta$ = 15 m in the current implementation). The proposed approach does not depend on the technique used for detecting feature points; actually, two different approaches have been tested: one based on the Harris corner points [10] and another on SIFT features [16]. In the first case, once feature points have been selected a tracking window $W_T$ of 9×9 pixels is set. Feature points are tracked by minimizing the sum of squared differences between two consecutive frames by using an iterative approach [17]. In the second case SIFT features [16] are detected at the extrema of differences of Gaussians in a scale-space representation and described as histograms of gradient orientations. In this case, following [16], a function based on the corresponding histogram distance is used to match the features in consecutive frames (the public implementation of SIFT in [29] has been used).
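A rough OpenCV-based equivalent of the Harris variant could look as follows; Lucas-Kanade tracking stands in here for the iterative SSD minimization of [17], the parameter values are illustrative, and `prev_frame`, `next_frame` and the depth map `Z` are assumed to be provided by the stereo rig.

```python
import cv2
import numpy as np

prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
next_gray = cv2.cvtColor(next_frame, cv2.COLOR_BGR2GRAY)

# corner detection (Harris response) on frame n
pts_n = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300,
                                qualityLevel=0.01, minDistance=10,
                                useHarrisDetector=True)

# track the points into frame n+1 with a 9x9 window
pts_n1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts_n,
                                             None, winSize=(9, 9))

# keep successfully tracked points; discard those farther than delta = 15 m,
# since stereo uncertainty grows quadratically with depth
ok = status.ravel() == 1
pts_n, pts_n1 = pts_n[ok].reshape(-1, 2), pts_n1[ok].reshape(-1, 2)
depth_ok = np.array([Z[int(v), int(u)] < 15.0 for u, v in pts_n])
pts_n, pts_n1 = pts_n[depth_ok], pts_n1[depth_ok]
```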


Fig. 1 Feature points detected and tracked through consecutive frames: (top) using Harris corner detector; (bottom) using SIFT detector and descriptor


Fig. 2 Illustration of feature points represented in the 3D space, together with three couples of points used for computing the 3D rigid displacement: [R|t]—RANSAC-like technique


3.3 Robust Registration

The set of 2D-to-2D point correspondences obtained in the previous stage is easily converted to a set of 3D-to-3D points, since for every frame we have a quasi-dense 3D reconstruction that is rapidly provided by Bumblebee. In the current approach, contrary to Iterative Closest Point (ICP) based algorithms, the correspondences between the two point sets are known; hence, the main challenge that should be faced during this stage is the fact that feature points can belong to static or moving objects in the scene. Since the camera is moving there are no additional clues to differentiate them easily. Hence, the use of a robust RANSAC-like technique is proposed to find the best rigid transformation that maps the 3D points of frame (n) into their corresponding points in frame (n + 1). The closed-form solution provided by unit quaternions [12] is chosen to compute this 3D rigid displacement, with rotation matrix R and translation vector t, between the two sets of vertices.

tk |2 . This minimization is carried out by using the closed-form solution provided by the unit quaternion method [12]. 3. For this solution Dk , compute the number of inliers among the entire set of pairs of feature points according to a user defined threshold value. Solution 1. Choose the best solution, i.e., the solution that has the highest number of inliers. Let Dq be this solution. 2. Refine the 3D rigid displacement [Rq |tq ] by using the whole set of couples considered as inliers, instead of the corresponding 3 pairs of feature points. A simn+1 |Pi(x,y,z) − ilar unit quaternion representation [2] is used to minimize: ∑#inliers i=1 n − tq |2 . Rq Pi(x,y,z)

3.4 Frame Subtraction

The best 3D rigid displacement $[R_q|t_q]$ computed above with inlier 3D feature points represents the camera motion. Thus, it will be used for detecting moving regions after motion compensation. First, the whole set of 3D data points at frame (n) is mapped by:

$$\hat{P}^{n+1}_{i(x,y,z)} = R_q P^n_{i(x,y,z)} + t_q, \qquad (1)$$


Fig. 3 Synthesized views representing frames (n) (from Fig. 1(left)) in the coordinate systems of frames (n + 1), by using their corresponding rigid displacements: $[R_q|t_q]$

where $\hat{P}^{n+1}_{i(x,y,z)}$ denotes the mapping of a given point from frame $n$ into the next frame. Note that for static 3D points we ideally have $\hat{P}^{n+1}_{i(x,y,z)} = P^{n+1}_{i(x,y,z)}$. Once the whole set of points $P_n$ has been mapped, we can also synthesize the corresponding 2D view $(\hat{I}^{n+1}_{(u,v)})$ as follows:

$$u^{n+1}_i = \mathrm{round}\!\left(u_0 + f\,\frac{\hat{x}^{n+1}_i}{\hat{z}^{n+1}_i}\right), \qquad v^{n+1}_i = \mathrm{round}\!\left(v_0 + f\,\frac{\hat{y}^{n+1}_i}{\hat{z}^{n+1}_i}\right), \qquad (2)$$

where $f$ denotes the focal length in pixels, $(u_0, v_0)$ represents the coordinates of the camera principal point, and $(\hat{x}^{n+1}_i, \hat{y}^{n+1}_i, \hat{z}^{n+1}_i)$ correspond to the 3D coordinates of the mapped point (1).

Fig. 4 (left) $D_{(u,v)}$ map of moving regions, from frames (n) and (n + 1) presented in Fig. 1(top). (right) Image difference between these consecutive frames ($|I^{(n)} - I^{(n+1)}|$) to illustrate their relative displacement.


Figure 3 shows two synthesized views obtained after mapping frames (n) (Fig. 1(left)) with their corresponding $[R_q|t_q]$. A moving region map, $D_{(u,v)}$, is then computed using the difference between the synthesized scene and the actual scene as follows:

$$D_{(u,v)} = \begin{cases} 0, & \text{if } |\hat{P}^{n+1}_{i(x,y,z)} - P^{n+1}_{i(x,y,z)}| < \tau_i \\ (\hat{I}^{n+1}_{(u,v)} + I^{n+1}_{(u,v)})/2, & \text{otherwise} \end{cases} \qquad (3)$$

where $\tau_i$ is a threshold directly related to the depth to the camera (since the accuracy of the stereo rig decreases with depth, the value of $\tau$ increases to compensate for that loss of accuracy). Image differences are used in the above map just to show the correlation between intensity differences and 3D coordinate differences of mapped points (i.e., a given point in frame (n) with its corresponding one in frame (n + 1)). Figure 4(left) presents the map of moving regions, $D_{(u,v)}$, resulting from the frame (n + 1) (Fig. 1(right)) and the synthesized view corresponding to frame (n) (see Figure 3).
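A NumPy sketch of Eqs. (1)-(3) is given below; the per-pixel point clouds, the linear depth-dependent threshold and its constants are our assumptions for illustration.

```python
import numpy as np

def moving_region_map(P_n, P_n1, I_n, I_n1, R, t, f, u0, v0, tau0=0.1, k=0.02):
    # P_n, P_n1: HxWx3 per-pixel 3D point clouds; I_n, I_n1: gray images.
    # Every point of frame n is mapped by [Rq|tq] (Eq. 1), projected (Eq. 2),
    # and compared with the 3D point observed at that pixel in frame n+1;
    # tau grows with depth (tau0 and k are illustrative constants).
    H, W = I_n.shape
    D = np.zeros((H, W))
    pts = P_n.reshape(-1, 3) @ R.T + t              # Eq. (1): motion compensation
    for (x, y, z), i in zip(pts, I_n.reshape(-1)):
        if z <= 0:
            continue
        u = int(round(u0 + f * x / z))              # Eq. (2): synthesized view
        v = int(round(v0 + f * y / z))
        if 0 <= u < W and 0 <= v < H:
            tau = tau0 + k * z                      # depth-dependent threshold
            if np.linalg.norm(P_n1[v, u] - (x, y, z)) >= tau:   # Eq. (3)
                D[v, u] = (i + I_n1[v, u]) / 2.0
    return D
```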


Fig. 5 Feature points detected and tracked through consecutive frames

Fig. 6 (left) Synthesized view of frame (n) (Fig. 5(left)). (right) Difference between consecutive frames ($|I^{(n)} - I^{(n+1)}|$) to illustrate their relative displacement (pay special attention to the traffic lights and stop signposts)


Additionally, Fig. 4(right) presents the difference between consecutive frames ($|I^{(n)} - I^{(n+1)}|$).

4 Experimental Results

Experimental results in real environments and at different vehicle speeds are presented in this section. In all the cases large error regions correspond to both moving objects and misregistered areas. Several video sequences were processed on a 3.2 GHz Pentium IV PC. Experimental results presented in this chapter correspond to video sequences recorded at 10 fps. In other words, the elapsed time between two consecutive frames is about 100 ms. The proposed algorithm takes, on average, 31 ms for registering consecutive frames by using about 300 feature points. Fig. 1(top) shows two frames of a crowded urban scene. This scene is particularly interesting since a large set of feature points over surfaces moving at different speeds has been extracted. In this case, the use of classical ICP based approaches (e.g., [18]) would provide a wrong scene registration since points from static and moving objects are considered together. The synthesized view obtained from frame (n) is presented in Fig. 3(left). The quality of the registration result can be appreciated in the map of moving regions presented in Fig. 4(left). Particularly interesting is the lamp post region, where there is a perfect registration between the 3D coordinates of these pixels. Large errors at the top of trees or in further away regions are mainly due to depth uncertainty, which as mentioned before grows quadratically with depth [19]. Wrong moving regions mainly correspond to hidden areas in frame (n) that are unveiled in frame (n + 1). Figure 4(right) presents the difference between consecutive frames ($|I^{(n)} - I^{(n+1)}|$)


Fig. 7 Map of moving regions ($D_{(u,v)}$) obtained from the synthesized view ($\hat{I}^{n+1}$) (Fig. 6(left)) and the corresponding frame ($I^{n+1}$) (Fig. 5(right))—bounding boxes are only illustrative and have been placed using the information of horizon line position as in [9]


to highlight that although these frames (Fig. 1(top)) look quite similar there is a considerable relative displacement between them. A different scenario is shown in the two consecutive frames presented in Fig. 5. In that scene, the car is reducing its speed to stop at a red light while three pedestrians are crossing the street. Although the vehicle is reducing its speed there is a relative displacement between these consecutive frames (see Fig. 6(right)). The synthesized view of frame (n), using the computed 3D rigid displacement, is presented in Fig. 6(left). Finally, the corresponding moving regions map is depicted in Fig. 7. Bounding boxes enclosing moving objects can provide reliable information to select candidate windows to be used by a classification process (e.g., a pedestrian classifier). In this case, the number of windows would greatly decrease compared to other approaches in the literature, such as $10^8$ windows in an exhaustive scan [20] or 2,000 windows in a road uniform sampling [9].

5 Conclusions

This chapter presents a novel and robust approach for moving object detection by registering consecutive clouds of 3D points obtained by an on-board stereo camera. The registration process is only applied over two small sets of 3D points with known correspondences by using key point feature extraction and a RANSAC-like technique based on the closed-form solution provided by the unit quaternion method. Then, a synthesized 3D scene is obtained after mapping the whole set of points from the previous frame to the current one. Finally, a map of moving regions is generated by considering the difference between the current 3D scene and the synthesized one. As future work more evolved approaches for combining registered frames will be studied. For instance, instead of only using consecutive frames, temporal windows including more frames are likely to help filtering out noisy areas. Furthermore, the color information of each pixel could be used during the estimation of the moving region map.

Acknowledgment. This work was supported in part by the Spanish Ministry of Science and Innovation under Projects TRA2010-21371-C03-01, TIN2010-18856 and Research Program Consolider Ingenio 2010: MIPRCV (CSD2007-00018).

References

1. Amir, S., Barhoumi, W., Zagrouba, E.: A robust framework for joint background/foreground segmentation of complex video scenes filmed with freely moving camera. Pattern Analysis and Applications 46(2), 175–205
2. Benjemaa, R., Schmitt, F.: A solution for the registration of multiple 3D point sets using unit quaternions. In: Burkhardt, H., Neumann, B. (eds.) ECCV 1998. LNCS, vol. 1407, pp. 34–50. Springer, Heidelberg (1998)
3. Besl, P., McKay, N.: A method for registration of 3D shapes. IEEE Trans. on Pattern Analysis and Machine Intelligence 14(2), 239–256 (1992)


4. Chen, Y., Medioni, G.: Object modelling by registration of multiple range images. Image Vision Comput. 10(3), 145–155 (1992) 5. Chetverikov, D., Stepanov, D., Krsek, P.: Robust Euclidean alignment of 3D point sets: the trimmed iterative closest point algorithm. Image and Vision Computing 23(1), 299–309 (2005) 6. Chui, H., Rangarajan, A.: A feature registration framework using mixture models. In: MMBIA 2000: Proceedings of the IEEE Workshop on Mathematical Methods in Biomedical Image Analysis, pp. 190–197 (2000) 7. Dahyot, R.: Unsupervised camera motion estimation and moving object detection in videos. In: Proc. of the Irish Machine Vision and Image Processing, Dublin, Ireland (August 2006) 8. Fitzgibbon, A.: Robust registration of 2D and 3D point sets. Image and Vision Computing 21(13), 1145–1153 (2003) 9. Gerónimo, D., Sappa, A.D., López, A., Ponsa, D.: Adaptive image sampling and windows classification for on-board pedestrian detection. In: Proc. Int. Conf. on Computer Vision Systems, Bielefeld, Germany (2007) 10. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proc. of The Fourth Alvey Vision Conference, Manchester, UK, pp. 147–151 (1988) 11. Hashiyama, T., Mochizuki, D., Yano, Y., Okuma, S.: Active frame subtraction for pedestrian detection from images of moving camera. In: Proc. IEEE Int. Conf. on Systems, Man and Cybernetics, Washington, USA, pp. 480–485 (October 2003) 12. Horn, B.: Closed-form solution of absolute orientation using unit quaternions. Journal of the Optical Society of America A 4, 629–642 (1987) 13. Jian, B., Vemuri, B.: A robust algorithm for point set registration using mixture of Gaussians. In: 10th IEEE International Conference on Computer Vision, Beijing, China, October 17-20, pp. 1246–1251 (2005) 14. Kastrinaki, V., Zervakis, M., Kalaitzakis, K.: A survey of video processing techniques for traffic applications. Image and Vision Computing 21(4), 359–381 (2003) 15. Kong, H., Audibert, J., Ponce, J.: Detecting abandoned objects with a moving camera. IEEE Transactions on Image Processing 19(8), 2201–2210 (2010) 16. Lowe, D.: Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 2(60), 91–110 (2004) 17. Ma, Y., Soatto, S., Kosecká, J., Sastry, S.: An Invitation to 3D Vision: From Images to Geometric Models. Springer, New York (2004) 18. Milella, A., Siegwart, R.: Stereo-based ego-motion estimation using pixel tracking and iterative closest point. In: Proc. IEEE Int. Conf. on Mechatronics and Automation, USA (January 2006) 19. Oniga, F., Nedevschi, S., Meinecke, M., To, T.: Road surface and obstacle detection based on elevation maps from dense stereo. In: Proc. IEEE Int. Conf. on Intelligent Transportation Systems, Seattle, USA, pp. 859–865 (September 2007) 20. Oren, M., Papageorgiou, C., Sinha, P., Osuna, E., Poggio, T.: Pedestrian detection using wavelet templates. In: Proc. IEEE Int. Conf. on Computer Vision and Pattern Recognition, Puerto Rico (June 1997) 21. Radke, R., Andra, S., Al-Kofahi, O., Roysam, B.: Image change detection algorithms: A systematic survey. IEEE Trans. on Image Processing 14(3), 294–307 (2005) 22. Restrepo-Specht, A., Sappa, A.D., Devy, M.: Edge registration versus triangular mesh registration, a comparative study. Signal Processing: Image Communication 20(9-10), 853–868 (2005) 23. Salvi, J., Matabosch, C., Fofi, D., Forest, J.: A review of recent range image registration methods with accuracy evaluation. Image Vision Computing 25(5), 578–596 (2007)


24. Sappa, A.D., Dornaika, F., Gerónimo, D., López, A.: Registration-based moving object detection from a moving camera. In: Proc. on Workshop on Perception, Planning and Navigation for Intelligent Vehicles, Nice, France (September 2008) 25. Sharp, G., Lee, S., Wehe, D.: ICP registration using invariant features. IEEE Trans. Pattern Anal. Mach. Intell. 24(1), 90–102 (2002) 26. Shimizu, S., Yamamoto, K., Wang, C., Satoh, Y., Tanahashi, H., Niwa, Y.: Moving object detection by mobile stereo omni-directional system (SOS) using spherical depth image. Pattern Analysis and Applications (2), 113–126 27. Taleghani, S., Aslani, S., Saeed, S.: Robust moving object detection from a moving video camera using neural network and Kalman filter. In: Iocchi, L., Matsubara, H., Weitzenfeld, A., Zhou, C. (eds.) RoboCup 2008. LNCS, vol. 5399, pp. 638–648. Springer, Heidelberg (2009) 28. Tarel, J.-P., Civi, H., Cooper, D.B.: Pose estimation of free-form 3D objects without point matching using algebraic surface models. In: Proceedings of IEEE Workshop Model Based 3D Image Analysis, Mumbai, India, pp. 13–21 (1998) 29. Vedaldi, A.: An open implementation of the SIFT detector and descriptor. Technical Report 070012, UCLA CSD (2007) 30. Wang, C., Thorpe, C., Thrun, S.: Online simultaneous localization and mapping with detection and tracking of moving objects: theory and results from a ground vehicle in crowded urban areas. In: Proc. IEEE Int. Conf. on Robotics and Automation, Taipei, Taiwan, pp. 842–849 (September 2003) 31. Wang, H., Zhang, Q., Luo, B., Wei, S.: Robust mixture modelling using multivariate t-distribution with missing information. Pattern Recogn. Lett. 25(6), 701–710 (2004) 32. Wang, L., Yung, N.: Extraction of moving objects from their background based on multiple adaptive thresholds and boundary evaluation. IEEE Transactions on Intelligent Transportation Systems 11(1), 40–51 (2010) 33. Wolf, D., Sukhatme, G.: Mobile robot simultaneous localization and mapping in dynamic environments. Autonomous Robots 19(1), 53–65 (2005) 34. Yu, Q., Medioni, G.: Map-enhanced detection and tracking from a moving platform with local and global data association. In: Proc. IEEE Workshops on Motion and Video Computing, Austin, Texas (February 2007) 35. Yu, Q., Medioni, G.: A GPU-based implementation of motion detection from a moving platform. In: Proc. IEEE Workshops on Computer Vision and Pattern Recognition, Anchorage, Alaska (June 2008) 36. Zheng, B., Ishikawa, R., Oishi, T., Takamatsu, J., Ikeuchi, K.: A fast registration method using IP and its application to ultrasound image registration. IPSJ Transactions on Computer Vision and Applications 1, 209–219 (2009) 37. Zheng, B., Takamatsu, J., Ikeuchi, K.: An adaptive and stable method for fitting implicit polynomial curves and surfaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(3), 561–568 (2010)

Chapter 4

Pattern Classifications in Cognitive Informatics

Lidia Ogiela

AGH University of Science and Technology, Al. Mickiewicza 30, PL-30-059 Krakow, Poland
[email protected]

Abstract. This chapter presents problems to which cognitive data analysis algorithms are applied. The cognitive data analysis approach presented here will be used to build cognitive systems for classifying patterns, which in turn will form the basis for discussing and characterising Understanding Based Image Analysis Systems (UBIAS). The purpose of these systems is image analysis, while their design and operation follow cognitive/reasoning processes characteristic for human cognition. Such processes represent not just simple data analysis, but their main function is to interpret, understand and reason about analysed data sets, employing the automatic data understanding process carried out by the system. A characteristic feature of cognitive analysis processes is the use of linguistic algorithms – which are to extract the meaning from data sets – to describe the analysed sets. It is those algorithms that serve to describe the data, interpret it and to reason.

Keywords: Cognitive informatics, pattern classifications, pattern understanding, cognitive analysis, computational intelligence, natural intelligence, intelligent systems.

1 Introduction

Data understanding by interpreting the meaning of analysed data is based on a semantic analysis of patterns defined in the system. Such analysis and reasoning processes are conducted by cognitive systems, understood as systems that carry out semantic analysis processes. Semantic analysis makes use of cognitive resonance, described in the author's publications [17]-[42], [44], [45], [48]-[51], which forms the main stage in the correct operation of the system.
The development of cognitive information systems was sparked by efforts to combine systems which recognised data (most often data in the form of images) with automatic data understanding processes (an implementation of semantic analysis solutions understood as cognitive analysis). These processes have been described in publications [17]-[45], [48]-[57] and are still being developed successfully.


So far, the authors of the publications cited above have analysed image data from medical imaging – more specifically, lesions occurring within the central nervous system (i.e. the spinal cord) and in the bones of feet, palms and wrists. The second, independent type of data analysed consisted of the economic and financial figures of companies, where the aim was to assess whether the financial, strategic and marketing decisions suggested by the system were reasonable. Here, the author will concentrate on cognitive systems analysing the meaning of image data, with particular attention to UBIAS (Understanding Based Image Analysis Systems) analysing lesions within foot bones. The novelty presented here is that the UBIAS system analyses lesions within the bones of the whole foot, whereas previous efforts analysed foot bones excluding the phalanx bones. Including the phalanx bones in the analysis of foot bone lesions makes the process of understanding the analysed medical (health) situation much richer, so the process can identify the right lesion more unambiguously, consequently yielding a deeper cognitive analysis and semantic analysis of the lesion observed.

2 Semantic Analysis Stages

Semantic analysis is the key to the correct operation of cognitive data analysis systems. When this analysis is conducted, several different (but equally important for the analysis) processes occur: interpretation, description, analysis and reasoning. The main stages of semantic analysis are as follows [26], [43], [46]:

• data pre-processing:
  – filtration and amplification;
  – approximation;
  – coding;
• data presentation:
  – segmentation;
  – recognising picture primitives;
  – identifying relations between picture primitives;
• linguistic perception;
• syntactic analysis;
• pattern classification;
• data classification;
• feedback;
• cognitive resonance;
• data understanding.

The main stages of semantic analysis are shown in Figure 1. A clear majority of the above semantic analysis stages deals with the data understanding process, as from the beginning of the syntactic analysis conducted using the formal grammar


defined in the system, there are stages aimed at identifying the analysed data with particular attention to its semantics (the meaning it contains). The stages of recognition itself become the starting point for further stages, referred to as the cognitive analysis. This is why the understanding process as such requires the application of feedback, during which the features of the analysed data are compared to the expectations which the system has generated from its expert knowledge base. This feedback is called cognitive resonance. It identifies those feedbacks which turn out to be material for the analysis conducted, i.e. those in which features are consistent with expectations. The next element necessary is data understanding as such, during which the significance of the analysed changes for their further growth or atrophy (as in lesions) is determined.

Fig. 1 Processes of semantic data analysis

The data is analysed by identifying the characteristic features of the given data set, which then determine the decision. This decision is the result of the completed data analysis (Fig. 2).


Fig. 2 Process of data analysis

The data analysis process is supplemented with the cognitive analysis, which consists of selecting consistent pairs and non-consistent pairs of elements from the generated set of features characteristic of the analysed set and from the set of expectations as to the analysed data, generated using the expert knowledge base kept in the system. The comparison leads to cognitive resonance, which identifies consistent pairs and non-consistent pairs, where the latter are not material to the further analysis process. In cognitive analysis, the consistent pairs are used to understand the meaning (semantics) of the analysed data sets (Fig. 3).
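The pair-selection step can be pictured in code. The following Python fragment is only an illustrative sketch of the comparison described above – the function and the feature names are hypothetical, not part of any actual UBIAS implementation:

```python
# Illustrative sketch (not the author's implementation): cognitive resonance
# as the comparison of features extracted from the data with expectations
# generated from an expert knowledge base. Consistent pairs are kept for
# semantic interpretation; non-consistent pairs are discarded.

def cognitive_resonance(features: dict, expectations: dict):
    """Split (feature, expectation) pairs into consistent and non-consistent."""
    consistent, non_consistent = {}, {}
    for name, observed in features.items():
        expected = expectations.get(name)
        if expected is not None and observed == expected:
            consistent[name] = observed                  # material for understanding
        else:
            non_consistent[name] = (observed, expected)  # dropped from the analysis
    return consistent, non_consistent

# Hypothetical example: features of an analysed lesion vs. a defined pattern.
features = {"lesion_present": True, "location": "os naviculare", "count": 1}
expectations = {"lesion_present": True, "location": "os naviculare", "count": 2}
consistent, _ = cognitive_resonance(features, expectations)
print(consistent)  # {'lesion_present': True, 'location': 'os naviculare'}
```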

Fig. 3 Data understanding in cognitive data analysis


Because of the way the semantic analysis is conducted and the linguistic perception algorithms – grammar formalisms – used in its course, semantic analysis has become the core of the operation of cognitive data analysis, interpretation and reasoning systems.

3 Semantic Analysis vs. Cognitive Informatics

Semantic analysis processes, which form the cornerstone of cognitive information systems, also underpin a new branch of science which is now undergoing very fast development: cognitive informatics. The notion of cognitive informatics has been proposed in publications [59], [60] and has become the starting point for a formal approach to interdisciplinary considerations of running semantic analyses in cognitive areas. Cognitive informatics is understood as the combination of cognitive science and information science with the goal of researching the mechanisms by which information processes run in human minds. These processes are treated as elements of natural intelligence, and they are mainly applied to engineering and technical problems in an interdisciplinary approach. Semantic analysis in the sense of cognitive analysis plays a significant role, as it identifies the meaning in the areas analysed. The meaning as such is identified using the formal grammar defined in the system and its related set of productions, whose elements the system utilises to analyse the meaning. Features such as the following are analysed:

• lesion occurrence,
• its size, length, width,
• lesion vastness,
• number of lesions observed,
• repetition frequency,
• lesion structure,
• lesion location.

These features can be identified correctly using the set of productions of the linguistic reasoning algorithm. For this reason, the definition of linguistic algorithms of perception and reasoning forms the key stage in building a cognitive system. In line with the discussed cognitive approach, the entire process of linguistic data perception and understanding hinges on a grammatical analysis aimed at answering the question whether or not the data set is semantically correct from the perspective of the grammar defined in the system. If there are consistencies, the system runs an analysis to identify these consistencies and assign them the correct names. If there is no consistency, the system will not execute further analysis stages, as the lack of consistency may be due to various reasons.


The most frequent ones include:

• the wrong definition of the formal grammar,
• no definition of the appropriate semantic reference,
• an incompletely defined pattern,
• a wrongly defined pattern,
• a representative from outside the recognisable data class accepted for analysis.

All these reasons may cause a failure at the stage of determining the semantic consistency between the analysed specimen and the formal language adopted for the analysis. In this case, the whole definition process should be reconsidered, as the error could have occurred at any stage of it. Cognitive systems carry out the correct semantic analysis by applying a linguistic approach developed by analogy to the cognitive/decision-making processes taking place in the human brain. These processes are aimed at the in-depth analysis of various data sets; their correct course implies that the human cognitive system is successful. That system thus becomes the foundation for designing cognitive data analysis systems. Cognitive systems designed for semantic data analysis in the cognitive informatics field execute data analysis in three "stages". This split is presented in Figure 4.

Fig. 4 The three-stage operation of cognitive systems in the cognitive informatics field


The above diagram shows three separate data analysis stages. The first of them is the traditional data analysis, which includes qualitative and quantitative analyses. The results of this analysis are supplemented with the linguistic presentation of the analysed data set, which forms the basis for extracting the semantic features of these sets. Extracting the meaning of the sets is the starting point for the second stage of the analysis, referred to as the semantic analysis. The end of this stage at the same time forms the beginning of the next analysis process, referred to as the cognitive data analysis. During this stage, the results obtained are interpreted using the semantic data notations generated previously. The interpretation of results consists not just in their simple description or in recognising the situation being analysed; it is, in particular, the stage at which the data, the situation and the information are understood – the stage of reasoning based on the results obtained and of forecasting the changes that may appear in the future.

4 Example of a Cognitive UBIAS System

A cognitive data analysis system analysing lesions occurring within foot bones shown in X-ray images is an example of a cognitive system applied to the semantic analysis of image data. The UBIAS system carries out the analysis by using mathematical linguistic algorithms based on graph formalisms proposed in publications [24]-[27], [45]. The key aspect in introducing the right definition of the formal grammar is to adopt the names of bones found within the foot, which include:

• talus (t),
• calcaneus (c),
• os cuboideum (cu),
• os naviculare (n),
• os cuneiforme laterale (cl),
• os cuneiforme mediale (cm),
• os cuneiforme intermedium (ci),
• os sesamoidea (ses),
• os metatarsale (tm),
• os digitorum (dip),
• phalanx (pip).

The healthy structure of foot bones is shown in Figure 5. Figure 5 presents all the foot bones, divided into metatarsus, tarsus and phalanx bones, which will be analysed by the proposed UBIAS system. To provide the right insight into the proposed way of analysing foot bone lesions in X-ray images, an X-ray of a foot free of any lesions within it, which represents the pattern defined in the UBIAS system, is shown below (Fig. 6).


Fig. 5 Healthy foot structure

Fig. 6 Healthy foot structure – X-ray images



The example UBIAS system discussed in this chapter is used for the image analysis of foot bone lesions in the dorsopalmar projection of the foot, including tarsus, metatarsus and phalanx bones. The grammatical graph formalism for the semantic analysis has been defined as the G_foot grammar, which takes the following form:

$$ G_{foot} = (N_f, T_f, \Gamma_f, S, P) $$

where:

N_f – the set of non-terminal labels of apexes:
N_f = {ST, TALUS, CUBOIDEUM, NAVICULARE, LATERALE, MEDIALE, INTERMEDIUM, SES1, SES2, TM1, TM2, TM3, TM4, TM5, MP1, MP2, MP3, MP4, MP5, PIP1, PIP2, PIP3, PIP4, PIP5, DIP2, DIP3, DIP4, DIP5, TPH1, TPH2, TPH3, TPH4, TPH5, ADD1, ADD2, ADD3, ADD4, ADD5, ADD6, ADD7, ADD8, ADD9, ADD10, ADD11, ADD12, ADD13, ADD14}

T_f – the set of terminal labels of apexes:
T_f = {c, t, cu, n, cl, cm, ci, s1, s2, tm1, tm2, tm3, tm4, tm5, mp1, mp2, mp3, mp4, mp5, pip1, pip2, pip3, pip4, pip5, dip2, dip3, dip4, dip5, tph1, tph2, tph3, tph4, tph5, add1, add2, add3, add4, add5, add6, add7, add8, add9, add10, add11, add12, add13, add14}

Γ_f – {p, q, r, s, t, u, v, w, x, y, z} – the set of edge labels; the corresponding graph is shown in Fig. 7.

Fig. 7 Definitions of elements of the set Γf


S – the start symbol; P – the set of productions (Fig. 8).

Fig. 8 Set of productions P
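To make the structure of this definition concrete, the sketch below shows one possible way of holding the components of G_foot in a program. It is our own illustration: the label sets are excerpts of the sets defined above, while the production set P and the edge-label graph are placeholders, since their full definitions are given graphically in Figs. 7 and 8.

```python
# A minimal sketch (assumption: our own illustration, not the UBIAS code) of
# how the components of the G_foot graph grammar could be represented.

G_foot = {
    # Non-terminal apex labels N_f (excerpt of the set defined above)
    "N": {"ST", "TALUS", "CUBOIDEUM", "NAVICULARE", "LATERALE", "MEDIALE",
          "INTERMEDIUM", "TM1", "TM2", "TM3", "TM4", "TM5"},
    # Terminal apex labels T_f (excerpt)
    "T": {"c", "t", "cu", "n", "cl", "cm", "ci", "tm1", "tm2", "tm3"},
    # Edge labels Gamma_f
    "Gamma": {"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"},
    "S": "ST",   # start symbol
    "P": [],     # productions, defined graphically in Fig. 8 (placeholder)
}

def is_terminal(label: str) -> bool:
    """An apex label is terminal iff it belongs to T_f."""
    return label in G_foot["T"]

print(is_terminal("cu"), is_terminal("TALUS"))  # True False
```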


Figure 9 shows the graph of relations between individual tarsus, metatarsus and phalanx bones.

Fig. 9 A relation graph of foot bones in the dorsopalmar projection

Figure 10 shows the graph of relations between individual tarsus, metatarsus and phalanx bones including the angles of slopes between individual foot bones.

Fig. 10 A relation graph in the dorsopalmar projection

This definition method allows the UBIAS system to start analysing image data. Selected results of its operation are illustrated by Figures 11-16, which, to present the universality of the analysis, show examples of automatic image data interpretation together with their semantic interpretation. For comparison, medical images showing various foot bone lesions were chosen for cognitive data interpretation with the application of graph formalisms. Figure 11 shows an example of the automatic analysis of an image depicting a fracture of the os naviculare.


Fig. 11 Image data analysis by UBIAS systems to understand data showing foot bone deformations – a fracture of os naviculare

Figure 12 illustrates the method of UBIAS system operation when this system analyses image data to automatically understand data showing a foot deformation.

Fig. 12 Image data analysis by UBIAS systems to understand data showing foot bone deformations

Figure 13 is an example of an attempt to automatically analyse image data illustrating a fracture of the neck of the talus.

Fig. 13 Image data analysis by UBIAS systems to understand data showing a fracture of the neck of the talus


Figure 14 is an image data analysis carried out by an UBIAS system on an example showing a foot deformation caused by diabetes.

Fig. 14 Image data analysis by UBIAS systems to understand data showing a foot deformation caused by diabetes

Figure 15 is an image data analysis carried out by an UBIAS system on an example showing osteoarthritis of the foot.

Fig. 15 Image data analysis by UBIAS systems to understand data showing osteoarthritis

Figure 16 is an image data analysis carried out by an UBIAS system on an example showing osteomyelitis of the foot.

Fig. 16 Image data analysis by UBIAS systems to understand data showing osteomyelitis


All the above examples of automatic image data analysis demonstrate the essence of UBIAS cognitive system operation, namely the correct understanding of the analysed lesion using the series of productions defined in the system and the semantics of the analysed images.

5 Conclusions

The examples of the automatic understanding of image data presented in this chapter demonstrate the extent to which semantic analysis can be used for cognitive data analysis problems in cognitive informatics. Reasoning systems of this type play a notable role and are significant because they use robust formalisms for the linguistic description and analysis of data. These formalisms, based on the formal grammar presented in this chapter, meet the requirements of an in-depth analysis and a cognitive interpretation of the analysed data sets. Owing to the semantic analysis carried out by UBIAS systems, cognitive systems are becoming increasingly important in data analysis processes. Apart from UBIAS systems, other systems of cognitive data analysis are also being developed, which the Reader can find in publications including [26], [42].
The approach to cognitive data analysis systems presented in this chapter is discussed by reference to methods of semantically analysing image-type data. The essence of this approach is to apply cognitive/interpretation/reasoning processes in the operation of systems. Systems can be built based on cognitive and decision-making processes only if the system analyses and interprets data as well as conducting the reasoning and projecting stages using the semantic characteristics of the analysed data. The semantics of the analysed sets makes in-depth analysis processes possible and at the same time becomes the starting point for projecting changes that may occur in the future, thus allowing errors that could occur in the future to be eliminated. Traditional data analysis systems frequently cannot identify such errors. So the characteristic feature which distinguishes cognitive systems from others is the process of reasoning on the basis of the analysed data and the process of projecting based on the data analysis conducted.

Acknowledgement. This work has been supported by the National Science Center, Republic of Poland, under project number N N516 478940.

References

1. Albus, J.S., Meystel, A.M.: Engineering of Mind – An Introduction to the Science of Intelligent Systems. A Wiley-Interscience Publication, John Wiley & Sons Inc. (2001)
2. Berners-Lee, T.: Weaving the Web. Texere Publishing (2001)


3. Berners-Lee, T., Fensel, D., Hendler, J.A., Lieberman, H., Wahlster, W. (eds.): Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press (2005)
4. Branquinho, J. (ed.): The Foundations of Cognitive Science. Clarendon Press, Oxford (2001)
5. Brejl, M., Sonka, M.: Medical image segmentation: Automated design of border detection criteria from examples. Journal of Electronic Imaging 8(1), 54–64 (1999)
6. Burgener, F.A., Kormano, M.: Bone and Joint Disorders. Thieme, Stuttgart (1997)
7. Chomsky, N.: Language and Problems of Knowledge: The Managua Lectures. MIT Press, Cambridge (1988)
8. Cohen, H., Lefebvre, C. (eds.): Handbook of Categorization in Cognitive Science. Elsevier, The Netherlands (2005)
9. Davis, L.S. (ed.): Foundations of Image Understanding. Kluwer Academic Publishers (2001)
10. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. A Wiley-Interscience Publication, John Wiley & Sons, Inc. (2001)
11. Jurek, J.: On the Linear Computational Complexity of the Parser for Quasi Context Sensitive Languages. Pattern Recognition Letters 21, 179–187 (2000)
12. Jurek, J.: Recent developments of the syntactic pattern recognition model based on quasi-context sensitive language. Pattern Recognition Letters 2(26), 1011–1018 (2005)
13. Kłopotek, M.A., Wierzchoń, S.T., Trojanowski, K. (eds.): Intelligent Information Processing and Web Mining. Proceedings of the International IIS: IIP WM 2004 Conference, Zakopane, Poland, May 17-20. Springer (2004)
14. Köpf-Maier, P.: Wolf-Heidegger's Atlas of Human Anatomy, Part 1. Systemic Anatomy, Warszawa (2002)
15. Lassila, O., Hendler, J.: Embracing web 3.0. IEEE Internet Computing, 90–93 (May-June 2007)
16. Meystel, A.M., Albus, J.S.: Intelligent Systems – Architecture, Design, and Control. A Wiley-Interscience Publication, John Wiley & Sons, Inc., Canada (2002)
17. Ogiela, L.: Usefulness assessment of cognitive analysis methods in selected IT systems. Ph.D. Thesis, AGH, Kraków (2005)
18. Ogiela, L.: Cognitive Systems for Medical Pattern Understanding and Diagnosis. In: Lovrek, I., Howlett, R.J., Jain, L.C. (eds.) KES 2008, Part I. LNCS (LNAI), vol. 5177, pp. 394–400. Springer, Heidelberg (2008)
19. Ogiela, L.: Modelling of Cognitive Processes for Computer Image Interpretation. In: Al-Dabass, D., Nagar, A., Tawfik, H., Abraham, A., Zobel, R. (eds.) EMS 2008 European Modelling Symposium, Second UKSIM European Symposium on Computer Modeling and Simulation, Liverpool, United Kingdom, September 8-10, pp. 209–213 (2008)
20. Ogiela, L.: Syntactic Approach to Cognitive Interpretation of Medical Patterns. In: Xiong, C.-H., Liu, H., Huang, Y., Xiong, Y.L. (eds.) ICIRA 2008. LNCS (LNAI), vol. 5314, pp. 456–462. Springer, Heidelberg (2008)
21. Ogiela, L.: Cognitive Computational Intelligence in Medical Pattern Semantic Understanding. In: Guo, M., Zhao, L., Wang, L. (eds.) Fourth International Conference on Natural Computation, ICNC 2008, Jinan, Shandong, China, October 18-20, vol. 6, pp. 245–247 (2008)
22. Ogiela, L.: Innovation Approach to Cognitive Medical Image Interpretation. In: 5th International Conference on Innovations in Information Technology, Innovation 2008, Al Ain, United Arab Emirates, December 16-18, pp. 722–726 (2008)


23. Ogiela, L.: UBIAS Systems for Cognitive Interpretation and Analysis of Medical Images. Opto-Electronics Review 17(2), 166–179 (2008)
24. Ogiela, L.: Computational intelligence in cognitive healthcare information systems. In: Bichindaritz, I., Vaidya, S., Jain, A., Jain, L.C. (eds.) Computational Intelligence in Healthcare 4. SCI, vol. 309, pp. 347–369. Springer, Heidelberg (2010)
25. Ogiela, L.: Cognitive Informatics in Automatic Pattern Understanding and Cognitive Information Systems. In: Wang, Y., Zhang, D., Kinsner, W. (eds.) Advances in Cognitive Informatics and Cognitive Computing. SCI, vol. 323, pp. 209–226. Springer, Heidelberg (2010)
26. Ogiela, L., Ogiela, M.R.: Cognitive Techniques in Visual Data Interpretation. SCI. Springer, Heidelberg (2009)
27. Ogiela, L., Ogiela, M.R.: Cognitive Approach to Bio-Inspired Medical Image Understanding. In: Nagar, A.K., Thamburaj, R., Li, K., Tang, Z., Li, R. (eds.) Proceedings 2010 IEEE Fifth International Conference Bio-Inspired Computing: Theories and Applications, Liverpool, UK, September 8-10, pp. 1010–1013 (2010)
28. Ogiela, L., Ogiela, M.R., Tadeusiewicz, R.: Mathematical Linguistic in Cognitive Medical Image Interpretation Systems. Journal of Mathematical Imaging and Vision 34, 328–340 (2009)
29. Ogiela, L., Ogiela, M.R., Tadeusiewicz, R.: Cognitive reasoning in UBIAS systems supporting interpretation of medical images. In: 2nd International Conference on Computer Science and its Applications CSA 2009, Jeju, Korea, December 10-12, vol. 2, pp. 448–451 (2009)
30. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Informatics in Automatic Pattern Understanding. In: Hang, D., Wang, Y., Kinsner, W. (eds.) Proceedings of the Sixth IEEE International Conference on Cognitive Informatics, ICCI 2007, Lake Tahoe, CA, USA, August 6-8, pp. 79–84 (2007)
31. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Computing in Analysis of 2D/3D Medical Images. In: The 2007 International Conference on Intelligent Pervasive Computing – IPC 2007, Jeju Island, Korea, October 11-13, pp. 15–18. IEEE Computer Society (2007)
32. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Linguistic Categorization for Medical Multi-dimensional Pattern Understanding. In: ACCV 2007 Workshop on Multi-dimensional and Multi-view Image Processing, Tokyo, Japan, November 18-22, pp. 150–156 (2007)
33. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Techniques in Medical Information Systems. Computers in Biology and Medicine 38, 502–507 (2008)
34. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Approach to Medical Image Semantics Description and Interpretation. In: INFOS 2008, The 6th International Conference on Informatics and Systems, Egypt, March 27-29, vol. 5, pp. HBI-1–HBI-5 (2008)
35. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Modeling in Medical Pattern Semantic Understanding. In: The 2nd International Conference on Multimedia and Ubiquitous Engineering MUE 2008, Busan, Korea, April 24-26, pp. 15–18 (2008)
36. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: AI-Cognitive Description in Visual Pattern Mining and Retrieval. In: Second Asia Modeling & Simulation AMS, Kuala Lumpur, Malaysia, May 13-15, pp. 885–889 (2008)


37. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Modeling in Computational Intelligence Methods for Medical Pattern Semantic Categorization and Understanding. In: Proceedings of the Fourth IASTED International Conference Advances in Computer Science and Technology (ACST 2008), Langkawi, Malaysia, April 2-4, pp. 368–371 (2008)
38. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Categorization in Medical Structures Modeling and Image Understanding. In: Li, D., Deng, G. (eds.) 2008 International Congress on Image and Signal Processing, CISP 2008, Sanya, Hainan, China, May 27-30, vol. 4, pp. 560–564 (2008)
39. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Approach to Medical Pattern Recognition, Structure Modeling and Image Understanding. In: Peng, Y., Zhang, Y. (eds.) First International Conference on BioMedical Engineering and Informatics, BMEI 2008, Sanya, Hainan, China, May 27-30, vol. 30, pp. 33–37 (2008)
40. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Methods in Medical Image Analysis and Interpretation. In: The 4th International Workshop on Medical Image and Augmented Reality, MIAR 2008, The University of Tokyo, Japan (2008)
41. Ogiela, L., Tadeusiewicz, R., Ogiela, M.R.: Cognitive Categorizing in UBIAS Intelligent Medical Information Systems. In: Sordo, M., Vaidya, S., Jain, L.C. (eds.) Advanced Computational Intelligence Paradigms in Healthcare 3. SCI, vol. 107, pp. 75–94. Springer, Heidelberg (2008)
42. Ogiela, M.R., Ogiela, L., Tadeusiewicz, R.: Cognitive Reasoning UBIAS & E-UBIAS Systems in Medical Informatics. In: INC 2010 6th International Conference on Networked Computing, Gyeonju, Korea, May 11-13, pp. 360–364 (2010)
43. Ogiela, M.R., Tadeusiewicz, R.: Modern Computational Intelligence Methods for the Interpretation of Medical Images. Springer, Heidelberg (2008)
44. Ogiela, M.R., Tadeusiewicz, R., Ogiela, L.: Image languages in intelligent radiological palm diagnostics. Pattern Recognition 39, 2157–2165 (2006)
45. Ogiela, M.R., Tadeusiewicz, R., Ogiela, L.: Graph image language techniques supporting radiological hand image interpretations. Computer Vision and Image Understanding 103, 112–120 (2006)
46. Rutkowski, L.: New Soft Computing Techniques for System Modelling, Pattern Classification and Image Processing. Studies in Fuzziness and Soft Computing. Springer, Heidelberg (2004)
47. Rutkowski, L.: Computational Intelligence, Methods and Techniques. Springer, Heidelberg (2008)
48. Skomorowski, M.: Use of random graph parsing for scene labeling by probabilistic relaxation. Pattern Recognition Letters 20(9), 949–956 (1999)
49. Skomorowski, M.: Syntactic recognition of distorted patterns by means of random graph parsing. Pattern Recognition Letters 28(5), 572–581 (2007)
50. Tadeusiewicz, R., Ogiela, L.: Selected Cognitive Categorization Systems. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Zurada, J.M. (eds.) ICAISC 2008. LNCS (LNAI), vol. 5097, pp. 1127–1136. Springer, Heidelberg (2008)
51. Tadeusiewicz, R., Ogiela, L.: Categorization in Cognitive Systems. In: Svetoslav, N. (ed.) Fifth International Conference of Applied Mathematics and Computing, FICAMC 2008, Plovdiv, Bulgaria, August 12-18, vol. 3, p. 451 (2008)
52. Tadeusiewicz, R., Ogiela, L., Ogiela, M.R.: Cognitive Analysis Techniques in Business Planning and Decision Support Systems. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds.) ICAISC 2006. LNCS (LNAI), vol. 4029, pp. 1027–1039. Springer, Heidelberg (2006)


53. Tadeusiewicz, R., Ogiela, L., Ogiela, M.R.: The automatic understanding approach to systems analysis and design. International Journal of Information Management 28, 38–48 (2008)
54. Tadeusiewicz, R., Ogiela, M.R.: Medical Image Understanding Technology, Artificial Intelligence and Soft-Computing for Image Understanding. Springer, Heidelberg (2004)
55. Tadeusiewicz, R., Ogiela, M.R.: New Proposition for Intelligent Systems Design: Artificial Understanding of the Images as the Next Step of Advanced Data Analysis after Automatic Classification and Pattern Recognition. In: Kwasnicka, H., Paprzycki, M. (eds.) Intelligent Systems Design and Applications, Proceedings 5th International Conference on Intelligent Systems Design and Application ISDA 2005, Wrocław, September 8-10, pp. 297–300. IEEE Computer Society Press, Los Alamitos (2005)
56. Tadeusiewicz, R., Ogiela, M.R.: Automatic Image Understanding – A New Paradigm for Intelligent Medical Image Analysis. Bio-Algorithms and Med-Systems 2(3), 5–11 (2006)
57. Tadeusiewicz, R., Ogiela, M.R.: Why Automatic Understanding? In: Beliczynski, B., Dzielinski, A., Iwanowski, M., Ribeiro, B. (eds.) ICANNGA 2007. LNCS, vol. 4432, pp. 477–491. Springer, Heidelberg (2007)
58. Tadeusiewicz, R., Ogiela, M.R., Ogiela, L.: A New Approach to the Computer Support of Strategic Decision Making in Enterprises by Means of a New Class of Understanding Based Management Support Systems. In: Saeed, K., Abraham, A., Mosdorf, R. (eds.) CISIM 2007 – IEEE 6th International Conference on Computer Information Systems and Industrial Management Applications, Ełk, Poland, June 28-30, pp. 9–13. IEEE Computer Society (2007)
59. Tadeusiewicz, R., Ogiela, M.R.: Automatic Understanding of Images. In: Adipranata, R. (ed.) Proceedings of International Conference on Soft Computing, Intelligent System and Information Technology (ICSIIT 2007), Bali, Indonesia, July 26-27, pp. 13–38. Special Book for Keynote Talks, Informatics Engineering Department, Petra Christian University, Surabaya (2007)
60. Tanaka, E.: Theoretical aspects of syntactic pattern recognition. Pattern Recognition 28, 1053–1061 (1995)
61. Wang, Y.: The Real-Time Process Algebra (RTPA). The International Journal of Annals of Software Engineering 14, 235–274 (2002)
62. Wang, Y.: On Cognitive Informatics. Brain and Mind: A Transdisciplinary Journal of Neuroscience and Neurophilosophy 4(2), 151–167 (2003)
63. Wang, Y.: The Theoretical Framework of Cognitive Informatics. International Journal of Cognitive Informatics and Natural Intelligence 1(1), 1–27 (2007)
64. Wang, Y.: The Cognitive Processes of Formal Inferences. International Journal of Cognitive Informatics and Natural Intelligence 1(4), 75–86 (2007)
65. Wang, Y.: On Concept Algebra: A Denotational Mathematical Structure for Knowledge and Software Modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 1–19 (2008)
66. Wang, Y.: On System Algebra: A Denotational Mathematical Structure for Abstract System Modeling. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 20–42 (2008)
67. Wang, Y.: Deductive Semantics of RTPA. International Journal of Cognitive Informatics and Natural Intelligence 2(2), 95–121 (2008)


68. Wang, Y.: On Visual Semantic Algebra (VSA) and the Cognitive Process of Pattern Recognition. In: Proc. 7th International Conference on Cognitive Informatics (ICCI 2008). IEEE CS Press, Stanford University, CA (2008)
69. Wang, Y., Johnston, R., Smith, M. (eds.): Cognitive Informatics: Proceedings 1st IEEE International Conference (ICCI 2002). IEEE CS Press, Canada (2002)
70. Wang, Y., Kinsner, W.: Recent Advances in Cognitive Informatics. IEEE Transactions on Systems, Man, and Cybernetics (Part C) 36(2), 121–123 (2006)
71. Wang, Y., Wang, Y., Patel, S., Patel, D.: A Layered Reference Model of the Brain (LRMB). IEEE Transactions on Systems, Man, and Cybernetics (Part C) 36(2), 124–133 (2006)
72. Wang, Y., Zhang, D., Latombe, J.C., Kinsner, W. (eds.): Proceedings 7th IEEE International Conference on Cognitive Informatics (ICCI 2008). IEEE CS Press, Stanford University, USA (2008)
73. Wilson, R.A., Keil, F.C.: The MIT Encyclopedia of the Cognitive Sciences. MIT Press (2001)
74. Yao, Y., Shi, Z., Wang, Y., Kinsner, W. (eds.): Proc. 5th IEEE International Conference on Cognitive Informatics (ICCI 2006). IEEE CS Press, China (2006)
75. Zadeh, L.A.: Fuzzy Sets and Systems. In: Fox, J. (ed.) Systems Theory, pp. 29–37. Polytechnic Press, Brooklyn, NY (1965)
76. Zadeh, L.A.: Fuzzy logic, neural networks, and soft computing. Communications of the ACM 37(3), 77–84 (1994)
77. Zadeh, L.A.: Toward human level machine intelligence – Is it achievable? In: Proc. 7th International Conference on Cognitive Informatics (ICCI 2008). IEEE CS Press, Stanford University, CA (2008)
78. Zajonc, R.B.: On the Primacy of Affect. American Psychologist 39, 117–123 (1984)
79. Zhong, N., Raś, Z.W., Tsumoto, S., Suzuki, E. (eds.): 14th International Symposium on Foundations of Intelligent Systems ISMIS 2003, Maebashi City, Japan (2003)

Chapter 5

Optimal Differential Filter on Hexagonal Lattice

Suguru Saito, Masayuki Nakajima, and Tetsuo Shima
Department of Computer Science, Tokyo Institute of Technology, Japan

Abstract. Digital two-dimensional images are usually sampled on square lattices, whose adjacent pixel distances in the horizontal-vertical and diagonal directions are not equal. A hexagonal lattice, in contrast, covers an area with sampling points whose adjacent pixel distances are all the same; it therefore has the potential advantage that it can be used to calculate accurate two-dimensional gradients. Extracting gradient information is a fundamental filtering operation in many image processing algorithms. For this extraction, various gradient filters have been proposed on square lattices, and some of them have been thoroughly optimized, but not on a hexagonal lattice. In this chapter, consistent gradient filters on hexagonal lattices are derived, the derived filters are compared with existing optimized filters on square lattices, and the relationship between the derived filters and existing filters on a hexagonal lattice is investigated. The results of the comparison show that the derived filters on a hexagonal lattice achieve a better signal-to-noise ratio and localization than filters on a square lattice.

1 Introduction

Obtaining the differential image of a given input image is a fundamental operation in image processing. In most cases, the differential image is the result of a convolution of the input image with a differential filter. Accordingly, the more accurate the differential filter is, the better the convolution results will be. Many discrete differential filters [8, 13, 19, 18, 21] have been proposed; however, the gradients they derive are not very accurate. Ando therefore proposed "consistent gradient filters," which are optimized differential filters on a square lattice [2]. These filters are derived by minimizing the difference between the ideal differential and the differential obtained with the filters in the frequency domain, and they have succeeded in obtaining more accurate differential values.
On the other hand, image processing on hexagonal lattices has also been studied for many years. Fundamental research on hexagonal lattices, for example, research


on signal processing [14], geometric transforms [10], co-occurrence matrices [15], efficient algorithms for Fourier transforms [9], and FIR filter banks [11], has been reported. Moreover, image processing on hexagonal lattices, ranging from hardware to application algorithms [3, 4, 5, 12, 22, 24, 26, 27, 28], has also been widely researched, and a book collecting research on image processing on hexagonal lattices has been published [16]. It is also known that the receptors of the human eye follow a hexagonal alignment, and there are works relating human perception to image processing on hexagonal lattices. Overington [17] proposed lightweight image processing methods on hexagonal lattices. Gabor filters on a hexagonal lattice, believed to model processing performed in the lower levels of the human vision system, have also been proposed [25].
In this chapter, consistent gradient filters on hexagonal lattices are described, extending our paper [20]. The filters are derived on the basis of the approach previously proposed for square lattices [2]. The relationship between the derived filters and existing filters on a hexagonal lattice designed in another way is then discussed. After that, the derived filters are compared with conventional optimized filters on square lattices.

2 Preliminaries

First, F(u, v) is taken as the Fourier transform of f(x, y):

$$ F(u,v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y)\, e^{-2\pi i(ux+vy)}\, dx\, dy \qquad (1) $$

In this chapter, the pixel placement on hexagonal lattices and square lattices is defined as in Figure 1. It is assumed that the input image contains frequency components only in the hexagonal region described in Figure 2. According to the sampling theorem, the spectrum of an image sampled on a hexagonal lattice is repeated, and the repeating unit region is illustrated in Figure 2 ([7]).

3 Least Inconsistent Image

Let the x-axis and the y-axis correspond to the horizontal axis (0°) and the vertical axis (90°), respectively. Let h_a(x, y), h_b(x, y) and h_c(x, y) be elements of discrete gradient filters in the directions 0°, 60° and 120°, respectively, on hexagonal lattices, while their arguments (x, y) indicate a point in a traditional orthogonal coordinate system. The gradients of image f(x, y) in the directions of the filters, which are derived by convolution, are denoted by f_a(x, y), f_b(x, y) and f_c(x, y):

$$ f_a(x,y) = h_a(x,y) * f(x,y) = \sum_{x_1,y_1 \in R_H} h_a(x_1,y_1)\, f(x-x_1,\, y-y_1) \qquad (2) $$
$$ f_b(x,y) = h_b(x,y) * f(x,y) = \sum_{x_1,y_1 \in R_H} h_b(x_1,y_1)\, f(x-x_1,\, y-y_1) \qquad (3) $$
$$ f_c(x,y) = h_c(x,y) * f(x,y) = \sum_{x_1,y_1 \in R_H} h_c(x_1,y_1)\, f(x-x_1,\, y-y_1) \qquad (4) $$

where R_H is the set of pixels inside the filters. They are described in the frequency domain as follows:

$$ F_a(u,v) = H_a(u,v)\, F(u,v) \qquad (5) $$
$$ F_b(u,v) = H_b(u,v)\, F(u,v) \qquad (6) $$
$$ F_c(u,v) = H_c(u,v)\, F(u,v). \qquad (7) $$

Fig. 1 Hexagonal and square lattices: (a) hexagonal lattices, (b) square lattices. The distance between adjacent pixels is always 1 on hexagonal lattices, while it is either 1 or √2 on square lattices.
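As an aside, the equal-distance property of Figure 1(a) is easy to verify numerically. The following sketch is our own illustration (the offset-row layout is one common way of addressing a hexagonal lattice, not necessarily the exact indexing used in this chapter); it maps integer pixel indices to Cartesian coordinates and checks that all six neighbours lie at distance 1:

```python
# Sketch: axial addressing of a unit-spacing hexagonal lattice. Rows are
# sqrt(3)/2 apart and odd rows are shifted by 1/2, so every pixel has six
# neighbours at distance exactly 1 (assumed layout for illustration only).

import math

def hex_to_cartesian(col: int, row: int):
    """Centre of pixel (col, row) on a hexagonal lattice with unit spacing."""
    x = col + 0.5 * (row % 2)      # odd rows are offset by half a pixel
    y = row * math.sqrt(3) / 2.0   # row pitch is sqrt(3)/2
    return x, y

x0, y0 = hex_to_cartesian(0, 0)
for col, row in [(1, 0), (-1, 0), (0, 1), (-1, 1), (0, -1), (-1, -1)]:
    x, y = hex_to_cartesian(col, row)
    print(col, row, round(math.hypot(x - x0, y - y0), 6))  # prints 1.0 each time
```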

The least-inconsistent image of the discrete gradient images f_a(x, y), f_b(x, y) and f_c(x, y) is denoted as g(x, y). In a similar manner to Ando's definition of g for square lattices, g(x, y) is determined by minimizing the following criterion:

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| \frac{\partial}{\partial x} g(x,y) - f_a(x,y) \right|^2 + \left| \left( \frac{1}{2}\frac{\partial}{\partial x} + \frac{\sqrt{3}}{2}\frac{\partial}{\partial y} \right) g(x,y) - f_b(x,y) \right|^2 + \left| \left( -\frac{1}{2}\frac{\partial}{\partial x} + \frac{\sqrt{3}}{2}\frac{\partial}{\partial y} \right) g(x,y) - f_c(x,y) \right|^2 dx\, dy. \qquad (8) $$


In the frequency domain, by using Parseval's theorem, and by considering that the same shape of spectra appears repeatedly due to the sampling theorem, the problem is transformed into the minimization of the following on a single unit of spectra:

$$ \iint_D |2\pi i u\, G(u,v) - F_a(u,v)|^2 + \left| 2\pi i \left( \frac{1}{2}u + \frac{\sqrt{3}}{2}v \right) G(u,v) - F_b(u,v) \right|^2 + \left| 2\pi i \left( -\frac{1}{2}u + \frac{\sqrt{3}}{2}v \right) G(u,v) - F_c(u,v) \right|^2 du\, dv, \qquad (9) $$

where $\iint_D \bullet\, du\, dv$ denotes integration over the hexagonal region D, the unit cell of the spectra repeated by hexagonal sampling (10). This area D is represented in Figure 2. The integrand of (9) is expanded as follows (where $\bullet^*$ denotes the complex conjugate of $\bullet$):

Fig. 2 Hexagonal region to be integrated

$$ 4\pi^2 u^2 GG^* - 2\pi i u\, G F_a^* + 2\pi i u\, G^* F_a + F_a F_a^* + 4\pi^2 \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right)^2 GG^* - 2\pi i \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) G F_b^* + 2\pi i \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) G^* F_b + F_b F_b^* + 4\pi^2 \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right)^2 GG^* - 2\pi i \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) G F_c^* + 2\pi i \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) G^* F_c + F_c F_c^* \qquad (11) $$

In a similar manner to Ando’s method, differentiating with respect to G and G∗ gives the following conditions for g(x, y) to be the least inconsistent gradient image. √ √ 1 1 3 3 6π 2 (u2 + v2 )G∗ − 2π iuFa∗ − 2π i( u + v)Fb∗ − 2π i(− u + v)Fc∗ = 0 (12) 2 2 2 2 √ √ 3 3 1 1 6π (u + v )G + 2π iuFa + 2π i( u + v)Fb + 2π i(− u + v)Fc = 0 (13) 2 2 2 2 2

2

2

The sum and difference of these two expressions are given respectively as follows: 6π 2 (u2 + v2 )(G + G∗) + 2π iu(Fa − Fa∗ ) √ √ 3 3 1 1 ∗ v)(Fb − Fb ) + 2π i(− u + v)(Fc − Fc∗ ) = 0 (14) + 2π i( u + 2 2 2 2 6π 2 (u2 + v2 )(G − G∗) + 2π iu(Fa + Fa∗ ) √ √ 3 3 1 1 ∗ v)(Fb + Fb ) + 2π i(− u + v)(Fc + Fc∗ ) = 0 (15) + 2π i( u + 2 2 2 2 Equation (14) consists only of real parts, while Equation (15) consists of imaginary terms only. For g to be least inconsistent, both equations must equal zero. Summing these two expressions, gives the following condition: G(u, v) =

−i G1 (u, v), 3π (u2 + v2)

(16)


where

$$ G_1(u,v) \equiv \left( u H_a(u,v) + \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) H_b(u,v) + \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) H_c(u,v) \right) F(u,v). \qquad (17) $$

(16) gives the condition on g for (9) to be minimal. The errors of gradient filters are attributed to the inconsistency and to the smoothing effect. Applying a gradient filter to a discrete image means that the resultant gradient value is actually smoothed, as a gradient filter for discrete images is not equal to the gradient defined on the continuous domain. As Ando described [2], the smoothing effect of gradient filters is not important in comparison to the inconsistency. In the present derivation, the smoothing effect is therefore also discarded. To reduce the inconsistency, the minimization of (9) is thus targeted.

4 Point Spread Function

The aim of this chapter is to derive gradient filters for the three orientations on the hexagonal lattice: 0°, 60° and 120°. It is supposed that the gradient filters for 60° and 120° are obtained by rotating the gradient filter derived for 0°. It is also supposed that the gradient filter for 0° is symmetric with respect to the x-axis and antisymmetric with respect to the y-axis (see Figure 4), and a_mn is taken as a discrete coefficient of a gradient filter. Using a δ-function as the common impulse function makes it possible to write a set of elements of a gradient filter as follows. The point spread functions h_a^mn, h_b^mn and h_c^mn, which define elements of gradient filters in the respective directions of 0°, 60° and 120°, are described as

$$ h_a^{mn}(x,y) = a_{mn} \Big\{ -\delta\big(x-\tfrac{m}{2},\, y-\tfrac{\sqrt{3}n}{4}\big) + \delta\big(x+\tfrac{m}{2},\, y-\tfrac{\sqrt{3}n}{4}\big) - \delta\big(x-\tfrac{m}{2},\, y+\tfrac{\sqrt{3}n}{4}\big) + \delta\big(x+\tfrac{m}{2},\, y+\tfrac{\sqrt{3}n}{4}\big) \Big\}, \qquad (18) $$

$$ h_b^{mn}(x,y) = a_{mn} \Big\{ -\delta\big(x + \tfrac{1}{2}(-\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(-\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(-\tfrac{m}{2}) + \tfrac{1}{2}(-\tfrac{\sqrt{3}n}{4})\big) + \delta\big(x + \tfrac{1}{2}(\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(-\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(\tfrac{m}{2}) + \tfrac{1}{2}(-\tfrac{\sqrt{3}n}{4})\big) - \delta\big(x + \tfrac{1}{2}(-\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(-\tfrac{m}{2}) + \tfrac{1}{2}(\tfrac{\sqrt{3}n}{4})\big) + \delta\big(x + \tfrac{1}{2}(\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(\tfrac{m}{2}) + \tfrac{1}{2}(\tfrac{\sqrt{3}n}{4})\big) \Big\} \qquad (19) $$

Fig. 3 Positions of a_mn of h_a^mn where the distance between the center and any element is less than or equal to 1. Note that the elements on the horizontal axis are overlapped. The resultant derived filter with radius of 1 is shown in Figure 4(a).

and

$$ h_c^{mn}(x,y) = a_{mn} \Big\{ -\delta\big(x - \tfrac{1}{2}(-\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(-\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(-\tfrac{m}{2}) - \tfrac{1}{2}(-\tfrac{\sqrt{3}n}{4})\big) + \delta\big(x - \tfrac{1}{2}(\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(-\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(\tfrac{m}{2}) - \tfrac{1}{2}(-\tfrac{\sqrt{3}n}{4})\big) - \delta\big(x - \tfrac{1}{2}(-\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(-\tfrac{m}{2}) - \tfrac{1}{2}(\tfrac{\sqrt{3}n}{4})\big) + \delta\big(x - \tfrac{1}{2}(\tfrac{m}{2}) - \tfrac{\sqrt{3}}{2}(\tfrac{\sqrt{3}n}{4}),\, y + \tfrac{\sqrt{3}}{2}(\tfrac{m}{2}) - \tfrac{1}{2}(\tfrac{\sqrt{3}n}{4})\big) \Big\}, \qquad (20) $$

where m = 0, 1, 2, ..., n = (m%2)(2k+1) + ((m+1)%2)·2k (k = 0, 1, 2, ...), and the δ-function is a common impulse function. The point spread function h_a^mn of the gradient filter with a radius of 1, which means that every element of the filter is located at a distance from the center of the filter less than or equal to 1, is illustrated in Figure 3.
To simplify notation, three functions η_mn^a, η_mn^b and η_mn^c are defined as

$$ \eta_{mn}^a(u,v) \equiv \sin(\pi m u)\, \cos\!\Big( \pi \tfrac{\sqrt{3}}{2} n v \Big) \qquad (21) $$
$$ \eta_{mn}^b(u,v) \equiv \sin\!\Big( \pi m \big( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \big) \Big) \cdot \cos\!\Big( \pi n \big( \tfrac{3}{4}u - \tfrac{\sqrt{3}}{4}v \big) \Big) \qquad (22) $$
$$ \eta_{mn}^c(u,v) \equiv \sin\!\Big( \pi m \big( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \big) \Big) \cdot \cos\!\Big( \pi n \big( \tfrac{3}{4}u + \tfrac{\sqrt{3}}{4}v \big) \Big) \qquad (23) $$

The Fourier transforms of h_a, h_b and h_c are described by using η as follows:

$$ H_a(u,v) = \sum_{m,n} H_a^{mn}(u,v) = 4i \sum_{m,n} a_{mn}\, \eta_{mn}^a(u,v) \qquad (24) $$
$$ H_b(u,v) = \sum_{m,n} H_b^{mn}(u,v) = 4i \sum_{m,n} a_{mn}\, \eta_{mn}^b(u,v) \qquad (25) $$
$$ H_c(u,v) = \sum_{m,n} H_c^{mn}(u,v) = 4i \sum_{m,n} a_{mn}\, \eta_{mn}^c(u,v) \qquad (26) $$

where

$$ H_a^{mn}(u,v) = 4 a_{mn} i\, \eta_{mn}^a(u,v) \qquad (27) $$
$$ H_b^{mn}(u,v) = 4 a_{mn} i\, \eta_{mn}^b(u,v) \qquad (28) $$
$$ H_c^{mn}(u,v) = 4 a_{mn} i\, \eta_{mn}^c(u,v). \qquad (29) $$

To ease the numerical optimization, condition (16) is transformed. That is, function G(u, v) is rewritten as

$$ G(u,v) = \frac{-i}{3\pi (u^2+v^2)}\, G_1(u,v), \qquad (30) $$

where

$$ G_1(u,v) \equiv \left( u H_a(u,v) + \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) H_b(u,v) + \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) H_c(u,v) \right) F(u,v) = 4i \sum_{m,n} a_{mn}\, \sigma_{mn}(u,v)\, F(u,v) \qquad (31) $$

and

$$ \sigma_{mn}(u,v) \equiv u\, \eta_{mn}^a(u,v) + \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) \eta_{mn}^b(u,v) + \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) \eta_{mn}^c(u,v). \qquad (32) $$

The expression to be minimized, Equation (9), is then rewritten as

$$ \iint_D \Psi(u,v)\, |F(u,v)|^2\, du\, dv \qquad (33) $$

where

$$ \Psi(u,v) = 16 \sum_{k,l} \sum_{m,n} a_{kl}\, a_{mn} \left( \tau_{kl}^a \tau_{mn}^a + \tau_{kl}^b \tau_{mn}^b + \tau_{kl}^c \tau_{mn}^c \right), \qquad (34) $$

and

$$ \tau_{mn}^a(u,v) \equiv \frac{2u}{3(u^2+v^2)}\, \sigma_{mn}(u,v) - \eta_{mn}^a(u,v) \qquad (35) $$
$$ \tau_{mn}^b(u,v) \equiv \frac{u+\sqrt{3}v}{3(u^2+v^2)}\, \sigma_{mn}(u,v) - \eta_{mn}^b(u,v) \qquad (36) $$
$$ \tau_{mn}^c(u,v) \equiv \frac{-u+\sqrt{3}v}{3(u^2+v^2)}\, \sigma_{mn}(u,v) - \eta_{mn}^c(u,v). \qquad (37) $$


5 Condition for Gradient Filter

The condition for determining the amplitude of the gradient filters is obtained next. The following expression represents the gradient of a function k(x, y) in the direction θ:

$$ \left( \cos\theta\, \frac{\partial}{\partial x} + \sin\theta\, \frac{\partial}{\partial y} \right) k(x,y) \qquad (38) $$

In the Fourier domain, this equation is transformed to

$$ 2\pi i\, (u\cos\theta + v\sin\theta)\, K(u,v). \qquad (39) $$

As h_a(x, y), h_b(x, y) and h_c(x, y) are gradient filters in the 0°, 60° and 120° directions, the following conditions hold:

$$ H_a(u,v) = 2\pi i u \qquad (40) $$
$$ H_b(u,v) = 2\pi i \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) \qquad (41) $$
$$ H_c(u,v) = 2\pi i \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right). \qquad (42) $$

It thus follows that

$$ \sum_{m,n} H_a^{mn}(u,v) = 2\pi i u \qquad (43) $$
$$ \sum_{m,n} H_b^{mn}(u,v) = 2\pi i \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) \qquad (44) $$
$$ \sum_{m,n} H_c^{mn}(u,v) = 2\pi i \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) \qquad (45) $$

must hold for any u and v. Taking the limit as |u|, |v| → 0 and using first-order approximations for the trigonometric functions makes it possible to rewrite these expressions as

$$ 4i \sum_{m,n} a_{mn}\, \pi m u = 2\pi i u \qquad (46) $$
$$ 4i \sum_{m,n} a_{mn}\, \pi m \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) = 2\pi i \left( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) \qquad (47) $$
$$ 4i \sum_{m,n} a_{mn}\, \pi m \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right) = 2\pi i \left( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \right). \qquad (48) $$

These three expressions are each equivalent to

$$ 2 \sum_{m,n} m\, a_{mn} = 1. \qquad (49) $$

6 Numerical Optimization

The goal here is to minimize (33) under the derived condition (49). |F(u, v)|² in (33) is replaced with its ensemble average P(u, v), in a similar manner to that reported by Ando, as follows:

$$ P(u,v) = E[\,|F(u,v)|^2\,]. \qquad (50) $$

That is, Equation (33), repeated below as

$$ \iint_D \Psi(u,v)\, |F(u,v)|^2\, du\, dv, \qquad (51) $$

is rewritten as

$$ \iint_D \Psi(u,v)\, P(u,v)\, du\, dv = 16 \sum_{k,l} \sum_{m,n} a_{kl}\, a_{mn}\, R_{kl,mn} \qquad (52) $$

where

$$ R_{kl,mn} \equiv \iint_D \left( \tau_{kl}^a \tau_{mn}^a + \tau_{kl}^b \tau_{mn}^b + \tau_{kl}^c \tau_{mn}^c \right) P(u,v)\, du\, dv. \qquad (53) $$

Our objective is now to minimize

$$ J_0 \equiv 16 \sum_{k,l} \sum_{m,n} a_{kl}\, a_{mn}\, R_{kl,mn} \qquad (54) $$

under the condition (49). The optimal values of a_mn are computed by a traditional gradient-descent optimization as follows. Taking i as the number of steps of the optimization gives

$$ J_0^{(i)} \equiv 16 \sum_{k,l} \sum_{m,n} a_{kl}^{(i)}\, a_{mn}^{(i)}\, R_{kl,mn}. \qquad (55) $$

Differentiating this expression gives

$$ \frac{\partial J_0^{(i)}}{\partial a_{mn}^{(i)}} = 16 \sum_{k,l} a_{kl}^{(i)}\, R_{kl,mn}. \qquad (56) $$

To satisfy condition (49), the values are updated at each iteration as follows:

$$ a_{mn}^{tmp} = a_{mn}^{(i)} - \alpha\, \frac{\partial J_0^{(i)}}{\partial a_{mn}^{(i)}}, \qquad (57) $$
$$ a_{mn}^{(i+1)} = \frac{a_{mn}^{tmp}}{2 \sum_{m,n} m\, a_{mn}^{tmp}} \qquad (58) $$

where α is a constant and a_mn^tmp is a temporary variable. The minimization stops when all a_mn that compose the gradient filter satisfy the following condition:

$$ |a_{mn}^{(i+1)} - a_{mn}^{(i)}| < \varepsilon, \qquad (59) $$

where ε is a constant. Under the assumption that P(u, v) = 1, the actual values of the gradient filter are obtained, as Ando described on square lattices. This assumption means that the input image's spectrum is equivalent to white noise, so the derived filters are not specialized for a specific frequency. The optimization was performed with Octave [1] with α = 0.01 and ε = 10⁻¹².
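For illustration, the update loop of eqs. (55)-(59) can be sketched as follows. This is not the authors' Octave program: it assumes the R_{kl,mn} integrals of eq. (53) have already been evaluated into a matrix R over the filter coefficients flattened into a vector a, with m_w holding the m-index of each coefficient:

```python
# Sketch of the constrained gradient descent of Section 6 (our own
# illustration under the stated assumptions; R, m_w are assumed inputs).

import numpy as np

def optimize(R: np.ndarray, m_w: np.ndarray, alpha: float = 0.01,
             eps: float = 1e-12) -> np.ndarray:
    n = R.shape[0]
    # Any starting point satisfying the constraint 2*sum(m*a) = 1 (eq. (49)).
    a = np.full(n, 1.0 / (2.0 * m_w.sum()))
    while True:
        grad = 16.0 * R @ a                            # eq. (56)
        a_tmp = a - alpha * grad                       # eq. (57)
        a_new = a_tmp / (2.0 * (m_w * a_tmp).sum())    # eq. (58): renormalize
        if np.all(np.abs(a_new - a) < eps):            # eq. (59): convergence
            return a_new
        a = a_new
```

Running such a loop with P(u, v) = 1 would be expected to reproduce coefficient values like those in Figure 4, provided the R matrix is evaluated accurately over the hexagonal region D.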

Fig. 4 Forms of derived gradient filters on hexagonal lattices:
(a) Hex 1 – filter of radius 1: a = 0.333011, b = 0.166989
(b) Hex √3 – filter of radius √3: a = 0.272665, b = 0.136340, c = 0.030332
(c) Hex 2 – filter of radius 2: a = 0.182049, b = 0.091025, c = 0.050753, d = 0.024889, e = 0.012444


The resulting consistent gradient filters for the 0◦ , 60◦ , 120◦ directions are given in Figure 4.

7 Theoretical Evaluation

7.1 Signal-to-Noise Ratio

In the evaluation of the signal-to-noise ratio, first, f_i(x, y) and f_j(x, y) are taken as discrete gradient images in the x and y directions on square lattices, respectively, and F_i(u, v) and F_j(u, v) are taken as their Fourier transforms. Next, g^sqr(x, y) is taken as the least inconsistent image of f_i(x, y) and f_j(x, y), and G^sqr(u, v) is taken as its Fourier transform. Ando [2] transformed the error

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| \frac{\partial}{\partial x} g^{sqr}(x,y) - f_i(x,y) \right|^2 + \left| \frac{\partial}{\partial y} g^{sqr}(x,y) - f_j(x,y) \right|^2 dx\, dy \qquad (60) $$

by applying Parseval's theorem and defined the inconsistency as

$$ J^{sqr} = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} |2\pi u i\, G^{sqr}(u,v) - F_i(u,v)|^2 + |2\pi v i\, G^{sqr}(u,v) - F_j(u,v)|^2\, du\, dv \qquad (61) $$

where the domain of integration is a unit of the repeated spectra due to the sampling theorem. The gradient intensity on square lattices is defined as

$$ J_1^{sqr} = \int_{-1/2}^{1/2} \int_{-1/2}^{1/2} \left( |G_x^{sqr}(u,v)|^2 + |G_y^{sqr}(u,v)|^2 \right) du\, dv, \qquad (62) $$

where G_x^sqr(u, v) and G_y^sqr(u, v) are the Fourier transforms of g_x^sqr(x, y) and g_y^sqr(x, y), which are the partial differentials of the least inconsistent image in the x and y directions, respectively, on square lattices. Similarly,

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| \frac{\partial}{\partial x} g(x,y) - \frac{2}{3}\Big( f_a(x,y) + \frac{1}{2}( f_b(x,y) - f_c(x,y) ) \Big) \right|^2 + \left| \frac{\partial}{\partial y} g(x,y) - \frac{1}{\sqrt{3}}( f_b(x,y) + f_c(x,y) ) \right|^2 dx\, dy \qquad (63) $$

is transformed by using Parseval's theorem to give

$$ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \left| 2\pi i u\, G(u,v) - \frac{2}{3}\Big( F_a(u,v) + \frac{1}{2}( F_b(u,v) - F_c(u,v) ) \Big) \right|^2 + \left| 2\pi i v\, G(u,v) - \frac{1}{\sqrt{3}}( F_b(u,v) + F_c(u,v) ) \right|^2 du\, dv. \qquad (64) $$

The inconsistency on hexagonal lattices is therefore defined as

$$ J^{hex} \equiv \iint_D \left( |\tilde{G}_x(u,v)|^2 + |\tilde{G}_y(u,v)|^2 \right) du\, dv, \qquad (65) $$

where

$$ \tilde{G}_x(u,v) \equiv 2\pi i u\, G(u,v) - \frac{2}{3}\Big( F_a(u,v) + \frac{1}{2}( F_b(u,v) - F_c(u,v) ) \Big) \qquad (66) $$
$$ \tilde{G}_y(u,v) \equiv 2\pi i v\, G(u,v) - \frac{1}{\sqrt{3}}( F_b(u,v) + F_c(u,v) ). \qquad (67) $$

And the gradient intensity J_1^hex is defined as follows:

$$ J_1^{hex} \equiv \iint_D \left( |G_x(u,v)|^2 + |G_y(u,v)|^2 \right) du\, dv = \iint_D 4\pi^2 (u^2+v^2)\, |G(u,v)|^2\, du\, dv. $$

The following expressions are defined to simplify the above expressions:

$$ \psi_{mn} \equiv \frac{1}{\sqrt{3}}\, u\, (\eta_{mn}^b + \eta_{mn}^c) - \frac{2}{3}\, v \Big( \eta_{mn}^a + \frac{1}{2}(\eta_{mn}^b - \eta_{mn}^c) \Big) \qquad (68) $$
$$ \phi_{mn} \equiv \frac{2}{3}\, u \Big( \eta_{mn}^a + \frac{1}{2}(\eta_{mn}^b - \eta_{mn}^c) \Big) + \frac{1}{\sqrt{3}}\, v\, (\eta_{mn}^b + \eta_{mn}^c) \qquad (69) $$
$$ \tilde{g}_x(x,y) \equiv \frac{\partial}{\partial x} g(x,y) - \frac{2}{3}\Big( f_a(x,y) + \frac{1}{2}( f_b(x,y) - f_c(x,y) ) \Big) \qquad (70) $$
$$ \tilde{g}_y(x,y) \equiv \frac{\partial}{\partial y} g(x,y) - \frac{1}{\sqrt{3}}( f_b(x,y) + f_c(x,y) ) \qquad (71) $$
$$ \Psi_0(u,v) \equiv \frac{|\tilde{G}_x(u,v)|^2 + |\tilde{G}_y(u,v)|^2}{|F(u,v)|^2} = \frac{1}{u^2+v^2} \left| \frac{1}{\sqrt{3}}\, u\, (H_b(u,v) + H_c(u,v)) - \frac{2}{3}\, v \Big( H_a(u,v) + \frac{1}{2}( H_b(u,v) - H_c(u,v) ) \Big) \right|^2 = \frac{16}{u^2+v^2} \Big( \sum_{m,n} a_{mn}\, \psi_{mn} \Big)^2 \qquad (72) $$
$$ \Phi(u,v) \equiv \frac{|G_x(u,v)|^2 + |G_y(u,v)|^2}{|F(u,v)|^2} = \frac{4\pi^2 (u^2+v^2)}{9\pi^2 (u^2+v^2)^2} \left| u H_a(u,v) + \Big( \tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \Big) H_b(u,v) + \Big( -\tfrac{1}{2}u + \tfrac{\sqrt{3}}{2}v \Big) H_c(u,v) \right|^2 = \frac{16}{u^2+v^2} \Big( \sum_{m,n} a_{mn}\, \phi_{mn} \Big)^2 \qquad (73) $$

The inconsistency J^hex is thus rewritten as

$$ J^{hex} = \iint_D \left( |\tilde{G}_x(u,v)|^2 + |\tilde{G}_y(u,v)|^2 \right) du\, dv = \iint_D \Psi_0(u,v)\, P(u,v)\, du\, dv = \iint_D \frac{16}{u^2+v^2} \Big| \sum_{m,n} a_{mn}\, \psi_{mn} \Big|^2 P(u,v)\, du\, dv = \iint_D \frac{16}{u^2+v^2} \Big| \sum_{m,n} a_{mn}\, \psi_{mn} \Big|^2 du\, dv. \qquad (74) $$

And the gradient intensity J_1^hex is rewritten as

$$ J_1^{hex} = \iint_D \left( |G_x(u,v)|^2 + |G_y(u,v)|^2 \right) du\, dv = \iint_D \Phi(u,v)\, P(u,v)\, du\, dv = \iint_D \frac{16}{u^2+v^2} \Big\{ \sum_{m,n} a_{mn}\, \phi_{mn}(u,v) \Big\}^2 P(u,v)\, du\, dv = \iint_D \frac{16}{u^2+v^2} \Big\{ \sum_{m,n} a_{mn}\, \phi_{mn}(u,v) \Big\}^2 du\, dv. \qquad (75) $$

The ratio of J₁ to J corresponds to the signal-to-noise ratio (SNR), which is defined as follows:

$$ SNR \equiv \frac{1}{2} \log_2 \frac{J_1}{J}. \qquad (76) $$

SNR is used to compare the consistent gradient filters on square lattices [2] with the gradient filters on hexagonal lattices derived here. The results for J, J₁ and SNR are listed in Table 1. First, the table shows that the values stated in [2] for the square lattices could be reproduced. The derived filters on hexagonal lattices also achieved higher J₁ than similar-size filters on square lattices. Since J₁ is the integration of the signal intensity over the frequencies that the filter lets pass, our filters on hexagonal lattices are superior to similar-size filters on square lattices with respect to frequency permeability.


Table 1 Properties of derived filters

Filter    | J          | J1        | SNR   | Num. of Pixels
Sqr 3 × 3 | .000778989 | .40310040 | 4.51  | 9
Sqr 4 × 4 | .000016162 | .18895918 | 6.76  | 16
Sqr 5 × 5 | .000000376 | .11585859 | 9.12  | 25
Hex 1     | .001432490 | .74197938 | 4.51  | 7
Hex √3    | .000017713 | .51425309 | 7.41  | 13
Hex 2     | .000000086 | .25649444 | 10.75 | 19
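Equation (76) can be checked directly against Table 1; for example:

```python
# Quick check of eq. (76) against Table 1: SNR = (1/2) * log2(J1 / J).

import math

for name, J, J1 in [("Sqr 3x3", 0.000778989, 0.40310040),
                    ("Hex 1",   0.001432490, 0.74197938),
                    ("Hex 2",   0.000000086, 0.25649444)]:
    print(name, round(0.5 * math.log2(J1 / J), 2))
# Sqr 3x3 -> 4.51, Hex 1 -> 4.51, Hex 2 -> 10.75, matching the table.
```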

7.2 Localization

How the elements of a filter are "balanced" is evaluated next; in particular, how closely the element values gather around the center of a filter is investigated. If the element values gather close to the center, the filter focuses on the information close to the center. Localization on square lattices is defined as

$$ Loc^{sqr} \equiv \frac{\sum_{m,n} P_{mn}^{sqr}\, d_{m,n}}{\sum_{m,n} P_{mn}^{sqr}}, \qquad (77) $$

where

$$ P_{mn}^{sqr} = \sqrt{ (h_i(m,n))^2 + (h_j(m,n))^2 }, \qquad (78) $$

where h_i(m, n) and h_j(m, n) are the elements of the gradient filters derived in the x and y directions, respectively, and d_{m,n} is the distance from the center of the filter. Moreover, the localization on hexagonal lattices is defined as

$$ Loc^{hex} \equiv \frac{\sum_{m,n} P_{mn}^{hex}\, d_{m,n}}{\sum_{m,n} P_{mn}^{hex}}, \qquad (79) $$

where

$$ P_{mn}^{hex} = \left[ \Big( \frac{2}{3}\big( h_a(m,n) + \frac{1}{2}( h_b(m,n) - h_c(m,n) ) \big) \Big)^2 + \Big( \frac{1}{\sqrt{3}}( h_b(m,n) + h_c(m,n) ) \Big)^2 \right]^{1/2}, \qquad (80) $$

where h_a(m, n), h_b(m, n) and h_c(m, n) are the elements of the gradient filters derived in the 0°, 60° and 120° directions, respectively. The resultant localizations are listed in Table 2 together with the resultant SNR, and they are plotted in Figure 5. It is clear that the smaller the filter, the better its localization; at the same time, the larger the filter, the better the SNR.

S. Saito, M. Nakajiama, and T. Shima

Table 2 Localization and SNR Filter Sqr 3 × 3 Sqr 4 × 4 Sqr 5 × 5

Num. of Pixels 9 16 25

Loc 1.1522 1.2698 1.5081

SNR 4.51 6.76 9.12

Hex 1 √ Hex 3 Hex 2

7 13 19

1 1.0833 1.2553

4.51 7.41 10.75

12 10 SNR

8 6 4 2

Fig. 5 Filters on hexagonal lattices have better SNR despite localization increases in comparison to those on square lattices.

0 0.9

1

1.1 Sqr

1.2 1.3 Loc

1.4

1.5

1.6

Hex

is a trade-off between localization Loc and SNR. However, for a given size, filters derived on hexagonal lattices achieve better SNR-to-Loc ratio than filters derived on square lattices, as shown in Figure 5.

8 Experimental Evaluation 8.1 Construction of Artificial Images The consistent gradient filters on hexagonal and square lattices are compared as follows. Specifically, to get both gradient intensity and orientation at any point on the image analytically, artificial images defined by mathematical functions were constructed. The error between the ideal value and the value obtained from the filtered image were the measured. Since it was assumed in the derivation of the element values of the gradient filters that the frequency characteristics of the input image are close to white noise, images that present other frequency profiles for evaluation were constructed. The artificial input image is taken as f (x, y), and f int (x, y) and f ori (x, y) are taken as its ideal gradient intensity and its ideal orientation, respectively. fex1 , fex2 and fex3 are defined as follows.

Optimal Differential Filter on Hexagonal Lattice

75

fex1 is composed mostly of low frequencies. It has smooth changes in luminance given as:  fex1 (x, y) ≡

R21 − (x2 + y2 ).

The gradient intensity and orientation of this image are  x2 + y2 int fex1 (x, y) ≡ −  R21 − (x2 + y2 ) ori fex1 (x, y) ≡ arctan(x, y),

(81)

(82)

(83)

where R1 is a constant. fex2 is an image composed mostly of high frequencies. It has periodical changes in luminance given as:  (84) fex2 (x, y) ≡ A2 · cos2 (ω2 x2 + y2 ). Its gradient intensity and orientation are int fex2 (x, y) ≡ |A2 · ω2 · sin (2ω2

 x2 + y2 )|

ori fex2 (x, y) ≡ arctan(x, y),

(85) (86)

where A2 and ω2 are constants. fex3 is composed of low to high frequencies given as fex3 (x, y) ≡ A3 · cos2 (ω3 (x2 + y2 )).

(87)

Its gradient intensity and orientation are  int fex3 (x, y) ≡ 2|A3 · ω3 · sin (2ω3 (x2 + y2 ))| x2 + y2

(88)

ori fex3 (x, y) ≡ arctan(x, y),

(89)

where A3 and ω3 are constants. For a given input image f, Int(f) is the gradient intensity computed from the filtered image and Ori(f) is the orientation computed from the filtered image. To evaluate the accuracy of the filters, the errors are then calculated as

$$\left| f^{int} - \mathrm{Int}(f) \right| \tag{90}$$

and

$$\left| f^{ori} - \mathrm{Ori}(f) \right|. \tag{91}$$
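The following sketch illustrates this protocol for fex2 on a square lattice: it samples (84), computes the ideal values (85) and (86), filters the image, and averages the errors (90) and (91) inside the radius-90 region used in the experiments below. The Sobel pair is a hypothetical stand-in for Ando's consistent filters (whose element values are not reproduced in this excerpt), and the orientation error is folded modulo π, an assumption about how direction ambiguity is handled.

```python
import numpy as np
from scipy.ndimage import correlate

def make_fex2(size=201, A2=255.0, w2=np.pi / 4):
    """Sample f_ex2 (84) and its ideal intensity (85) and orientation (86)."""
    c = size // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1].astype(float)
    r = np.hypot(x, y)
    f = A2 * np.cos(w2 * r) ** 2
    f_int = np.abs(A2 * w2 * np.sin(2 * w2 * r))  # (85)
    f_ori = np.arctan2(y, x)                      # (86)
    return f, f_int, f_ori

def mean_errors(f, f_int, f_ori, kx, ky, radius=90):
    """Mean of the errors (90) and (91) over the evaluated region."""
    fx, fy = correlate(f, kx), correlate(f, ky)
    e_int = np.abs(f_int - np.hypot(fx, fy))
    d = (f_ori - np.arctan2(fy, fx)) % np.pi
    e_ori = np.minimum(d, np.pi - d)       # orientation folded modulo pi
    c = f.shape[0] // 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1]
    mask = np.hypot(x, y) < radius         # pixels closer than 90 to centre
    return e_int[mask].mean(), e_ori[mask].mean()

kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]) / 8.0  # Sobel stand-in
f, f_int, f_ori = make_fex2()
print(mean_errors(f, f_int, f_ori, kx, kx.T))
```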


8.2 Detection of Gradient Intensity and Orientation

Ando's gradient filters (3 × 3 and 5 × 5) for square lattices and the derived gradient filters (radius 1 and 2) for hexagonal lattices are evaluated as follows. The gradient intensity and orientation are determined from a filtered image in the following way. On square lattices, the differential values in the x and y directions are taken as fx and fy, respectively; on hexagonal lattices, fr, fs and ft are taken as the differential values in the 0°, 60° and 120° directions, respectively. The functions for calculating the gradient intensity on the square and hexagonal lattices are taken as Int^sqr and Int^hex:

$$\mathrm{Int}^{sqr}(f) \equiv \sqrt{f_x^2 + f_y^2} \tag{92}$$

$$\mathrm{Int}^{hex}(f) \equiv \left[ \left( \frac{2}{3}\left( f_r + \frac{1}{2}(f_s - f_t) \right) \right)^{2} + \left( \frac{1}{\sqrt{3}}(f_s + f_t) \right)^{2} \right]^{1/2} \tag{93}$$

Similarly, the orientation is given as

$$\mathrm{Ori}^{sqr}(f) \equiv \arctan(f_x, f_y) \tag{94}$$

$$\mathrm{Ori}^{hex}(f) \equiv \arctan\!\left( \frac{2}{3}\left( f_r + \frac{1}{2}(f_s - f_t) \right),\; \frac{1}{\sqrt{3}}(f_s + f_t) \right). \tag{95}$$

Both functions rely on arctan, a function known for its high computational cost. In the following section, the accuracy of the orientation calculated by Overington’s method[17], which has a smaller calculation cost, is evaluated.
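A minimal sketch of (93) and (95) follows. It assumes that the two-argument arctan above takes the two composed Cartesian components as its arguments, and checks the result against the directional derivatives of a locally linear luminance with gradient (gx, gy).

```python
import numpy as np

def int_ori_hex(fr, fs, ft):
    """Gradient intensity (93) and orientation (95) from the differential
    values fr, fs, ft in the 0, 60 and 120 degree directions."""
    gx = (2.0 / 3.0) * (fr + 0.5 * (fs - ft))  # equivalent x component
    gy = (fs + ft) / np.sqrt(3.0)              # equivalent y component
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# The directional derivatives of a gradient (gx, gy) are its projections
# onto the three hexagonal axes; (93) and (95) should recover it exactly.
gx, gy = 1.3, -0.7
fr = gx
fs = gx * np.cos(np.pi / 3) + gy * np.sin(np.pi / 3)
ft = gx * np.cos(2 * np.pi / 3) + gy * np.sin(2 * np.pi / 3)
print(int_ori_hex(fr, fs, ft))               # ~(1.476, -0.494)
print(np.hypot(gx, gy), np.arctan2(gy, gx))  # the same values
```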

8.3 Overington’s Method of Orientation Detection Overington[17] proposed an orientation-detection method specially designed for hexagonal lattices. The method performs better in terms of calculation time than methods using arctangent, such as (94) and (95). While Overington focused on hexagonal lattices (six axes), his approach is extended here to any number of axes. First, it is supposed that n gradient filters are available in n directions, uniformly sampled. Each filter’s orientation is at angle π /n from its neighbors. The differential value in the direction θ f is taken as zθ f . Accordingly, the following equations are defined:

$$\theta^f_1 \equiv \operatorname*{argmax}_{\theta^f} \left( |z_{\theta^f}| \right) \tag{96}$$

$$\theta^f_0 \equiv \theta^f_1 - \pi/n \tag{97}$$

$$\theta^f_2 \equiv \theta^f_1 + \pi/n \tag{98}$$

$$z_0 \equiv z_{\theta^f_0} \tag{99}$$

$$z_1 \equiv z_{\theta^f_1} \tag{100}$$

$$z_2 \equiv z_{\theta^f_2} \tag{101}$$

Overington assumed that the differential values for three directions can be fitted by a trigonometric function, where the three directions consist of the direction that gives the highest absolute differential value and the two adjacent directions. That is, it is assumed that these values can be fitted as follows:

$$z_0 = B\cos(\theta_1 - \pi/n), \tag{102}$$

$$z_1 = B\cos\theta_1, \tag{103}$$

$$z_2 = B\cos(\theta_1 + \pi/n), \tag{104}$$

where

$$\theta_1 \equiv \theta^f_1 - \theta_p \tag{105}$$

and θp is the orientation to be detected. The differences between (103) and (102), and between (103) and (104), give

$$z_1 - z_0 = B\bigl(\cos\theta_1 - \cos(\theta_1 - \pi/n)\bigr), \tag{106}$$

$$z_1 - z_2 = B\bigl(\cos\theta_1 - \cos(\theta_1 + \pi/n)\bigr). \tag{107}$$

Dividing the sum of these by their difference gives

$$\tan\theta_1 = \frac{1 - \cos(\pi/n)}{\sin(\pi/n)} \cdot \frac{z_0 - z_2}{2z_1 - z_0 - z_2}. \tag{108}$$

To reduce the computation cost, the first term of the Maclaurin expansion is used for the tangent, that is, tan θ1 ≈ θ1:

$$\theta_1 = \frac{1 - \cos(\pi/n)}{\sin(\pi/n)} \cdot \frac{z_0 - z_2}{2z_1 - z_0 - z_2}. \tag{109}$$

The orientation θp is then detected as

$$\theta_p = \theta^f_1 - \theta_1. \tag{110}$$
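The sketch below summarises (96)–(110) for n uniformly spaced directions. It is a simplified reading of the method: the sign handling when the strongest direction sits at an end of the half-circle of axes is glossed over, and the test responses follow the cosine model (102)–(104) with B = 1.

```python
import numpy as np

def overington_orientation(z, thetas):
    """Estimate the orientation from n directional derivatives z[i]
    taken along the directions thetas[i], spaced pi/n apart."""
    n = len(thetas)
    i1 = int(np.argmax(np.abs(z)))   # (96): strongest response
    z0 = z[(i1 - 1) % n]             # (99); wrap-around sign glossed over
    z1 = z[i1]                       # (100)
    z2 = z[(i1 + 1) % n]             # (101)
    theta1 = ((1 - np.cos(np.pi / n)) / np.sin(np.pi / n)
              * (z0 - z2) / (2 * z1 - z0 - z2))   # (109)
    return thetas[i1] - theta1                    # (110)

n = 6                                  # six axes, as for hexagonal lattices
thetas = np.arange(n) * np.pi / n
theta_p = 0.31                         # orientation to be detected
z = np.cos(thetas - theta_p)           # model (102)-(104) with B = 1
print(overington_orientation(z, thetas))   # ~0.307, close to 0.31
```

The residual error of a few 10⁻³ rad here comes from the small-angle approximation in (109), the same order of magnitude as the Overington-method errors reported in Tables 3–5.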

As stated above, this method is useful in situations where calculation time is more important than high accuracy, because it does not need to call the time-consuming arctangent. Overington proposed this method for hexagonal lattices; for the sake of comparison, it is also adopted here for square lattices, with n = 6 used for both lattices. For hexagonal lattices, gradient filters in six directions were prepared. Filters for the 0°, 60° and 120° directions were derived in section 6.


The filter hd for the 90° direction is composed from the filters for the 60° and 120° directions as follows:

$$h_d \equiv \frac{1}{\sqrt{3}}\,(h_b + h_c). \tag{111}$$

The filters for the 30° and 150° directions are then obtained by rotating hd (Figure 6). In the same way, for square lattices, gradient filters for the 30°, 60°, 120° and 150° directions were prepared by composing Ando's filters in the 0° and 90° directions.

Fig. 6 Derived differential filters he, hd and hf for the 30°, 90° and 150° directions, composed from the filters for the 0°, 60° and 120° directions (element magnitude a = 0.2886751)

8.4 Relationship between Derived Filter and Staunton Filter

The relationship between the filters derived here (whose radius is 1) and the Staunton filters[23] is investigated as follows. Staunton designed a set of edge-detecting filters as illustrated in Figure 7. He mentioned that the element values in these filters, which are 1 or −1, are nearly optimal according to Davies' design principle[6]. This principle is based on a super-sampling of a disc whose center is the center of a filter in the image domain. By contrast, the present filters are derived in the frequency domain. The equations for detecting intensity and orientation from the convolution values with the Staunton filters are described in the following. The intensity Int^hex_Staunton is given as

$$\mathrm{Int}^{hex}_{Staunton} \equiv \frac{1}{\sqrt{3}} \sqrt{f_{p_a}^2 + f_{p_c}^2 + f_{p_a} f_{p_c}}, \tag{112}$$

where fpa and fpc are the convolution values with filters pa and pc, respectively.¹ The orientation is given as

$$\mathrm{Ori}^{hex}_{Staunton} \equiv \arctan\!\left( f_{p_a} + f_{p_c},\; \frac{1}{\sqrt{3}}\,(f_{p_a} - f_{p_c}) \right). \tag{113}$$

¹ Since we use the opposite-signed pc, the sign of the third term differs from that in [23]. The constant 1/√3 also differs because the present grid distance is 1, whereas it is 2/√3 in [23].


Since our hexagonal filter assumes that the element values are distributed as shown in Figure 4(a), the above-described Staunton filters are not derived directly. However, hd derived by (111), and he and hf derived by rotating hd, give filters with the same shape as the Staunton filters, though the element values are not the same. On the other hand, filters that have the same proportion of elements as ha, hb and hc can be derived from pa, pb and pc by

$$p_d = \frac{1}{\sqrt{3}}\,(p_a + p_b), \tag{114}$$

$$p_e = \frac{1}{\sqrt{3}}\,(p_b + p_c), \tag{115}$$

$$p_f = \frac{1}{\sqrt{3}}\,(p_c - p_a). \tag{116}$$

Our equation (93) for deriving the gradient intensity can be rewritten as

$$\begin{aligned}
\mathrm{Int}^{hex}(f) &= \left[ \left( \frac{2}{3}\left( f_r + \frac{1}{2}(f_s - f_t) \right) \right)^{2} + \left( \frac{1}{\sqrt{3}}(f_s + f_t) \right)^{2} \right]^{1/2} \\
&= \left[ \frac{4}{9}\left( f_r^2 + f_r f_s - f_r f_t + \frac{1}{4}\bigl(f_s^2 - 2 f_s f_t + f_t^2\bigr) \right) + \frac{1}{3}\bigl(f_s^2 + 2 f_s f_t + f_t^2\bigr) \right]^{1/2} \\
&= \left[ \frac{1}{9}\bigl(4 f_r^2 + 4 f_s^2 + 4 f_t^2 + 4 f_r f_s + 4 f_s f_t - 4 f_r f_t\bigr) \right]^{1/2} \\
&= \frac{2}{3}\left[ f_r^2 + f_s^2 + f_r f_s + f_t\,(f_t + f_s - f_r) \right]^{1/2} \\
&= \frac{2}{3}\left( f_r^2 + f_s^2 + f_r f_s \right)^{1/2}, \tag{117}
\end{aligned}$$

which is the same as (112) except for the constant coefficient. The gradient-intensity detecting equation with the Staunton filters can therefore detect gradient intensities with the same accuracy as our filters.
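This reduction can be checked numerically, as in the sketch below. The check assumes the linear dependence ft = fr − fs among the three directional derivatives, consistent with the opposite sign convention for the 120° filter noted in the footnote; under that assumption, (93) collapses to the last line of (117).

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    fr, fs = rng.normal(size=2)
    ft = fr - fs   # assumed linear dependence of the three derivatives
    lhs = np.sqrt(((2 / 3) * (fr + 0.5 * (fs - ft))) ** 2
                  + ((fs + ft) / np.sqrt(3)) ** 2)        # (93), i.e. (117)
    rhs = (2 / 3) * np.sqrt(fr ** 2 + fs ** 2 + fr * fs)  # last line of (117)
    assert np.isclose(lhs, rhs), (lhs, rhs)
print("(93) reduces to (2/3)(fr^2 + fs^2 + fr*fs)^(1/2)")
```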

Fig. 7 Staunton filters pa, pb and pc [23], with element magnitude a = 1. The original pc has the opposite sign.

80

S. Saito, M. Nakajiama, and T. Shima

The equation for the orientation derivation (113) also has the same meaning as (95). The filter pa + pc is equal to pb and therefore gives a perpendicular differential value. The filter pa − pc, whose element values are shown in Figure 8, actually forms a horizontal differential filter; however, its norm is not the same as that of pb. Therefore (113) applies the constant 1/√3 to pa − pc to avoid an anisotropic error. Hence (113) has the same accuracy as (95), because the only difference between them is the presence or absence of a redundancy. The Staunton filters and the derived consistent gradient filters with a radius of 1 thus have the same accuracy.

Fig. 8 Horizontal differential filter composed as pa − pc

8.5 Experiment and Results

Java 1.5 was used for the experiments; the BigDecimal class, which provides fixed-point calculation, was used to avoid calculation errors, and the StrictMath class was used for the mathematical functions. Artificial images of 201 × 201 pixels on square lattices and 201 × 233 pixels on hexagonal lattices were generated so as to cover a similar area and shape. Pixels located at a distance of less than 90 from the center of both lattices were taken into consideration. To construct the artificial test images, the previously defined functions with the following parameters were used: R1 = 1000 for fex1; A2 = 255, ω2 = π/4 for fex2; and A3 = 255, ω3 = π/(8 · 90) for fex3. The period of fex2 is 4, and the period of fex3 is 4 at the boundary of the evaluated region, whose radius is 90. That is, to avoid aliasing errors, the artificial images were sampled well under the Nyquist frequency. These test images are shown in Figure 9. The experimental results are summarized as follows. For each image, the error statistics (mean, variance, maximum and minimum) are shown: Table 3 lists the results for fex1, Table 4 the results for fex2, and Table 5 the results for fex3. Visual versions of these results are shown in Figure 10, Figure 11 and Figure 12, where the error is mapped to the luminance value. To show the difference in accuracy between the four filters, the minimum and maximum errors are mapped to luminance values of 0 and 255. As stated above, only pixels located at a distance of less than 90 from the center are included in the evaluation; pixels outside this area are mapped to a luminance value of 0.

Fig. 9 Original test images on square lattices: (a) fex1, (b) fex2, (c) fex3. Note that the luminance of fex1 varies slowly.

Table 3 Errors for fex1, R1 = 1000

Gradient intensity
Filter     Mean         Variance      Max          Min
Sqr 3×3    4.39986E-8   2.47255E-16   6.64841E-8   7.25474E-10
Sqr 5×5    3.33738E-7   1.40526E-14   5.02276E-7   5.53689E-9
Hex 1      3.03035E-8   1.17287E-16   4.58079E-8   5.00000E-10
Hex 2      5.78494E-8   4.12577E-16   8.64151E-8   9.71484E-10

Orientation by arctan (radian)
Filter     Mean         Variance      Max          Min
Sqr 3×3    1.15054E-10  8.60308E-21   3.60189E-10  0.00000E0
Sqr 5×5    1.72864E-11  1.93587E-22   5.41087E-11  0.00000E0
Hex 1      4.18671E-13  3.32438E-25   1.11029E-11  0.00000E0
Hex 2      2.45733E-13  1.12628E-25   6.12340E-12  0.00000E0

Orientation by Overington's method (radian)
Filter     Mean         Variance      Max          Min
Sqr 3×3    1.52959E-3   3.05383E-6    6.15061E-3   5.61363E-15
Sqr 5×5    1.52959E-3   3.05383E-6    6.15061E-3   5.61363E-15
Hex 1      1.52469E-3   3.01695E-6    6.14350E-3   0.00000E0
Hex 2      1.52469E-3   3.01695E-6    6.14350E-3   0.00000E0

Moreover, the mean errors of gradient intensity, of orientation detection using the arctangent, and of orientation detection by Overington's method are shown in Figure 13(a), (b) and (c), respectively.

9 Discussion

Generally speaking, the computational cost of extracting the gradient value of a pixel on a hexagonal lattice is higher than that on a square lattice, because hexagonal lattices use three axes while square lattices use two. However, the results show that our filters on hexagonal lattices have many advantages with respect to accuracy. The most accurate results for gradient intensity were obtained with the radius-1 gradient filter on hexagonal lattices, for all test images.


Table 4 Errors for fex2, ω2 = π/4, A2 = 255

Gradient intensity
Filter     Mean         Variance      Max          Min
Sqr 3×3    4.78122E1    6.04649E2     1.03478E2    1.36418E-1
Sqr 5×5    7.96775E1    1.67478E3     1.61632E2    5.08181E-1
Hex 1      3.54782E1    3.00115E2     7.64858E1    1.51197E-1
Hex 2      6.13548E1    8.97091E2     1.27053E2    3.68400E-1

Orientation by arctan (radian)
Filter     Mean         Variance      Max          Min
Sqr 3×3    2.48541E-2   4.62197E-4    3.12839E-1   0.00000E0
Sqr 5×5    2.36221E-3   3.37087E-5    7.15883E-2   0.00000E0
Hex 1      3.35539E-3   5.45738E-5    1.18910E-1   0.00000E0
Hex 2      3.67423E-4   5.04084E-7    9.71053E-3   0.00000E0

Orientation by Overington's method (radian)
Filter     Mean         Variance      Max          Min
Sqr 3×3    2.46246E-2   4.76264E-4    3.12654E-1   5.61363E-15
Sqr 5×5    3.19489E-3   3.44523E-5    7.43153E-2   5.61363E-15
Hex 1      3.67976E-3   6.23404E-5    1.19508E-1   0.00000E0
Hex 2      1.59309E-3   3.31283E-6    1.31418E-2   0.00000E0

The smaller the value of localization is, the smaller the mean error of gradient intensity is. This result is expected, since the analytical gradient is defined by the first partial derivatives, and the derivative is defined as the limit of the difference quotient. For applications where a precise intensity is pursued, it is recommended to use the derived gradient filter with a radius of 1 on hexagonal lattices. For orientation detection using the arctangent, the larger the value of localization is, the smaller the mean error is, regardless of the image. This result is consistent with the result presented in section 7, namely that the larger the filter size is, the larger the theoretical SNR becomes. Moreover, the derived gradient filters on hexagonal lattices showed smaller mean errors in comparison with the consistent gradient filters on square lattices. For better orientation detection, it is concluded that the derived gradient filters on hexagonal lattices (with higher localization) are more appropriate. The errors in orientation detection using Overington's method also become smaller as the localization of the filter gets larger, just as for orientation detection using the arctangent. Filters on hexagonal lattices perform better than the square ones for fex2 and fex3. For fex1, where the luminance varies slowly, the errors are similar for both types of lattices, though the results for hexagonal lattices are still slightly better than those for square ones. Detecting the orientation of the gradient on hexagonal lattices with Overington's method performs better than detecting the orientation on square lattices using the arctangent for fex2, which consists mainly of high-frequency components, and for fex3, which consists of low- to high-frequency components; for fex1, which consists mainly of low-frequency components, this is not the case. The main advantage of Overington's method is its low computational cost, since it does not need to call the arctangent. As a result, using Overington's method with the derived filters on hexagonal lattices has advantages over using the arctangent with gradient filters on square lattices, with respect to accuracy, computational cost, and simplicity of circuit implementation.


Table 5 Errors for fex3, ω3 = π/(8 · 90), A3 = 255

Gradient intensity
Filter     Mean         Variance      Max          Min
Sqr 3×3    2.02389E1    3.60586E2     7.85115E1    2.91073E-4
Sqr 5×5    3.61619E1    1.05073E3     1.27089E2    1.04375E-4
Hex 1      1.45449E1    1.88399E2     5.56746E1    2.46549E-4
Hex 2      2.64606E1    5.85607E2     9.59810E1    7.97684E-3

Orientation by arctan (radian)
Filter     Mean         Variance      Max          Min
Sqr 3×3    1.17490E-2   9.75118E-5    8.30716E-2   0.00000E0
Sqr 5×5    1.20048E-3   3.78492E-6    3.03901E-2   0.00000E0
Hex 1      8.01422E-4   8.44267E-7    3.98812E-3   0.00000E0
Hex 2      1.11561E-4   2.05645E-8    1.69539E-3   0.00000E0

Orientation by Overington's method (radian)
Filter     Mean         Variance      Max          Min
Sqr 3×3    1.17166E-2   1.02811E-4    8.40553E-2   5.61363E-15
Sqr 5×5    2.14414E-3   5.88746E-6    3.01349E-2   5.61363E-15
Hex 1      1.79184E-3   3.83489E-6    1.01309E-2   0.00000E0
Hex 2      1.53078E-3   3.02628E-6    6.48015E-3   0.00000E0

For applications where high accuracy in gradient intensity or orientation detection is needed, the filters derived in this chapter for hexagonal lattices are a good solution.

10 Summary

Consistent gradient filters on hexagonal lattices were derived, the relationship between the Staunton filters and the derived filters was investigated, and the derived filters were compared with existing filters on square lattices both theoretically and experimentally. The Staunton filters and the derived consistent gradient filters with a radius of 1 are related to each other by a 30-degree rotation, even though the Staunton filters were derived in the image domain while the present filters were derived in the frequency domain; both sets of filters can therefore calculate intensities and orientations with the same accuracy. In the theoretical evaluation, the derived filters show better SNR in spite of their smaller localization, in comparison to the consistent gradient filters on square lattices. The theoretical evaluation adopted the assumption that the frequency characteristics of the input image are flat; to cope with this, the derived filters were also evaluated with artificial images presenting various frequency characteristics. The derived filters on hexagonal lattices achieve higher accuracy in terms of gradient intensity and orientation detection than the consistent gradient filters on square lattices. Moreover, the derived filters on hexagonal lattices combined with Overington's method can detect the orientation with higher accuracy than gradient filters on square lattices with the arctangent.


Fig. 10 Results for fex1. Errors of gradient intensity for the four filters (Sqr 3×3, Sqr 5×5, Hex 1, Hex 2) are mapped to luminance values in (a)–(d); errors of orientation detection (arctangent) in (e)–(h); errors of orientation detection (Overington) in (i)–(l).

Fig. 11 Results for fex2. Errors of gradient intensity for the four filters (Sqr 3×3, Sqr 5×5, Hex 1, Hex 2) are mapped to luminance values in (a)–(d); errors of orientation detection (arctangent) in (e)–(h); errors of orientation detection (Overington) in (i)–(l).


Fig. 12 Results for fex3. Errors of gradient intensity for the four filters (Sqr 3×3, Sqr 5×5, Hex 1, Hex 2) are mapped to luminance values in (a)–(d); errors of orientation detection (arctangent) in (e)–(h); errors of orientation detection (Overington) in (i)–(l).

Fig. 13 Mean errors from Table 3, Table 4 and Table 5 for Sqr 3×3, Sqr 5×5, Hex 1 and Hex 2 on fex1, fex2 and fex3: (a) gradient intensity, (b) orientation detection by arctangent, and (c) orientation detection by Overington's method.

Being computationally lighter, the derived filters also reduce calculation time and simplify circuit implementation. Square lattices nevertheless remain the standard framework in image processing. We hope that, in the near future, hexagonal lattices will become widely adopted and give better results when used with the derived gradient filters.


References

1. Octave, http://www.gnu.org/software/octave/
2. Ando, S.: Consistent gradient operators. IEEE Transactions on Pattern Analysis and Machine Intelligence 22(3), 252–265 (2000)
3. Balakrishnan, M., Pearlman, W.A.: Hexagonal subband image coding with perceptual weighting. Optical Engineering 32(7), 1430–1437 (1993)
4. Chettir, S., Keefe, M., Zimmerman, J.: Obtaining centroids of digitized regions using square and hexagonal tilings for photosensitive elements. In: Svetkoff, D.J. (ed.) Optics, Illumination, and Image Sensing for Machine Vision IV, SPIE Proc., vol. 1194, pp. 152–164 (1989)
5. Choi, K., Chan, S., Ng, T.: A new fast motion estimation algorithm using hexagonal subsampling pattern and multiple candidates search. In: International Conference on Image Processing, vol. 1, pp. 497–500 (1996)
6. Davies, E.: Circularity – a new principle underlying the design of accurate edge orientation operators. Image and Vision Computing 2(3), 134–142 (1984)
7. Dubois, E.: The sampling and reconstruction of time-varying imagery with application in video systems. Proceedings of the IEEE 73(4), 502–522 (1985)
8. Frei, W., Chen, C.C.: Fast boundary detection: A generalization and a new algorithm. IEEE Transactions on Computers C-26(10), 988–998 (1977), doi:10.1109/TC.1977.1674733
9. Grigoryan, A.M.: Efficient algorithms for computing the 2-D hexagonal Fourier transforms. IEEE Transactions on Signal Processing 50(6), 1438–1448 (2002)
10. Her, I.: Geometric transformations on the hexagonal grid. IEEE Transactions on Image Processing 4(9), 1213–1222 (1995)
11. Jiang, Q.: FIR filter banks for hexagonal data processing. IEEE Transactions on Image Processing 17(9), 1512–1521 (2008), doi:10.1109/TIP.2008.2001401
12. Kimuro, Y., Nagata, T.: Image processing on an omni-directional view using a spherical hexagonal pyramid: vanishing points extraction and hexagonal chain coding. In: Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems 1995. Human Robot Interaction and Cooperative Robots, Pittsburgh, PA, vol. 3, pp. 356–361 (1995)
13. Kirsch, R.A.: Computer determination of the constituent structure of biological images. Computers and Biomedical Research 4, 315–328 (1971)
14. Mersereau, R.M.: The processing of hexagonally sampled two-dimensional signals. Proceedings of the IEEE 67(6), 930–953 (1979)
15. Middleton, L.: The co-occurrence matrix in square and hexagonal lattices. In: ICARCV 2002, vol. 1, pp. 90–95 (2002)
16. Middleton, L., Sivaswamy, J.: Hexagonal Image Processing: A Practical Approach. Springer, Heidelberg (2005)
17. Overington, I.: Computer Vision: a unified, biologically-inspired approach. Elsevier Science Pub. Co., Amsterdam (1992)
18. Prewitt, J.M., Mendelsohn, M.L.: The analysis of cell images. Advances in Biomedical Computer Applications 128, 1035–1053 (1996), doi:10.1111/j.1749-6632.1965.tb11715.x
19. Roberts, L.G.: Machine Perception of Three-Dimensional Solids. Massachusetts Institute of Technology, Lexington Lincoln Lab (1963)
20. Shima, T., Saito, S., Nakajima, M.: Design and evaluation of more accurate gradient operators on hexagonal lattices. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(6), 961–973 (2010)


21. Sobel, I.: Camera Models and Machine Perception. Stanford University, Computer Science (1970)
22. Staunton, R.: A one pass parallel hexagonal thinning algorithm. In: Seventh International Conference on Image Processing and Its Applications 1999, vol. 2, pp. 841–845 (1999)
23. Staunton, R.C.: The design of hexagonal sampling structures for image digitization and their use with local operators. Image and Vision Computing, 162–166 (1989)
24. Staunton, R.C., Storey, N.: Comparison between square and hexagonal sampling methods for pipeline image processing. In: Svetkoff, D.J. (ed.) SPIE Proc. Optics, Illumination, and Image Sensing for Machine Vision IV, vol. 1194, pp. 142–151 (1989)
25. Thiem, J., Hartmann, G.: Biology-inspired design of digital Gabor filters upon a hexagonal sampling scheme. In: 15th International Conference on Pattern Recognition, vol. 3, pp. 445–448 (2000)
26. Tremblay, M., Dallaire, S., Poussart, D.: Low level segmentation using CMOS smart hexagonal image sensor. In: Proceedings of Computer Architectures for Machine Perception, CAMP 1995, pp. 21–28 (1995)
27. Ulichney, R.: Digital Halftoning. MIT Press (1987)
28. Ville, D.V.D., Blu, T., Unser, M., Philips, W., Lemahieu, I., de Walle, R.V.: Hex-splines: a novel spline family for hexagonal lattices. IEEE Transactions on Image Processing 13(6), 758–772 (2004)

Chapter 6

Graph Image Language Techniques Supporting Advanced Classification and Cognitive Interpretation of CT Coronary Vessel Visualizations

Mirosław Trzupek
AGH University of Science and Technology, Faculty of Electrical Engineering, Automatics, Computer Science and Electronics, Department of Automatics, al. A. Mickiewicza 30, 30-059 Krakow

Abstract. The aim of this chapter is to present graph image language techniques for the development of a syntactic, semantic description of spatial visualizations of the coronary artery system. The proposed linguistic description makes it possible to intelligently model the examined structure and then to perform advanced classification and cognitive interpretation of coronary arteries (automatically finding the locations of significant stenoses and identifying their morphometric diagnostic parameters). This description will be formalised using ETPL(k) (Embedding Transformation-preserved Production-ordered k-Left nodes unambiguous) graph grammars, supporting the search for stenoses in the lumen of arteries forming parts of the coronary vascularisation. ETPL(k) grammars generate IE (indexed edge-unambiguous) graphs, which can unambiguously represent 3D structures of heart muscle vascularisation visualised in images acquired during diagnostic examinations with the use of spiral computed tomography.

1 Introduction

Coronary Heart Disease (CHD) is the leading cause of death in the industrialized world, and early diagnosis and risk assessment are widely accepted strategies to combat it [25]. Recent years have seen a rapid development of stereovision as well as of algorithms for the visualisation and reconstruction of 3D objects, which has become noticeable in modern medical diagnostics as well [5][7][17]. It has become possible not just to reliably map the structures of a given organ in 3D, but also to accurately observe its morphology. Such modern visualisation techniques are now used in practically all types of image diagnostics as well as in many other medical problems. Below, the author discusses the results of his research on the


opportunities for using selected artificial intelligence methods to semantically analyse selected medical images. In particular, he will present attempts at using linguistic methods of structural image analysis to develop systems for the cognitive analysis and understanding of medical images, and this will be illustrated by the recognition of lesions in coronary arteries of the heart. The development of such systems is aimed at supporting the early diagnostics of selected heart disorders using the automatic semantic interpretation of lesions. This goal can be achieved if the tools developed (e.g. using linguistic formalisms) allow the computer to penetrate the contents of the image, and not just its form. The image semantics produced by applying the above formalisms support a deeper reasoning about disease factors and the type of therapy. The problem undertaken is important because the identification and location of significant stenoses in coronary vessels is a widespread practical task. A broad range of opportunities for obtaining image data, e.g. by diagnosing the heart using computer tomography (CT), coronary artery angiography or intravascular ultrasonography (IVUS), as well as the 3D reconstruction of the examined coronary vessels produced by rendering provide huge amounts of information to the diagnostician [6][8][13][15] (fig. 1.).

Fig. 1 The 3D reconstruction of a beating heart produced using a Somatom Sensation Cardiac 64 spiral CT scanner [4]

Such a large volume of information is, on the one hand, useful as it allows the right diagnosis to be formulated very competently, but on the other, too much information frequently makes it difficult to take an unambiguous decision, because a data overload can be as much of a problem as a deficit of data. This means that, regardless of the use of state-of-the-art, expensive medical apparatuses, the images produced do not contribute to improving the accuracy of diagnoses to the extent expected when the decision was made to buy the apparatus. This paradoxical effect is mainly due to the fact that image data acquired using very advanced methods is usually assessed just visually and qualitatively by a physician. If the specialist is a doctor with high qualifications, this intelligent, although not completely formalised, interpretation method can yield splendid results. However, if the image is to be assessed by someone with less experience and possibly poorer


intuition, the diagnostic decision taken may be far from the best. The steps of the visual assessment of the analysed image are frequently preceded by manual modifications of the image structure and its presentation method to improve the ability to visually assess the specific case. A large number of computer tools have been developed for this purpose and their operation is satisfactory [9][10][17]. However, there are no similar IT tools for automatically supporting the intellectual processes of physicians when they analyse an image and interpret it. Obviously this is not about eliminating humans from the decision-making process, as this diagnostic stage requires in-depth consideration by a physician, during which he/she analyses the premises for the decision to be made and accounts for many facts that the computer could not grasp; he/she is also legally and morally responsible for the diagnosis made and the treatment undertaken based on it. However, at the stage of collecting premises for taking the decision, doctors should make better use of the opportunities offered by IT to intelligently analyse complex image data and to attempt to automatically classify, interpret and even understand specific images and their components using computer technology. So far, there are no such intelligent IT systems to support the cognitive processes of doctors who analyse complicated cases, although it should be noted that bolder and more productive attempts at developing them are being made [1][10][12][20][23][24], including by the author. The reason is that there are still a number of unsolved scientific and technical problems encountered by designers of intelligent cognitive systems. What is more, these problems multiply greatly if we move from computer programs supporting the analysis of data provided by relatively simple measuring devices to second-generation IT systems [6][8][15] coupled to medical diagnostic apparatuses, e.g. for the very 3D images of coronary vascularisation considered here (fig. 2.).

Fig. 2 The operator’s panel of the computer system coupled to SOMATOM Sensation Cardiac 64 (source: [4])


2 The Classification Problem

One of the main difficulties in developing universal, intelligent systems for medical image diagnostics is the huge variety of forms of images, both healthy and pathological, which have to be taken into account when supporting the physicians who interpret them. For this reason, the analysis should be made independent of the orientation and location of the examined structure within the image. In addition, every healthy person has an individual structure of their internal organs; everyone is somewhat different, which prevents us from unambiguously and strictly saying what a given organ should look like, as it may look different and still be healthy (i.e. fall within the so-called physiological forms). In particular, the aforementioned varied shapes of morphological elements make it difficult to set a universal standard defining the model shape of a healthy organ, or of a pathological one. All of this means that attempts to effectively assess the morphology using computer software are very complicated and frequently outright impossible, because there are too many cases that would have to be analysed to unambiguously determine the condition of the structure being examined. Neither do classical methods of image recognition (i.e. simple classification) [9][18] always produce satisfactory results – a comprehensive analysis with complete recognition and interpretation of the disease symptoms looked for – when supporting medical diagnostics (fig. 3.).

Fig. 3 Simple pattern classification methods rely strongly on quantitative measurements and are not well suited to all problems

Consequently, it becomes necessary to introduce a somewhat more advanced reasoning aimed at recognising lesions and interpreting their meanings. The author’s research shows that mathematical linguistic formalisms, and in particular graph image grammars [12][23], can be quite successful in this area. However, using mathematical linguistic formalisms in the form of graph image grammars is not free of shortcomings, either. One of the main difficulties is the need to define the linguistic apparatus, i.e. develop a grammar so that there are deterministic syntax analysers for it which will allow the lesions looked for to be recognised [3][11][20][22]. Since, as a rule, it is very difficult to define some ideal, universal pattern, e.g. an image showing some model shape of a healthy or diseased organ, we are dealing with a situation in which we cannot define a complete and at the same time finite set containing all possible forms of the pathology that can occur. In addition, when there is a huge variety of shapes of the structures identified for the purpose of their proper recognition (classification), it may become necessary


to define a grammar very broad in terms of the number of productions introduced. However, this problem can be solved by using grammars with greater generating capacities. Still, one has to remember that for some grammars of this type there may be problems with building deterministic syntax analysers. On the other hand, a computer using the well-known and frequently used techniques of automatic image recognition needs such a pattern to be provided to it. This is because the information technologies applied rely to a significant extent on intuition to determine the measure of similarity between the currently considered case and such an abstract pattern, so if the shapes of examined organs change unexpectedly as a result of a disease or individual differences, these technologies often fail. For this reason it is necessary to use advanced artificial intelligence techniques and computational intelligence techniques that can generalise the recorded image patterns. What is particularly important is to use intelligent description methods for medical images that would ignore the individual characteristics of the patient examined and the characteristics dependent on the specific form of the disease unit considered. Linguistic descriptions of this type, created using new image grammars modelling the shapes of healthy coronary vascularisation and the morphology of lesions, form the subject of the rest of this publication.

3 Stages in the Analysis of CT Images under a Structural Approach Utilising Graph Techniques

As 3D reconstructions of the coronary vascularisation can come in many shapes and be presented in various projections (angles of observation), modelling such arteries requires suitably advanced description and classification techniques. One such technique consists of image languages based on tree and graph formalisms [3][16][18]. Further on in this chapter, these very methods will be used to discuss the basic stages in the analysis and recognition (classification) of lesions in CT scans of coronary vascularisation. Under the syntactic (structural) approach [3][11][16][18][20][21][22], a complex image is treated as a hierarchical structure made up of simpler sub-images which can be broken down into even simpler ones, until we get down to picture primitives. Then, depending on the relations between these primitives and using the appropriate formal grammar, the structure of the image can be represented as a series, a tree or a graph. Defining and identifying simple components of an image makes it possible to describe the shape of the analysed structure, and then to create a generalised, holistic description defining the shapes of the lesions looked for in the item analysed. Thus, for a given image (showing a healthy structure or lesions), we obtain certain significant sequences describing the lesions of interest to us. The process of recognising (classifying) the image then boils down to a syntactic/semantic analysis (parsing) whose purpose is to identify whether the analysed input series is an element of the language generated by the grammar (fig. 4.).


Fig. 4 Diagram of a syntactic image recognition (classification) system

This publication presents the use of graph grammars, as they are a more robust tool for describing images than sequential or tree grammars. The basic assumption behind these methods is that it is possible to define a mechanism generating graph representations of the images considered. This mechanism is the appropriate graph grammar, whereas the set of all graph representations of images that it generates is treated as a certain language. We therefore have to build an automaton recognising elements of this language. This automaton, or more exactly its software implementation – the syntactic analyser (parser) – is responsible for the recognition procedure and allows the image description written using the proposed language to be converted into a description reaching down to the semantic sphere, enabling all material medical facts associated with the examined image to be understood. Creating a graph model of the 3D structure of the analysed vessels and its linguistic description makes it possible for the computer to analyse the structure obtained in order to automatically detect the location of a stenosis, its extent and its type (concentric or eccentric). This representation yields a brief, unambiguous description of all elements of the vascular structure, thus supporting further reasoning about its correct function or functional irregularities. In the future, it will be possible to combine this type of description with haemodynamic modelling of the flow in vessels affected by the disease, and this will help to link the lesions found in the morphology of coronary vessels with pathological blood supply to, and hypoxia of, individual fragments of the heart muscle. In addition, using such semantic descriptions in integrated modules of intelligent medical diagnostics systems can help in the early detection of pathological stenoses leading to heart hypoxia or anginas. The road to finding the appropriate languages necessary to describe the semantic content of 3D coronary vascularisation reconstructions is long and hard. The description of semantically important aspects of an image or its part cannot depend on details which are of secondary significance from the point of view of understanding the image contents and which thus produce an information surplus that does not contribute anything to the final assessment of the given image. This is why, apart from developing the linguistic description methods for 3D images and from coming up with intelligent methods of using experts'


knowledge, it makes sense to use pre-processing and analysis techniques of 3D medical images suitable for the specific nature of this problem. In the research work, attempts were made to find such methods of extracting and describing features of medical images that would ignore the individual features characteristic for the patient examined, but instead be geared towards extracting and correctly representing morphological features significant for understanding the pathology portrayed in the image. Only correctly identified elements of the image and their interrelations as well as the suitably selected components of the descriptions of these elements can form the basis for writing the linguistic description of the image, which would then, at the parsing stage, enable the semantics to be analysed and symptoms of the disease to be detected. The above elements of the 3D description, treated as letters of the alphabet (symbols) later used to build certain language formulas, must in particular be geared towards detecting lesions, thus allowing these lesions not only to be located, but also their essence to be interpreted and their medical significance defined.

4 Parsing Languages Generated by Graph Grammars

The use of graph grammars to describe 2D or 3D images is mentioned in many scientific publications [10][16][18]. By contrast, publications dealing with the syntactic analysis of 3D images are sparse. This is due to the computational complexity of the parsing problem, which for the overwhelming majority of graph grammar classes is NP-complete [3][16][18]. As the methodology of recognising a specific type of images should be usable in practical applications, the grammar used for the description and then the recognition (classification) should ensure effective parsing. In this study, an ETPL(k) (Embedding Transformation-preserved Production-ordered k-Left nodes unambiguous) graph grammar has been proposed, because this class offers a strong descriptive capacity and a known, effective parsing algorithm of polynomial O(n²) complexity [3][16][18]. These grammars constitute a sub-class of edNLC (edge-labelled directed Node-Label Controlled) graph grammars, which represent a given image using EDG (indexed edge-unambiguous graph) graphs [3][16][18]. ETPL(k) grammars generate IE (indexed edge-unambiguous) graphs – graphs with oriented and labelled edges as well as indexed vertices, allowing images to be unambiguously represented without deformations. However, distortions can sometimes occur at the image pre-processing stage (e.g. picture primitives or their interrelations are incorrectly located), in consequence preventing the further analysis of such a case, because a standard parser treats such an image as one not belonging to the language generated by the given graph grammar. This problem can be solved by defining a certain probabilistic model for the recognised image using random IE graphs, as proposed in publication [16]. The most difficult job, particularly for graph grammars, is to design a suitable parser. In a structural analysis of graph representations, this parser automatically provides the complete information defining the 3D topology of the analysed graph. The difficulty in implementing a syntactic analyser stems from the lack of ready grammar compilers like those available for context-free grammars [20], and


this means that the syntactic analysis procedures have to be derived independently. For the ETPL(k) graph grammar presented here, an effective parsing algorithm of polynomial complexity is known [3]. This allows us to develop very productive analysers that make it possible to verify whether the analysed graph representations constitute elements of the language defined by the introduced graph grammar. A significant benefit of using an ETPL(k) graph grammar is the possibility of introducing derivational rules with simple semantic actions. This makes it possible, in addition, to determine significant morphometric parameters of the analysed 3D reconstructions of coronary arteries. The entire syntactic/semantic analysis is carried out in polynomial time, both for unambiguously defined patterns and for fuzzy, ambiguous patterns, as the above grammar classes can be extended into probabilistic forms [16]. This is a very desirable property, particularly when it is necessary to analyse cases not considered before.

5 Picture Grammars in Classification and Semantic Interpretation of 3D Coronary Vessel Visualisations

5.1 Characteristics of the Image Data

Research work was conducted on images from diagnostic examinations made using 64-slice spiral computed tomography [4], in the form of animations saved as AVI (MPEG4) files at 512×512 pixels. Such sequences were obtained for various patients during diagnostic examinations of the heart and present in a very clear manner all morphologic changes of individual sections of arteries in any plane. Coronary vessels were visualised without the accompanying muscle tissue of the heart.

5.2 Preliminary Analysis of 3D Coronary Vascularisation Reconstructions

To enable creating linguistic representations of 3D reconstructions of coronary vascularisation, images showing the coronary arteries being examined undergo a series of operations as part of the image pre-processing stage. The first step in the preliminary analysis is segmentation, which allows areas meeting certain homogeneity criteria (e.g. brightness, colour, texture) to be delimited, and usually boils down to distinguishing the individual objects making up the image. In this case, segmentation consists in extracting the coronary arteries while suppressing needless elements of the image background, so that later it is possible to span the appropriate graph modelling the analysed structure. Importantly, this pre-processing stage is executed using dedicated software integrated with the CT scanner [4] and yields high-quality images showing the coronary vascularisation of the examined patient, free from marginal elements. For this reason we can skip this pre-processing stage and focus straight away on


the morphology of the examined arteries. This is an indisputable advantage of computed tomography over other diagnostic methods for acquiring these types of images, in which one cannot avoid using advanced techniques for segmenting the acquired images, which may still contain fewer details and be of worse quality in the end. The heart vascularisation reconstructions analysed here were acquired using a SOMATOM CT scanner [4]. This apparatus offers a number of functionalities for data acquisition and for creating 3D reconstructions on its basis. It also has predefined procedures built in, which allow the vascularisation to be quickly extracted from the visible structures of the cardiac muscle. Since the image data has been saved in the form of animations showing the coronary vessels in various projections, for the further analysis we should select the projection which shows the examined coronary vessels in the most transparent form, most convenient for describing and interpreting. In our research we have attempted to automate the procedure of finding such a projection by using selected geometric transformations during image processing. Using the fact that the spatial layout of an object can be determined by projecting it onto the axes of the Cartesian coordinate system, values of the horizontal Feret diameter [19], which is a measure of the horizontal extent of the diagnosed coronary artery tree, are calculated for every subsequent animation frame during the image rotation (fig. 5.).

Fig. 5 The projection of the coronary arteries with the longest Feret diameter, obtained from an animation stored in the MPEG4 format

The projection for which the horizontal Feret diameter is the greatest is selected for further analyses, as this visualisation shows both the right and the left coronary artery in the most convenient take. In a small number of analysed images, regardless of selecting the projection with the longest horizontal Feret diameter, vessels may obscure one another in space, which causes a problem at subsequent stages of the analysis. The best method to avoid this would be to use advanced techniques


for determining mutually corresponding elements for every subsequent animation frame based on the geometric relations in 3D space.
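A minimal sketch of this projection-selection step is given below. The decoding of the MPEG4 animation into grayscale frames and the zero background threshold are assumptions; the chapter does not specify these details.

```python
import numpy as np

def horizontal_feret(frame, threshold=0):
    """Horizontal Feret diameter of one frame: the horizontal extent
    of the pixels above the (assumed) background threshold."""
    cols = np.where((frame > threshold).any(axis=0))[0]
    return 0 if cols.size == 0 else cols[-1] - cols[0] + 1

def best_projection(frames):
    """Index of the frame with the greatest horizontal Feret diameter."""
    return max(range(len(frames)), key=lambda i: horizontal_feret(frames[i]))

# Usage with a hypothetical list of 512x512 grayscale frames decoded
# from the animation (e.g. with imageio or OpenCV):
# frames = [...]
# chosen = frames[best_projection(frames)]
```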

5.3 Graph-Based Linguistic Formalisms in Spatial Modelling of Coronary Vessels

As the structure of coronary vessels is characterised by three basic types of artery distribution on the heart surface, the proposed methods should cover three basic cases: balanced artery distribution, right artery dominance and left artery dominance. In the further considerations of this chapter, we will focus on the balanced distribution of coronary arteries, which is the type most frequently seen by diagnosticians (60–70% of all cases) [2]. To help represent the examined structure of coronary vascularisation with a graph, it is necessary to define the primary components of the analysed image and their spatial relations, which will serve to extract and suitably represent the morphological characteristics significant for understanding the pathology shown in the image. It is therefore necessary to identify the individual coronary arteries and their mutual spatial relations. To ease this process, the projection selected for analysis was skeletonised, which made it possible to obtain the centre lines of the examined arteries.

Fig. 6 Coronary vascularisation projection and its skeleton produced using the Pavlidis skeletonising algorithm


These centre lines are equidistant from the external edges of the arteries and one unit wide (fig. 6.). This gives us the skeleton of the given artery, which is much thinner than the artery itself but fully reflects its topological structure. Of the several skeletonising algorithms used to analyse medical images, the Pavlidis skeletonising algorithm [14] turned out to be one of the best. It facilitates generating regular, continuous skeletons with a central location and one unit width; it also leaves the fewest apparent side branches in the skeleton, and the lines generated during the analysis are only negligibly shortened at their ends. Skeletonising is aimed only at making it possible to find branching points in the vascularisation structures and then to introduce an unambiguous linguistic description of individual coronary arteries and their branches. Lesions will be detected in a representation defined in this way, even though their morphometric parameters have to be determined based on a pattern showing the appropriate vessel, and not just its skeleton. The centre lines of the analysed arteries produced by skeletonisation are then searched for informative points, i.e. points where artery sections intersect or end. These points will constitute the vertices of a graph modelling the spatial structure of the coronary vessels of the heart. The next step is labelling: each located informative point is given the appropriate label from the set of vertex labels. In the case of terminal points, the set of vertex labels comprises abbreviated names of the arteries found in coronary vascularisation, defined as shown in Table 1 (a sketch of the skeletonisation and informative-point step is given after the table).

Table 1 The set of vertex labels

For the left coronary artery:
LCA – left coronary artery
LAD – anterior interventricular branch (left anterior descending)
CX – circumflex branch
L – lateral branch
LM – left marginal branch

For the right coronary artery:
RCA – right coronary artery
RM – right marginal branch
PI – posterior interventricular branch
RP – right posterolateral branch
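The sketch below illustrates the skeletonisation and informative-point search referred to above. skimage's skeletonize is a stand-in for the Pavlidis algorithm used in the chapter, and the neighbour-count rule for classifying end points and branching points is a common heuristic rather than the chapter's exact procedure.

```python
import numpy as np
from scipy.ndimage import correlate
from skimage.morphology import skeletonize

def informative_points(vessel_mask):
    """Skeletonise a binary vessel mask and locate its informative
    points: skeleton pixels with one neighbour are terminal points,
    pixels with three or more are branching points."""
    skel = skeletonize(vessel_mask.astype(bool))
    kernel = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
    nbrs = correlate(skel.astype(int), kernel, mode="constant")
    ends = skel & (nbrs == 1)       # to be labelled LCA, LAD, RM, ...
    branches = skel & (nbrs >= 3)   # to be labelled with concatenations
    return skel, np.argwhere(ends), np.argwhere(branches)
```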

If a given informative point is a branching point, then the vertex is labelled with the concatenation of the vertex labels of the arteries which begin at this point. This way, all initial and final points of coronary vessels, as well as all points where main vessels branch into lower-level vessels, have been determined and labelled appropriately. After this operation, the coronary vascularisation tree is divided into sections which constitute the edges of a graph modelling the examined coronary arteries. This makes it possible to formulate a description in the form of edge labels which determine the mutual spatial relations between the primary components, i.e. between the subsequent arteries shown in the analysed image. These labels have been identified according to the following system. The mutual spatial relations that may occur between elements of the vascular structure represented by a graph are described by the set of edges. The elements of this set have


been defined by introducing the appropriate spatial relations: vertical, defined by the set of labels {α, β, …, μ}, and horizontal, defined by the set of labels {1, 2, …, 24}, on a hypothetical sphere surrounding the heart muscle. These labels designate individual angular intervals, each with a spread of 15°. Then, depending on the location, terminal edge labels are assigned to all branches identified by the beginnings and ends of the appropriate sections of coronary arteries. The presented methodology draws upon the method of determining the location of a point on the surface of our planet in the system of geographic coordinates, where a similar cartographic projection is used to make topographic maps. The use of this methodology to determine spatial relations for the analysed projection is shown below (fig. 7.).

Fig. 7 Procedure of identifying spatial relations between individual coronary arteries

To determine the appropriate label for a vector W, its beginning should be placed at the zero point of the coordinate system, and then the location of its terminal point should be established. For this purpose, two angles have been defined: the azimuth angle A, which identifies the location of the given point as a result of rotating around the vertical axis, and the elevation angle E, which identifies the elevation of the given point above the horizon. This representation of the mutual spatial relations between the analysed arteries yields convenient access to an unambiguous description of all elements of the vascular structure. At subsequent analysis stages, this description will be formalised using the ETPL(k) graph grammars defined in [3][16][18], supporting the search for stenoses in the lumen of arteries forming parts of the coronary vascularisation. ETPL(k) grammars generate a language L(G) in the form of IE graphs, which can unambiguously represent 3D structures of heart muscle vascularisation visualised in images acquired during diagnostic examinations with the use of spiral computed tomography.
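A sketch of this edge-labelling step is given below. The 15° binning of azimuth and elevation onto {1, …, 24} and {α, …, μ} follows the description above, but the bin origins and the orientation of the reference sphere are assumptions made for illustration.

```python
import numpy as np

GREEK = "αβγδεζηθικλμ"   # the twelve vertical labels, 15 degrees each

def edge_label(w):
    """Label a 3-D section vector w with a horizontal bin 1..24 and a
    vertical bin alpha..mu on the hypothetical sphere."""
    x, y, z = w
    azimuth = np.degrees(np.arctan2(y, x)) % 360            # angle A
    elevation = np.degrees(np.arctan2(z, np.hypot(x, y)))   # angle E
    h = int(azimuth // 15) + 1                              # 1 .. 24
    v = GREEK[min(int((elevation + 90) // 15), len(GREEK) - 1)]
    return f"{h}{v}"

print(edge_label((1.0, 1.0, 0.5)))   # '4θ' for this direction
```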


Quoted below is the formal definition of the IE graph [3][16][18]:

H = (V, E, Σ, Γ, ϕ)

where:
V is a finite, non-empty set of graph nodes with unambiguously assigned indices;
Σ is a finite, non-empty set of node labels;
Γ is a finite, non-empty set of edge labels;
E is a set of graph edges of the form (v, λ, w), where v, w ∈ V, λ ∈ Γ and the index of v is smaller than the index of w;
ϕ: V → Σ is the node-labelling function.

Before we define the representation of the analysed image in the form of IE graphs, we have to introduce the following order relation in the set Γ of edge labels: 1 ≤ 2 ≤ 3 ≤ … ≤ 24 and α ≤ β ≤ γ ≤ … ≤ μ. We then index all vertices according to the ≤ relation on the labels of the edges which connect the main vertex, marked 1, to the adjacent vertices, indexing in ascending order (i = 2, 3, …, n). After this operation, every vertex of the graph is unambiguously assigned the appropriate index, which will later be used in the syntactic analysis of the examined graph representations. IE graphs generated using the presented methodology, modelling the analysed coronary vascularisation, are presented together with their characteristic descriptions (in tables) in the figure below (fig. 8.).
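One possible data structure for such IE graphs is sketched below; it enforces the index ordering required of the edges. The single edge built for the right coronary artery tree is a hypothetical reading of figs. 8 and 9, not the exact graph given there.

```python
from dataclasses import dataclass, field

@dataclass
class IEGraph:
    """Sketch of an IE graph H = (V, E, Sigma, Gamma, phi): indexed nodes,
    node labels (phi), and directed labelled edges (v, lambda, w) that
    always point from the smaller index v to the larger index w."""
    node_labels: dict = field(default_factory=dict)   # phi: V -> Sigma
    edges: list = field(default_factory=list)         # (v, lambda, w)

    def add_edge(self, v, lam, w):
        if not v < w:
            raise ValueError("IE graph edges require index(v) < index(w)")
        self.edges.append((v, lam, w))

g = IEGraph()
for i, lab in enumerate(["ST", "RCA", "RM", "RP_PI", "RP", "PI"], start=1):
    g.node_labels[i] = lab                 # terminal labels from Table 1
g.add_edge(1, "15ι", 2)                    # ST -> RCA, hypothetical edge
```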

Fig. 8 The representation of the right and the left coronary artery using IE graphs


The graph structure created in this way will form elements of a graph language defining the spatial topology of the heart muscle vascularisation, including its possible morphological changes. Formulating a linguistic description for the purpose of determining the semantics of the lesions searched for, and identifying (locating) pathological stenoses, will support the computer analysis of the structure obtained in order to automatically detect the number of stenoses, their location, type (concentric or eccentric) and extent. For IE graphs defined as above, in order to locate the places where stenoses occur in the case of a balanced artery distribution, the graph grammar may take the following form:

a) for the right coronary artery: GR = (Σ, Δ, Γ, P, Z)

Σ = {ST, RCA, RM, RP_PI, RP, PI, C_Right, C_Right_post_int} is a finite, non-empty set of node labels
Δ = {ST, RCA, RM, RP_PI, RP, PI} is a set of terminal node labels
Γ = {15ι, 6ο, 10ο, 4λ, 10π, 12ξ} is a finite, non-empty set of edge labels

The start graph Z and the set of productions P are shown in fig. 9.

Fig. 9 Start graph Z and set of productions for grammar GR

b) for the left coronary artery: GL = (Σ, Δ, Γ, P, Z)

Σ = {ST, LCA, L_LAD, CX, L, LAD, C_Left, C_Left_lad_lat}
Δ = {ST, LCA, L_LAD, CX, L, LAD}
Γ = {7κ, 2ο, 12ο, 14ξ, 12μ, 17μ, 15ν}


Start graph Z and set of productions P are shown in fig. 10.

Fig. 10 Start graph Z and set of productions for grammar GL
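For illustration only (this is not the chapter's implementation), both grammar tuples can be transcribed directly into data; the productions P and start graphs Z correspond to the graphs of figs. 9 and 10 and are elided here, and the Greek edge labels are transliterated.

# Hypothetical transcription of G_R = (Sigma, Delta, Gamma, P, Z).
GR = {
    "Sigma": {"ST", "RCA", "RM", "RP_PI", "RP", "PI",
              "C_Right", "C_Right_post_int"},            # node labels
    "Delta": {"ST", "RCA", "RM", "RP_PI", "RP", "PI"},   # terminal labels
    "Gamma": {"15iota", "6omicron", "10omicron",
              "4lambda", "10pi", "12xi"},                # edge labels
    "P": [],     # productions of fig. 9 would be listed here
    "Z": None,   # start graph of fig. 9
}
# The non-terminals are exactly the node labels that are not terminal:
nonterminals = GR["Sigma"] - GR["Delta"]   # {'C_Right', 'C_Right_post_int'}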

In this way, we have defined a mechanism, in the form of ETPL(k) graph grammars, which creates a linguistic representation of each analysed image in the form of IE graphs. The set of all image representations generated by this grammar is treated as a certain language. Consequently, we can build a syntax analyser, based on the proposed graph grammar, which recognises elements of this language. The syntax analyser is the actual program that recognises the sought changes in the lumen of the coronary arteries.

5.4 Detecting Lesions and Constructing the Syntactic Analyser

The problem of representing the analysed image using graph structures, presented in the previous subsection, is of key importance for syntactic methods and constitutes the preliminary stage of the recognition process. Defining the appropriate mechanism in the form of a graph grammar G yielded IE graph representations for the analysed images, and the set of all image representations generated by this grammar is treated as a certain language. The most important stage, which corresponds to the recognition procedure for the pathologies found, is the implementation of a syntactic analyser which allows the analysis to be carried out using cognitive resonance [20], the key to understanding the image. This is the hardest part of the entire recognition process, in particular for grammars which describe the set of rules of a language using graph formalisms. One of the procedures for parsing IE graphs with ETPL(k) graph grammars is the comparison of the characteristic descriptions of subsequent vertices of the analysed IE graph


and of the derived IE graph whose derivation is to lead to generating the analysed graph. A one-pass, generation-type parser carries out the syntactic analysis, at every step examining the characteristic description of the given vertex. If the vertex of the analysed and the derived graphs is terminal, then their characteristic descriptions are compared. However, if the vertex of the derived graph is non-terminal, then a production is searched for after whose application the characteristic descriptions are consistent. The numbers of the productions used during the parsing form the basis for classifying the recognised structure. This methodology makes use of the theoretical aspects of syntactic analysis for ETPL(k) grammars described in [3][16][18]. Since three different types of topology, characteristic for these vessels, can be distinguished in visualisations of coronary vascularisation, an appropriate type of ETPL(k) graph grammar can be proposed for each of the three. Each grammar generates a language of IE graphs modelling the particular type of coronary vascularisation. This representation was then subjected to a detailed analysis to find the places of morphological changes indicating the occurrence of pathology. This operation consists of several stages and uses, among others, context-free sequential grammars [11][20]. The subsequent steps of the analysis, on the example of the coronary arteries, are shown in fig. 11. The arteries between vertices ST1 – RCA2 and L_LAD3 – LAD6, represented by the edges 15ι and 17μ of the IE graph, have been subjected to a straightening transformation [11], which yields the width diagrams of the analysed arteries while preserving all their properties, including potential changes in morphology. In addition, such a representation makes it possible to determine the nature of the narrowing (concentric or eccentric). Concentric stenoses appear on a cross-section as a uniform stricture of the whole artery and present symptoms characteristic of a stable disturbance of heart rhythm, whereas eccentric stenoses occur only on one vascular wall and characterise an unstable angina pectoris [2]. The analysis of morphological changes was conducted on the basis of the obtained width diagrams, using context-free attributed grammars [11][20]. As a result of these operations, profiles of the analysed coronary arteries were obtained, with the areas of existing pathologies marked and the numerical values of their advancement level determined (fig. 11). The methodology presented above was applied sequentially to the individual sections of the coronary vascularisation represented by the particular edges of the introduced graph representation.
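A schematic transcription of that one-pass parsing loop is given below; the vertex iteration, characteristic descriptions and production lookup are placeholders (a hypothetical API) for the ETPL(k) machinery of [3][16][18], not a working parser.

def parse(analysed, grammar):
    # One-pass, generation-type parse (schematic sketch).
    derived = grammar.start_graph()
    used_productions = []
    for idx in analysed.vertex_indices():       # vertices visited in index order
        if derived.is_terminal(idx):
            # Terminal vertex: the characteristic descriptions must agree.
            if analysed.description(idx) != derived.description(idx):
                return None                     # not a sentence of the language
        else:
            # Non-terminal vertex: search for a production whose application
            # makes the characteristic descriptions consistent.
            prod = grammar.find_production(derived, idx, analysed.description(idx))
            if prod is None:
                return None
            derived = prod.apply(derived, idx)
            used_productions.append(prod)
    return used_productions                     # basis for classifying the structure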

5.5 Selected Results

In order to determine the operating efficiency of the proposed methods, a set of test data was used, namely visualisations obtained during diagnostic examinations using 64-slice spiral computed tomography. This set consisted of 20 complete reconstructions of coronary vascularisation (table 2) obtained during diagnostic examinations of various patients, mainly suffering from coronary heart disease at different progression stages. The test data also included visualisations previously used to construct the grammar and the syntactic analyser. However, to avoid analysing identical images, frames were selected from the same sequences that


Fig. 11 Subsequent steps in the analysis and recognition of morphological changes, on the example of the right (a) and the left (b) coronary artery


were several frames later than the projections used to construct the set of grammatical rules, and these later frames were used for the analysis. Due to the different types of topologies of coronary vascularisation, the set of analysed images was as follows:

Table 2 Number of images with particular coronary artery topologies

Balanced artery distribution: 9
Right artery dominant: 6
Left artery dominant: 5

The structure of the coronary vascularisation was determined by a diagnostician at the stage of image data acquisition. This distinction was intended to provide additional information about the health risk to the patient depending on the place where the pathology occurs and the type of coronary vascularisation (e.g. a stenosis occurring in the left coronary artery will constitute a greater threat to the health of patients having a left artery dominant structure than to patients having a right artery dominant structure). The above set of image data was used to determine the percentage of correct recognitions of the stenoses present, using the methodology proposed here. The recognition consists in identifying the locations of the stenoses, their number, extent and type (concentric or eccentric). For the research data included in the experiment, 85% of recognitions were correct. This value is the percentage proportion of the number of images in which the occurring stenoses were correctly located, measured and properly interpreted, to the number of all analysed images included in the experimental data set. No major differences in effectiveness were noticed depending on the structure of the coronary vascularisation. The following figure (fig. 12) shows a CT image of coronary vascularisation together with a record describing the pathological changes occurring in the right and left coronary arteries. In order to assess whether the size of a stenosis was correctly measured, we used comparative values from the syngo Vessel View software forming part of the HeartView CI suite [4]. This programme is used in everyday clinical practice where examinations are made with the SOMATOM Sensation Cardiac 64 tomograph [4]. In order to confirm or reject the correctness of the stenosis type determination (concentric or eccentric) shown in the examined image, we decided to use a visual assessment, because the aforementioned programs do not implement this functionality. As the set of test data was small (a dozen or so elements), the results obtained are very promising, and this effectiveness is due, among other things, to the strong generalising properties of the algorithms applied. Further research on improving the presented techniques for analysing lesions occurring in the morphology of coronary vessels might bring a further improvement in effectiveness and the future standardisation of these methods, obviously after they have first been tested on a much more numerous image data set.


Fig. 12 The result of the analysis in the search for pathological changes occurring in the CT image of coronary arteries

The results obtained in the research conducted show that graph languages for describing shape features can be effectively used to describe 3D reconstructions of coronary vessels and also to formulate semantic descriptions of the lesions found in these reconstructions. Such formalisms, due to their significant descriptive power (characteristic especially of graph grammars), can create models both of examined vessels whose morphology shows no lesions and of those with visible lesions bearing witness to early or more advanced stages of ischemic heart disease. By introducing the appropriate spatial relations into the coronary vessel reconstruction, it is possible to reproduce their biological role, namely the blood distribution within the whole coronary circulation system, which also facilitates locating lesions and determining their progression stage. All of this makes up a process of automatically understanding the examined 3D structure, which allows us to provide the physician with far more, and far more valuable, premises for


his/her therapeutic decisions than we could if we were using the traditional image recognition paradigm.

6 Conclusions Concerning the Advanced Classification and Cognitive Interpretation of CT Coronary Vessel Visualizations

The research so far has shown that one of the hardest tasks leading to the computer classification, and then the semantic interpretation, of medical visualisations is to create suitable representations of the analysed structures and to propose effective algorithms for reasoning about the nature of the lesions found in these images. Visualisations of coronary vascularisation are difficult for computers to analyse due to the variety of projections of the arteries examined. The methods developed made use of ETPL(k) graph grammars and the IE graphs they generate as formalisms modelling coronary vascularisation structures. Such solutions make it possible to detect all symptoms important from the point of view of diagnostics, appearing as various types of stenoses of the coronary arteries of the heart. An important stage in achieving this goal is to construct the right syntactic/semantic analysis procedure (i.e. a parser) which, when analysing graph representations created for 3D reconstructions of coronary vascularisation, automatically provides complete information defining the 3D topology of the analysed graph that describes the coronary vessels, including their individual components. Difficulties in implementing a syntactic analyser are due to the lack of ready grammar compilers (i.e. software) like those available for context-free grammars, for instance [11][20]. This means that syntactic analysis procedures have to be derived independently for the proposed grammars. A significant benefit of using ETPL(k) graph grammars is the possibility of extending them to the form of probabilistic grammars and the ability to introduce derivational rules with simple semantic actions [11][16]. This additionally makes it possible to determine significant morphometric parameters of the analysed spatial reconstruction of coronary vessels. Carrying out the semantic actions assigned to individual productions generates certain values or information resulting from the completed syntactic analysis. In the case of analyses of 3D coronary vascularisation reconstructions, the semantic actions of some productions will be aimed at determining numerical values defining the location and degree of a stenosis as well as its type (concentric or eccentric). These parameters will then be utilised as additional information useful in recognising doubtful or ambiguous cases or symptoms. Even though every analysed image has a different representation describing its morphology, the syntactic analysis is conducted based on the defined grammars, and this reduces the number of potential cases to the recognition and inclusion of the examined structure in one of the classes representing individual categories of disease units. If the completed analysis does not end in recognising any pathological symptom, then the analysing system outputs information that the analysed structure is a healthy one. If symptoms which are ambiguous from the medical point of view are


recognised, the final diagnosis can only be made by applying additional analysing algorithms. The development of new methods of computer analysis and detection of stenoses inside coronary arteries helps not only to significantly improve diagnostic actions, but also greatly broadens the application spectrum of artificial intelligence in the computer understanding of diagnostic images and in determining the medical significance of the pathologies shown in them. The linguistic formalisms developed also add new types of grammars and their applications to the fields of artificial intelligence and image recognition. Such techniques are of major importance as they allow lesions not just to be recognised, but also to have their semantics defined, which in the case of medical diagnostic images can lead to the computer understanding their significance. This is of key importance for detailing the best therapeutic possibilities, and if the proposed methods are perfected, they can significantly improve the ability to support the early recognition and diagnostics of heart lesions. This is of practical significance, as the identification of the locations of stenoses in coronary vessels is performed very widely, but manually, by an operator or a diagnostician; as the research has shown, such key stages of the diagnostic process based on the analysis of 3D images can, in the future, be successfully executed by an appropriately designed computer system. It is also worth noting that the methods presented in this publication are not just an attempt at assigning the examined image to an a priori defined class, but are also an attempt to imitate and automate the human process of medically understanding the significance of a shape found in the analysed image. This approach allows numerous medical conclusions to be drawn from the examined image, which, in particular, can lead to making the right diagnosis and recommending a specific type of therapy depending on the shape and location of the pathological stenosis described in the sets of productions. There is a deep analogy between the operation of the structural analysis model and the cognitive interpretation mechanisms occurring in the human mind. The analogy consists in using the interference between the expectations (knowledge collected in the set of productions in the form of graph grammatical rules) and the stream of data coming from the system analysing the examined image. This interference is characteristic of the human visual perception model [20]. Problems related to automating the process of generating new grammars for cases not included in the present language remain unsolved in the on-going research. It is worth noting, however, that the problem of deriving grammatical rules is generally considered unsolvable, particularly for graph grammars. It can appear if the image undergoing the analysis shows a coronary vascularisation structure different from the three cases of vessel topology assumed so far as occurring most often, i.e. the balanced distribution of arteries, the dominant right artery or the dominant left artery. In those cases it will be necessary to define a grammar taking this new case into account. The processes of creating new grammars and enriching existing ones with new description rules will be followed in further directions of research on the presented methods.

Another planned element of further research is to focus on using linguistic artificial intelligence methods to create additional, effective mechanisms which can be used for indexing and quickly finding specialised image data in medical databases. Such searches


will use semantic keys and will allow cases to be found which meet specified substantive conditions related to image contents. This can significantly contribute to solving at least some of the problems of intelligently archiving this type of data and of finding image data fulfilling semantic criteria set using example image patterns from medical multimedia databases.

Acknowledgments. This work has been supported by the National Science Centre, Republic of Poland, under project number N N516 478940.

References

[1] Breeuwer, M., Johnson, P., Kouwenhoven, K.: Analysis of volumetric cardiac CT and MR image data. MEDICAMundi 47(2), 41–53 (2003)
[2] Faergeman, O.: Coronary Artery Disease. Elsevier Science B.V. (2003)
[3] Flasiński, M.: On the parsing of deterministic graph languages for syntactic pattern recognition. Pattern Recognition 26, 1–16 (1993)
[4] SOMATOM Sensation Cardiac 64 Brochure, Get the Entire Picture. Siemens Medical (2004)
[5] Higgins, W.E., Reinhardt, J.M.: Cardiac image processing. In: Bovik, A. (ed.) Handbook of Video and Image Processing, pp. 789–804. Academic Press (2000)
[6] Katritsis, D.G., Pantos, I., Efstathopoulos, E.P., et al.: Three-dimensional analysis of the left anterior descending coronary artery: comparison with conventional coronary angiograms. Coronary Artery Disease 19(4), 265–270 (2008)
[7] Lewandowski, P., Tomczyk, A., Szczepaniak, P.S.: Visualization of 3-D Objects in Medicine - Selected Technical Aspects for Physicians. Journal of Medical Informatics and Technologies 11, 59–67 (2007)
[8] Meijboom, B.W., Van Mieghem, C.A., Van Pelt, N., et al.: Comprehensive Assessment of Coronary Artery Stenoses: Computed Tomography Coronary Angiography Versus Conventional Coronary Angiography and Correlation With Fractional Flow Reserve in Patients With Stable Angina. Journal of the American College of Cardiology 52(8), 636–643 (2008)
[9] Meyer-Baese, A.: Pattern Recognition in Medical Imaging. Elsevier-Academic Press (2003)
[10] Ogiela, M.R., Tadeusiewicz, R.: Modern Computational Intelligence Methods for the Interpretation of Medical Images. Springer, Heidelberg (2008)
[11] Ogiela, M.R., Tadeusiewicz, R.: Syntactic reasoning and pattern recognition for analysis of coronary artery images. Artificial Intelligence in Medicine 26, 145–159 (2002)
[12] Ogiela, M.R., Tadeusiewicz, R., Trzupek, M.: Picture grammars in classification and semantic interpretation of 3D coronary vessels visualisations. Opto-Electronics Review 17(3), 200–210 (2009)
[13] Oncel, D., Oncel, G., Tastan, A., Tamci, B.: Detection of significant coronary artery stenosis with 64-section MDCT angiography. European Journal of Radiology 62(3), 394–405 (2007)
[14] Pavlidis, T.: Algorithms for Graphics and Image Processing. Computer Science Press, Rockville (1982)


[15] Sirol, M., Sanz, J., Henry, P., et al.: Evaluation of 64-slice MDCT in the real world of cardiology: A comparison with conventional coronary angiography. Archives of Cardiovascular Diseases 102(5), 433–439 (2009)
[16] Skomorowski, M.: A Syntactic-Statistical Approach to Recognition of Distorted Patterns. Jagiellonian University, Krakow (2000)
[17] Sonka, M., Fitzpatrick, J.M.: Handbook of Medical Imaging, vol. 2: Medical Image Processing and Analysis. SPIE, Bellingham, Washington (2004)
[18] Tadeusiewicz, R., Flasiński, M.: Pattern Recognition. PWN, Warsaw (1991) (in Polish)
[19] Tadeusiewicz, R., Korohoda, P.: Computer Analysis and Image Processing. Foundation of Progress in Telecommunication, Kraków (1997) (in Polish)
[20] Tadeusiewicz, R., Ogiela, M.R.: Medical Image Understanding Technology. Springer, Heidelberg (2004)
[21] Tadeusiewicz, R., Ogiela, M.R.: Structural Approach to Medical Image Understanding. Bulletin of the Polish Academy of Sciences – Technical Sciences 52(2), 131–139 (2004)
[22] Tanaka, E.: Theoretical aspects of syntactic pattern recognition. Pattern Recognition 28, 1053–1061 (1995)
[23] Trzupek, M., Ogiela, M.R., Tadeusiewicz, R.: Image content analysis for cardiac 3D visualizations. In: Velásquez, J.D., Ríos, S.A., Howlett, R.J., Jain, L.C. (eds.) KES 2009. LNCS (LNAI), vol. 5711, pp. 192–199. Springer, Heidelberg (2009)
[24] Wang, Y., Liatsis, P.: A Fully Automated Framework for Segmentation and Stenosis Quantification of Coronary Arteries in 3D CTA Imaging. In: DeSE 2009, Second International Conference on Developments in eSystems Engineering, pp. 136–140 (2009)
[25] Yusuf, S., Reddy, S., Ounpuu, S., Anand, S.: Global burden of cardiovascular diseases, Part I: General Considerations, the Epidemiologic Transition, Risk Factors, and Impact of Urbanization. Circulation 104, 2746–2753 (2001)

Chapter 7

A Graph Matching Approach to Symmetry Detection and Analysis

Michael Chertok and Yosi Keller
Bar-Ilan University, Israel
{michael.chertok,yosi.keller}@gmail.com

Abstract. Spectral relaxation was shown to provide an efficient approach for solving a gamut of computational problems, ranging from data mining to image registration. In this chapter we show that, in the context of graph matching, spectral relaxation can be applied to the detection and analysis of symmetries in n dimensions. First, we cast the symmetry detection of a set of points in Rn as the self-alignment of the set to itself. Thus, by representing an object by a set of points S ∈ Rn, symmetry is manifested by multiple self-alignments. Second, we formulate the alignment problem as a quadratic binary optimization problem, solved efficiently via spectral relaxation. Thus, each eigenvalue corresponds to a potential self-alignment, and eigenvalues with multiplicity greater than one correspond to symmetric self-alignments. The corresponding eigenvectors reveal the point alignment and pave the way for further analysis of the recovered symmetry. We apply our approach to image analysis by using local features to represent each image as a set of points. Last, we improve the scheme's robustness by imposing geometrical constraints on the spectral analysis results. Our approach is verified by extensive experiments and was applied to two- and three-dimensional synthetic and real-life images.

1 Introduction

Symmetry is all around us. In nature it is commonly seen in living creatures such as butterflies, in still life like flowers, and in the unseen world of molecules. Humans have been inspired by nature's symmetry for thousands of years in countless fields. In the fields of art and architecture, in the scientific field of mathematics, and even in humanistic fields such as philosophy, symmetry is a prominent factor. From airplanes to kitchen cups, symmetry ideas are copied from nature in many man-made objects. The presence of visual symmetry in everyday life makes its detection and analysis one of the fundamental tasks in computer vision. When dealing with computer vision applications, the detection of symmetry is mainly used to reach a more advanced level of recognition, such as object detection or segmentation. It is known that the human visual system is more prone to detect symmetric patterns in


a given scenario than any other pattern. A person will focus his attention on a symmetric object before other objects in a picture [EWC00]. Rotational and reflectional symmetries are the most common types of symmetry. An object is said to have rotational symmetry of order K if it is invariant under rotations of 2πk/K, k = 0, …, K − 1, about a point denoted the symmetry center, whereas an object has reflectional symmetry if it is invariant under a reflection transformation about a line, denoted the reflection axis. Figure 1 presents both types of symmetry.


Fig. 1 Rotational and reflectional symmetries. (a) Rotational symmetry of order eight without reflectional symmetry. (b) Reflectional symmetry of order one.

The problem of symmetry detection and analysis has been studied by many researchers [CPML07, DG04, KCD+02, KG98, LE06, Luc04, RWY95, SIT01, SNP05]. Most of them deal with two-dimensional symmetries, while a few analyze three-dimensional data [KCD+02]. A recent survey [PLC+08] by Chen et al. found that despite the significant research effort made, there is still a need for a robust, widely applicable "symmetry detector". We propose an effective scheme for the detection and analysis of rotational and reflectional symmetries in n dimensions. The scheme is based on the self-alignment of points using a spectral relaxation technique, as proposed by Leordeanu in [LH05]. Our core contribution is to show that the symmetry of a set of points S ∈ Rn is manifested by a multiplicity of the leading eigenvalues and corresponding eigenvectors. This leads to a purely geometric symmetry detection approach that only utilizes the coordinates of the set S, and can thus be applied to abstract sets of points in Rn. Given the eigendecomposition, we show how to recover the intrinsic properties of reflectional and rotational symmetries (center of rotation, point correspondences, symmetry axes). In our second contribution, we derive a geometrical pruning measure, by representing the alignments as geometrical transforms in Rn and enforcing a norm constraint. This allows us to reject erroneous matchings and analyze real data, where symmetries are often embedded in clutter. In our third contribution, we


analyze the case of perfect symmetry and show that it results in a degenerate eigendecomposition. We then resolve this issue and explain why perfect symmetry rarely appears in real data such as images. The proposed scheme requires no a priori knowledge of the type of symmetry (reflection/rotation) and holds for both. Our scheme, denoted Spectral Symmetry Analysis (SSA), can detect partial symmetry and is robust to outliers. In our last contribution we apply the SSA to the analysis of symmetry in images, for which we use local image features [Low03, SM97] to represent images as sets of points; the image descriptors also serve to reduce the computational complexity. The chapter is organized as follows: we start by presenting the geometrical properties of symmetries in Section 2 and then survey previous results on symmetry detection, local features and spectral alignment in Section 3. Our approach to symmetry analysis is presented in Section 4 and experimentally verified in Section 5. Concluding remarks are given in Section 6.

2 Symmetries and Their Properties

In this work we study the symmetry properties of sets of points S = {x_i}, such that x_i ∈ Rn. The common types of symmetries are the rotational and reflectional symmetries. Sets having only the first type are described by the cyclic group CK, while others have both rotational and reflectional symmetry and are described by the dihedral group DK, where K is the order of the respective symmetry. In this section we define the rotational (cyclic) and reflectional (dihedral) symmetries, denoted CK and DK, respectively. By considering the subsets of points S_I^C, S_I^D ⊂ S that are invariant under the corresponding symmetry transforms TCK and TDK, we are able to recover the rotation centers and reflection axes. These invariant sets are shown to be related to the spectral properties of TCK and TDK. Finally, we derive an analytical relationship between TCK and TDK in the two-dimensional case, which allows us to infer the rotational symmetry transform TCK given two reflectional transforms TDK.

2.1 Rotational Symmetry

Definition 1 (Rotational symmetry). A set S ∈ Rn is rotationally symmetric with a rotational symmetry transform TCK of order K if

∀ x_i ∈ S, ∃ x_j ∈ S, s.t. x_j = TCK x_i.   (1)

For S ∈ R2, TCK is given by

T_{C_K}(x, y) = \begin{pmatrix} \cos\beta_k & -\sin\beta_k & 0 \\ \sin\beta_k & \cos\beta_k & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},   (2)


where β_k = 2πk/K, k = 0, …, K − 1. Thus, for a symmetry of order K, there exists a set of K transformations {R_{β_k}}_1^K for which Eq. 1 holds. The use of homogeneous coordinates in Eq. 2 allows us to handle non-centered symmetries. An example of rotational symmetry is given in Fig. 1a. Equation 2 implies that

det(TCK) = 1,   (3)

and the invariant set S_I^C is the center of rotational symmetry. Given a rotation operator TCK, the center of rotation (which is also the center of rotational symmetry) X_c is invariant under TCK, and can be computed as the eigenvector of TCK corresponding to the eigenvalue λ = 1:

TCK X_c = X_c.   (4)
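The relations in Eqs. 2-4 are easy to check numerically. The Python/NumPy sketch below (illustrative, not from the chapter) builds TCK in homogeneous coordinates for a non-centered symmetry and recovers the rotation center as the eigenvector with eigenvalue 1.

import numpy as np

def rotation_transform(beta, cx=0.0, cy=0.0):
    # T_CK: rotation by beta about (cx, cy), in homogeneous coordinates.
    R = np.array([[np.cos(beta), -np.sin(beta), 0.0],
                  [np.sin(beta),  np.cos(beta), 0.0],
                  [0.0,           0.0,          1.0]])
    C = np.array([[1.0, 0.0, cx], [0.0, 1.0, cy], [0.0, 0.0, 1.0]])
    return C @ R @ np.linalg.inv(C)   # conjugation handles non-centered symmetry

T = rotation_transform(2 * np.pi / 4, cx=3.0, cy=-1.0)   # order K = 4
assert np.isclose(np.linalg.det(T), 1.0)                 # Eq. 3

# Eq. 4: the center is the eigenvector of T with eigenvalue 1.
w, V = np.linalg.eig(T)
xc = np.real(V[:, np.argmin(np.abs(w - 1.0))])
xc = xc / xc[2]            # normalize the homogeneous coordinate
print(xc[:2])              # approx. [3., -1.]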

2.2 Reflectional Symmetry

Definition 2 (Reflectional symmetry). A set S ∈ Rn is reflectionally symmetric with respect to the vector (reflection axis) ⟨cos α_0, sin α_0⟩ with a reflectional transform TDK if

∀ x_i ∈ S, ∃ x_j ∈ S, s.t. x_j = TDK x_i,   (5)

where for x_i ∈ R2, TDK is given by

T_{D_K}(x, y) = \begin{pmatrix} \cos 2\alpha_0 & \sin 2\alpha_0 & 0 \\ \sin 2\alpha_0 & -\cos 2\alpha_0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix},   (6)

where α_0 is the angle of the reflection axis.

A set S has reflectional symmetry of order K if there are K angles α_k that satisfy Eq. 5. An example of reflectional symmetry is given in Fig. 1b. Equation 6 implies that

det(TDK) = −1.   (7)

Similar to the rotational symmetry case, the points on the symmetry axis form an invariant set X_R that corresponds to the eigenspace of TDK:

TDK X_R = X_R.   (8)

Conversely to Eq. 4, the eigenspace corresponding to λ = 1 is of rank 2, in accordance with S_I^D being a line.


2.3 Interrelations between Rotational and Reflectional Symmetries

Theorem 1. If a set S has rotational symmetry of order K, then it either has reflectional symmetry of order K or has no reflectional symmetry at all [Cox69, Wey52]. If S has both rotational and reflectional symmetry, then the axes of reflectional symmetry are given by

α_k = α_0 + β_k / 2,  k = 0, …, K − 1,   (9)

where α_0 is the angle of one of the reflection axes, and the β_k are the angles of rotational symmetry.

Theorem 2. Given two distinct reflectional transforms TD1 and TD2, one can recover the corresponding rotational symmetry transform TCK:

TCK = TD1 · TD2.   (10)

The proof is given in Appendix A.
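Theorem 2 can likewise be verified numerically: composing two reflections about axes through the origin yields a proper rotation by twice the angle between the axes. A minimal check with illustrative angle values:

import numpy as np

def reflection_transform(alpha):
    # T_DK: reflection about a line through the origin at angle alpha (Eq. 6).
    return np.array([[np.cos(2 * alpha),  np.sin(2 * alpha), 0.0],
                     [np.sin(2 * alpha), -np.cos(2 * alpha), 0.0],
                     [0.0,                0.0,               1.0]])

TD1, TD2 = reflection_transform(0.0), reflection_transform(np.pi / 4)
TCK = TD1 @ TD2                                # Eq. 10
assert np.isclose(np.linalg.det(TD1), -1.0)    # Eq. 7: a reflection
assert np.isclose(np.linalg.det(TCK),  1.0)    # Eq. 3: a proper rotation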

2.4 Discussion

The geometrical properties presented above pave the way for a computational scheme for the analysis of a given set of prospective symmetry transforms {T_i}. By imposing Eqs. 3 and 7, erroneous transforms can be discarded, and the spectral analysis given in Eqs. 4 and 8 can be used to recover the center of rotational symmetry and the axis of the reflectional one. Moreover, based on Theorem 2, one can start with a pair of reflection transforms {TD1, TD2} and recover the rotational transform TCK. Note that in practice, the prospective transforms {T_i} are computed by the spectral matching algorithm discussed in Section 4. That procedure provides an alternative geometrical approach to recovering the symmetry centers and axes; using both methods, we are able to cross-validate the results. The norm tests of the symmetry transforms (Eqs. 3 and 7) can be applied to higher-dimensional data sets, as can the spectral analysis in Eqs. 4 and 8. The equivalent for three-dimensional data is given by Euler's theorem [TV98].

3 Previous Work

This section overviews previous work related to our scheme. Section 3.1 provides a survey of recent results in symmetry analysis, while Section 3.2 presents the notion of local features, used to represent an image as a set of salient points. A combinatorial formulation of the alignment of sets of points in Rn is discussed in Section 3.3, and a computationally efficient solution via spectral relaxation is


presented. The latter approach paves the way for the proposed Spectral Symmetry Analysis (SSA) algorithm presented in Section 4.

3.1 Previous Work in Symmetry Detection and Analysis

Symmetry has been thoroughly studied in the literature from theoretical, algorithmic, and applicative perspectives. Theoretical analyses of symmetry can be found in [Mil72, Wey52]. The algorithmic approaches to its detection can be divided into several categories, the first of which consists of intensity-based schemes that compute numerical moments of image patches. For instance, detection of vertical reflectional symmetry using a one-dimensional odd-even decomposition is presented in [Che01]. The authors assume that the symmetry axis is vertical and thus scan each horizontal line in the image. Each such line is treated as a one-dimensional signal that is normalized and decomposed into odd and even parts. From the odd and even parts, the algorithm constructs a target function that achieves its maximum at the point of mirror symmetry of the one-dimensional signal. When the image has a vertical symmetry axis, all symmetry points of the different horizontal lines lie along a vertical line in the image. A method that estimates the relative rotation of two patterns using the Zernike moments is suggested in [KK99]. This problem is closely related to the problem of detecting rotational symmetry in images. Given two patterns, where one pattern is a rotated replica of the other, the Zernike moments of the two images will have the same magnitude and some phase differential. The phase differential can be used to estimate the relative rotation of the two images. In order to detect large symmetric objects, such schemes require an exhaustive search over all potential symmetry axes and locations in the image, requiring excessive computation even for small images. An efficient search algorithm for detecting areas with high local reflectional symmetry, based on a local symmetry operator, is presented in [KG98]. It defines a two-dimensional reflectional symmetry measure as a function of four parameters x, y, θ, and r, where x and y are the center of the examined area, r is its radius, and θ is the angle of the reflection axis. Examining all possible values of x, y, r, and θ is computationally prohibitive; therefore, the algorithm formulates the search as a global optimization problem and uses a probabilistic genetic algorithm to find the optimal solution efficiently. A different class of intensity-based algorithms [DG04, KS06, Luc04] utilizes the Fourier transform to detect global symmetric patterns in images. The unitarity of the Fourier transform preserves the symmetry of images in the Fourier domain: a symmetric object in the intensity domain will also be symmetric in the Fourier domain. Derrode et al. [DG04] analyze the symmetries of real objects by computing the Analytic Fourier-Mellin transform (AFMT). The input image is interpolated on a polar grid in the spatial domain before computing the FFT, resulting in a polar Fourier representation. Lucchese [Luc04] provides an elegant approach to analyzing the angular properties of an image without computing its polar DFT. An angular histogram is computed by detecting and binning the pointwise zero crossings of the


difference of the Fourier magnitude in Cartesian coordinates along rays. The histogram's maxima correspond to the directions of the zero crossings. In [KS06], Keller et al. extended Lucchese's work by applying the Pseudo-Polar Fourier transform to compute algebraically accurate line integrals in the Fourier domain. The symmetry results in a periodic pattern in the line integral, which is detected by spectral analysis (MUSIC). These algorithms are global by nature, being able to effectively detect fully symmetric images, such as synthetic symmetric patterns. Yet, some of them [KS06] struggle to detect small localized symmetric objects embedded in clutter. The frequency domain was also utilized by Lee et al. in [LCL08], where Frieze-expansions were applied to the input image, thus converting planar rotational symmetries into periodic one-dimensional signals whose period corresponds to the order of the symmetry. This period is estimated by recovering the maxima of the Fourier spectrum. Recent work emphasizes the use of local image features. The local information is then agglomerated to detect the global symmetry. Reisfeld et al. [RWY95] suggested a low-level operator for interest point detection where symmetry is considered a cue. This symmetry operator constructs the symmetry map of the image by computing an edge map, where the magnitude and orientation of each edge depend on the symmetry associated with each of its pixels. The proposed operator is able to process different symmetry scales, enabling it to be used in multi-resolution schemes. A related approach was presented in [LW99], where both reflectional and rotational symmetries can be detected, even under a weak perspective projection. A Hough transform is used to derive the symmetry axes from edge contours. A refinement algorithm discards erroneous symmetry axes by imposing geometrical constraints using a voting scheme. An approach related to our work was introduced in [ZPA95], where the symmetry is analyzed as a symmetry of a set of points. For an object given by a sequence of points, the symmetry distance is defined as the minimum distance by which we need to move the points of the original object in order to obtain a symmetric object. This also defines the symmetry transform of an object as the symmetric object closest to the given one. This approach requires finding point correspondences, which is often difficult, and an exhaustive search over all potential symmetry axes is performed. Shen et al. [SIT01] used an affine invariant feature vector, computed over a set of interest points. The symmetry was detected by analyzing the cross-similarity matrix of these vectors. Rotational and reflectional symmetries can be analyzed by finding the loci corresponding to its minima. The gradient vector flow field was used in [PY04] to compute a local feature vector. For each point, its location, orientation, and magnitude were retained. Local features in the form of Taylor coefficients of the field were computed; a hashing algorithm is then applied to detect pairs of points with symmetric fields, while a voting scheme is used to robustly identify the location of the symmetry axis.


Three-dimensional symmetry was analyzed in [KCD+02, MSHS06]. The scheme computes a reflectional symmetry descriptor that measures the amount of reflectional symmetry of 3D volumes, for all planes through the center of mass. The descriptor maps any 3D volume to a sphere, where each point on the sphere represents the amount of symmetry in the object with respect to the plane perpendicular to the direction of the point. As each point on the sphere also represents an integration over the entire volume, the descriptor is resilient to noise and to small variations between objects. We show that our approach is directly applicable to three-dimensional meshes. SIFT local image features [Low03] were applied to symmetry analysis by Loy and Eklundh in [LE06]. In their scheme, a set of feature points is detected over the image, and the corresponding SIFT descriptors are computed. Feature points are then matched in pairs by the similarity of their SIFT descriptors. These local pairwise symmetries are then agglomerated in a Hough voting space of symmetry axes. The vote of each pair in the Hough domain is given by a weight function that measures the discrepancy in the dominant angles and scales [Low03] of the feature points. As the SIFT descriptors are not reflection invariant, reflections are handled by mirroring the SIFT descriptors. In contrast, our scheme is based on a spectral relaxation of the self-alignment problem. It recovers the self-assignment directly. Thus, we avoid the quantization of the Hough space, and our scheme can be applied, as is, to analyzing higher-dimensional data without suffering the curse of dimensionality manifested by a density (voting) estimation scheme such as the Hough transform. Also, our scheme does not require a local symmetry measure, such as the dominant angle, and is purely geometric. It can be applied with any local feature, such as correlators and texture descriptors [OPM02]. The work of Hays et al. in [HLEL06] is of particular interest to us, as it combines the use of local image descriptors and spectral high-order assignment for translational symmetry analysis. Translational symmetry is a problem in texture analysis, where one aims to identify periodic or near-regular repeating textures, commonly known as lattices. Hays et al. propose to detect translational symmetry by detecting feature points and computing a single, high-order, spectral self-alignment. The assignments are then locally pruned and regularized using thin-plate spline warping. The corresponding motion field is elastic and nearly translational, hence the term translational symmetry. In contrast, our scheme deals with rotational and reflectional symmetries, where the estimated self-alignments relate to rotational motion. The core of our work is the analysis of multiple self-assignments and their manifestation via multiple eigenvectors and eigenvalues. Moreover, based on the spectral properties of geometric transform operators, we introduce a global assignment pruning measure able to detect erroneous self-assignments. This turns out to be essential in analyzing symmetries in real images, which are often embedded in clutter.


3.2 Local Features

The use of local features is one of the cornerstones of modern computer vision. They were found to be instrumental in a diverse set of computer vision applications such as image categorization [ZMLS07], mosaicking [BL03] and tracking [TT05], to name a few. Originating from the seminal works of Cordelia Schmid [SM97] and David Lowe [Low03], local features are used to represent an image I by a sparse set of salient points {x_i}, where each point is represented by a vector of parameters D_i denoted a descriptor. The salient set {x_i} is found by a detector. The detector and descriptor are designed to maximize the number of interest points that will be re-detected in different images of the same object and reliably matched using the descriptors. The descriptor characterizes a small image patch surrounding a pixel. Due to their locality, local features are resilient to geometrical deformations and appearance changes. For instance, a complex global geometrical deformation can be normalized locally by estimating a dominant rotation angle and characteristic scale per patch, or by estimating local affine shape moments [MS04]. The choice of the geometrical normalization measures depends on the geometrical deformation we aim to handle. The set of local features {D_i} is then denoted the image model of a particular image.

Definition 3 (Image model). The image model M consists of the set of N interest points S = {x_i}_1^N, the corresponding set of local descriptors {D_i}_1^N and a set of local attributes {θ_i, σ_i}_1^N, where θ_i and σ_i are the local dominant orientation and scale, respectively, of the point i. Denote M = {M_i}_1^N, where M_i = {x_i, D_i, θ_i, σ_i}.

A myriad of region detectors and descriptors can be found in the literature [MTS+05], one of the most notable being David Lowe's SIFT descriptor [Low03]. Various objects might require different combinations of local detectors and descriptors [MTS+05], depending on the object's visual properties. For instance, the SIFT [Low03] excels in detecting and describing naturally textured images, while large piecewise-constant objects are better detected by the affine covariant MSER


Fig. 2 Feature point detectors. (a) A Hessian-based scale-invariant detector is more suitable for non-structured scenes. (b) MSER responds best to structured scenes.


[MCUP02], as shown in Fig. 2b. The ellipses in the figure represent the second moment matrices of the detected regions. In contrast, non-structured objects are better characterized by affine-adapted Hessian-like detectors, as depicted in Fig. 2a. The common solution is to use multiple descriptors simultaneously [NZ06]. In the context of symmetry detection, in contrast to object recognition, the local features are all extracted from the same image. Hence, one can assume that symmetric points within the same image would respond well to the same type of local detector/descriptor. This allows us to use one detector-descriptor pair at a time.

3.3 Spectral Matching of Sets of Points in Rn

Given two sets of points in Rn, S_1 = {x_i^1}_1^{N_1} and S_2 = {x_j^2}_1^{N_2}, where x_j^k ∈ Rn, k = 1, 2, we aim to find a correspondence map C = {c_{i_k j_k}}_1^{N_1}, such that c_{i_k j_k} implies that the point x_{i_k}^1 ∈ S_1 corresponds to the point x_{j_k}^2 ∈ S_2. Figure 3 presents an example of two sets being matched. Spectral point matching was first presented in the seminal work of Scott and Longuet-Higgins [SLH91], who aligned point-sets by performing a singular value decomposition of a point association weight matrix. In this work we follow a different formulation proposed by Berg et al. [BBM05] and its spectral relaxation introduced by Leordeanu et al. in [LH05]. We start by formulating a binary quadratic optimization problem, where the binary vector Y ∈ {0, 1} represents all possible assignments of a point x_{i_k}^1 ∈ S_1 to the points in the set S_2. The assignment problem is then given by:

Y^* = \arg\max_Y \left( Y^T H Y \right), \quad Y \in \{0, 1\},   (11)

Fig. 3 Toy example for matching two sets of points

where H is an affinity matrix, such that H(k_1, k_2) is the affinity between the matchings c_{i_{k_1} j_{k_1}} and c_{i_{k_2} j_{k_2}}. H(k_1, k_2) → 1 implies that the two matchings are consistent, and H(k_1, k_2) → 0 implies that the matchings are contradictory. In practice, we use

H(k_1, k_2) = \exp\left( -\frac{1}{\sigma} \left( d\left(x_{i_{k_1}}^1, x_{i_{k_2}}^1\right) - d\left(x_{j_{k_1}}^2, x_{j_{k_2}}^2\right) \right)^2 \right),   (12)


where σ is a scale factor. This binary quadratic optimization is known to be NP-hard [SM00], and its solution can be approximated by the spectral relaxation

Z^* = \arg\max_Z \frac{Z^T H Z}{Z^T Z}, \quad Z \in \mathbb{R}.   (13)

Thus, Z^* is given by the eigenvector corresponding to the largest eigenvalue λ_1, as this maximizes Eq. 13. This approach can be considered as normalized-cut clustering applied to the set of correspondences {c_{i_k j_k}}_1^{N_1 N_2}. Namely, we assume that the M true correspondences {c_{i_k j_k}}_1^M (M ≪ N_1 N_2) form a tightly connected cluster with respect to the affinity measure in Eq. 12. In [CSS07], Cour and Shi proposed a doubly-stochastic normalization of the affinity matrix H, adding an affinity constraint over the solution Z to enforce one-to-one matchings. Given the relaxed solution Z^*, we apply the discretization procedure given in [LH05] to derive an approximation Y to the binary vector Y^*. Note that since we are interested in symmetry analysis, we expect to recover multiple solutions (eigenvectors) {Y_i}_1^K, where K is the order of symmetry. In the following section we show how to estimate K based on the eigenvalues of the affinity matrix H, and how to recover the symmetry correspondences. We then compute the symmetry centers for rotational symmetry, and the axes of reflection for reflectional symmetry.
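To make the relaxation concrete, here is a minimal, illustrative Python/NumPy sketch (not the authors' implementation): it enumerates all candidate correspondences, fills H according to Eq. 12, takes the leading eigenvector of Eq. 13, and discretizes it greedily in the spirit of [LH05]. For symmetry analysis one simply passes the same set twice (S2 = S1); the quadratic size of H makes this sketch practical only for small sets.

import numpy as np

def spectral_match(S1, S2, sigma=1.0):
    # S1, S2: (N1, n) and (N2, n) NumPy arrays of points in R^n.
    cand = [(i, j) for i in range(len(S1)) for j in range(len(S2))]
    H = np.zeros((len(cand), len(cand)))
    for a, (i1, j1) in enumerate(cand):
        for b, (i2, j2) in enumerate(cand):
            d1 = np.linalg.norm(S1[i1] - S1[i2])         # pairwise distance in S1
            d2 = np.linalg.norm(S2[j1] - S2[j2])         # pairwise distance in S2
            H[a, b] = np.exp(-((d1 - d2) ** 2) / sigma)  # Eq. 12
    w, V = np.linalg.eigh(H)                             # H is symmetric
    z = np.abs(V[:, -1])                                 # leading eigenvector (Eq. 13)
    # Greedy discretization: accept candidates in decreasing order of z
    # while enforcing a one-to-one matching.
    matches, used1, used2 = [], set(), set()
    for a in np.argsort(-z):
        i, j = cand[a]
        if i not in used1 and j not in used2:
            matches.append((i, j))
            used1.add(i); used2.add(j)
    return matches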

4 Spectral Symmetry Analysis

In this section we apply spectral matching to symmetry analysis and derive the Spectral Symmetry Analysis scheme. We start in Section 4.1 by presenting a general computational approach for the detection and analysis of symmetries of sets of points in n-dimensional spaces. We then elaborate on the analysis of symmetry in two-dimensional images in Section 4.2.

4.1 Spectral Symmetry Analysis of Sets in Rn

Given a set of points S ∈ Rn with a symmetry of order K, it follows from Section 2 that there exists a set of symmetry transformations, {TCK} and {TDK}, that map S to itself. The main issue is how to detect these multiple transformations simultaneously. In terms of numerical implementation, this implies that one has to look for a set of local solutions of the corresponding optimization problem. Most matching and alignment schemes, such as RANSAC and ICP, lock on to a single optimal alignment, and it is unclear how to modify them to search for multiple solutions. Spectral relaxation provides an elegant solution to this issue. The multiple self-alignments are manifested by the multiple maxima of the binary formulation in Eq. 11 and its corresponding relaxation in Eq. 13. The maxima of the Rayleigh quotient in Eq. 13 are the leading eigenvalues of the affinity matrix H, and the corresponding arguments are the corresponding eigenvectors. This beautiful


property allows us to recover multiple assignments simultaneously and independently by computing the eigendecomposition of H. Such an example is given in Fig. 4, where the four self-alignments are manifested by four dominant eigenvalues. Note that the largest eigenvalue corresponds to the identity transform, which maps each point to itself. Hence, given the set of interest points S ∈ Rn, we apply the spectral alignment algorithm of Section 3.3 and compute the eigendecomposition {ψ_i, λ_i}_1^K of Eq. 13. The overall number of symmetry axes is given by the number of large eigenvalues K, and the correspondence maps {C_i}_1^K are derived from the discretized binary eigenvectors {ψ_i}. As the spectral alignment matches Euclidean distances between points, it can be used to compute multiple non-parametric alignments. This implies that we do not have to predefine which symmetry type to analyze; those that do exist will be detected. But this also implies that the scheme might detect erroneous self-alignments, unrelated to the symmetry. This phenomenon becomes evident during the analysis of symmetric sets embedded in clutter. The problem is resolved in the next section by incorporating the geometrical constraints of the symmetry transforms discussed in Section 2, which allows us to prune the erroneous self-alignments.
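Since the order of symmetry is read off the spectrum, a naive estimator can simply count the dominant eigenvalues before the first large spectral gap; the ratio threshold below is an illustrative assumption, not a value from the chapter.

import numpy as np

def estimate_symmetry_order(eigvals, gap_ratio=2.0):
    # Count leading eigenvalues of H before the first large spectral gap;
    # each dominant eigenvalue corresponds to one self-alignment (the
    # largest one to the identity). gap_ratio is illustrative only.
    lam = np.sort(np.abs(np.asarray(eigvals)))[::-1]
    for k in range(1, len(lam)):
        if lam[k - 1] > gap_ratio * lam[k]:
            return k
    return len(lam)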

4.1.1 Perfect Symmetry and Spectral Degeneracy

When analyzing perfectly symmetric sets of points, multiple alignments might be manifested by the same eigenvalue, and the corresponding eigenvectors become degenerate: each eigenvector is then a linear combination of several assignments. This phenomenon never occurs with data sources other than synthetic sets of points. For instance, in images, the feature point detectors have a certain sub-pixel accuracy, and corresponding feature points do not create perfect symmetries. This applies to synthetic images and, even more so, to real images and three-dimensional objects, which are never perfectly symmetric. In order to generalize our approach to perfectly symmetric sets of points, we propose adding Gaussian random noise N(0, σ_n) to the non-zero elements of the affinity matrix. This breaks the perfect symmetry, if it exists, and does not influence the analysis of regular data. In order to retain the symmetry of the affinity matrix, we add a symmetric pattern of noise. As the non-zero affinities are of O(10^-1) for a well-chosen value of σ in Eq. 12, we used σ_n = 10^-3.

4.2 Spectral Symmetry Analysis of Images

An image I ∈ R2 is a scalar or vector function defined over R2. As such, the spectral matching scheme cannot be used directly, as it applies to sets of points. Hence, we turn to image modeling by means of local features, as discussed in Section 3.2. This allows us to represent the input image as a set of salient points. The rotation invariance of the detectors guarantees that corresponding symmetric points are detected as salient points simultaneously. We then present in Section 4.2.2 a scheme for pruning erroneous spectral alignments and identifying valid matchings as CK or DK. Last,


given the pruned valid transforms, we show in Section 4.2.3 how to recover the intrinsic geometrical properties (the center of rotation and the axis of reflection for CK and DK, respectively) of each detected symmetry.

4.2.1 Image Representation by Local Features

Given an input image I, we compute an image model M for each type of local detector/descriptor. A reflected replica of the features is then added to M [LE06]. This allows us to handle reflections, recalling that the local features are rotationally, but not reflectionally, invariant. The features are then progressively sampled [ELPZ97] to reduce their number to a few thousand. The progressive sampling spreads the points evenly over the image, thus reducing the number of image regions with high numbers of features, which are prone to produce local partial matches. We also utilize the dominant scale property of the local features [Low03] to prune false pairwise assignments and further sparsify the affinity matrix H. As we analyze self-alignments within the same image, corresponding features will have similar dominant scales. Hence, we prune the pairwise affinities in Eq. 12 by

\widetilde{H}(k_1, k_2) = \begin{cases} 0 & \left| \log\left( Sc\left(x_{i_{k_1}}^1\right) / Sc\left(x_{j_{k_1}}^2\right) \right) \right| > \left| \log(\Delta S) \right| \\ & \text{or} \quad \left| \log\left( Sc\left(x_{i_{k_2}}^1\right) / Sc\left(x_{j_{k_2}}^2\right) \right) \right| > \left| \log(\Delta S) \right| \\ H(k_1, k_2) & \text{else} \end{cases}   (14)

where Sc(x) is the dominant scale of the point x, and ΔS is a predefined scale differential. We use |log(.)| to symmetrize the scale discrepancy with respect to both points. This implies that if a pair of corresponding points is scale-inconsistent, all of its pairwise affinities are pruned. In order to effectively utilize the sparsity of the affinity matrix H, we threshold it by T = 10^-5. Namely, the affinity in Eq. 12 is always nonzero, but for geometrically inconsistent pairs the affinity is of O(10^-7), while for consistent pairs it is of O(10^-1).
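Per candidate correspondence, the test of Eq. 14 reduces to a symmetric log-ratio check on the dominant scales; a short sketch with hypothetical names:

import numpy as np

def scale_consistent(sc1, sc2, delta_s):
    # Eq. 14 test for one candidate correspondence: True iff the dominant
    # scales sc1 and sc2 of the two matched points differ, as a ratio,
    # by less than the predefined differential delta_s.
    return abs(np.log(sc1 / sc2)) <= abs(np.log(delta_s))

# A pairwise affinity H[k1, k2] is kept only if both correspondences k1 and
# k2 pass this test; otherwise it is zeroed, and the matrix is finally
# thresholded (T = 1e-5) to exploit its sparsity.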

4.2.2 Symmetry Categorization and Pruning

Given the image model M, we apply the spectral matching scheme described in Section 3.3 and derive P tentative self-alignments of the image, denoted {C_i}_1^P. Typically P > K, K being a typical order of image symmetry; in practice K < 10 and P ≈ 15. The reason is that in real images the symmetric patterns might be embedded in clutter, resulting in the recovery of spurious self-alignments (eigenvectors) unrelated to the symmetry. To address this issue we propose an assignment pruning scheme based on the norm property of symmetry transforms in R2. Namely, by computing the projective transform corresponding to the recovered matching, and recalling that the norm of a symmetry transform is ±1 (Section 2), erroneous matchings can


be pruned. Moreover, this provides a means for categorizing transforms as either CK or DK. We analyze each correspondence map C_i by applying a normalized DLT algorithm [HZ04] and fitting a projective motion model T_i:

T_i X_1 = X_2 : \begin{pmatrix} t_{11} & t_{12} & t_{13} \\ t_{21} & t_{22} & t_{23} \\ t_{31} & t_{32} & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ y_1 \\ 1 \end{pmatrix} = \begin{pmatrix} x_2 \\ y_2 \\ 1 \end{pmatrix},   (15)

X_1 and X_2 being the spatial coordinates of corresponding points in C_i. Equation 15 can also be solved for an affine motion model, where the choice of the model (projective vs. affine) depends on the expected distortion within the image. The correspondence map C_i can be pruned for erroneous point matchings by applying a robust least-squares scheme, such as RANSAC [FB81]. Given the transform T_i, we can now apply Eqs. 3 and 7 and classify the transform T_i as a cyclic symmetry CK if det(T_i) ≈ 1, as a reflectional symmetry DK if det(T_i) ≈ −1, or discard it as an erroneous self-alignment otherwise. Algorithm 1 summarizes the first two steps of the SSA. We emphasize that the spectral matching and pruning schemes can also be applied to sets of points in higher dimensions. An example of the symmetry analysis of a three-dimensional object is provided in Section 5.
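The fit-and-classify step can be sketched with standard tools; below, OpenCV's RANSAC homography estimator stands in for the normalized DLT of [HZ04] (a hypothetical substitution, not the authors' code), followed by the determinant test of Eqs. 3 and 7.

import numpy as np
import cv2

def classify_alignment(X1, X2, eps=0.1):
    # X1, X2: (N, 2) arrays of matched coordinates from a correspondence map C_i.
    # eps is an illustrative tolerance on the determinant test.
    T, _mask = cv2.findHomography(X1.astype(np.float32),
                                  X2.astype(np.float32), cv2.RANSAC)
    if T is None:
        return None, "unfit"
    d = np.linalg.det(T / T[2, 2])      # normalize so that t33 = 1, as in Eq. 15
    if abs(d - 1.0) < eps:
        return T, "rotation (C_K)"      # Eq. 3
    if abs(d + 1.0) < eps:
        return T, "reflection (D_K)"    # Eq. 7
    return T, "discard"                 # erroneous self-alignment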

4.2.3 Computing the Geometrical Properties of the Symmetry

The center of rotation and the axis of reflection can be computed by complementary geometrical and analytical approaches. The axis of reflection can be computed analytically, given the corresponding transform TDK, by applying Eq. 8: the reflection axis is the line connecting the two points corresponding to the two eigenvectors of TDK with an eigenvalue of λ_i = 1. We denote this the analytical solution. In addition, one can apply a geometrical solution, where we connect the corresponding points in DK found by the spectral matching; these are the points which were used to estimate TDK in the previous section. The reflection axis is the line that fits through the middle point of each such line segment. Given the transform TCK corresponding to some rotational symmetry CK, the center of rotation can be recovered by applying Eq. 4 and computing the eigenvector of TCK corresponding to λ_i = 1. The center of rotation can also be computed geometrically by connecting matching points: for each such line, consider the normal passing through its middle; all such normals intersect at the center of rotation. Thus, the center of rotation is derived by solving an overdetermined set of equations in the least-squares sense, similar to the robust fitting of the reflection axes. Theorems 1 and 2 provide the foundation for inferring the complete set of symmetries, given a subset of them detected by the spectral analysis. Given two reflectional symmetry transforms {DK1, DK2}, the order of symmetry can be derived by computing the angle Δα between the reflection axes. The order of symmetry is then given by solving:


Algorithm 1. Spectral Symmetry Analysis of an Image
1: Create an image model M of the input image; suppose it contains N points.
2: Set P, the number of eigenvectors to analyze, and ε, the norm error of the symmetric transform.
3: Progressively sample the interest regions.
4: Reflect all the local descriptors {Di}, i = 1, ..., N, in the image, while preserving the originals.
5: Compute an affinity matrix based on putative correspondences drawn by similar descriptors {Di} and the dominant-scale pruning measure.
6: Add random noise to the affinity matrix.
7: Solve Eq. 13 and compute the eigendecomposition {ψi, λi}.
8: for i = 1 to P do
9:    Derive the correspondence map Ci from ψi.
10:   Estimate the transformation matrix Ti.
11:   if |det(Ti) − 1| < ε then
12:      Rotation detected: use Eq. 4 to find the center.
13:   else if |det(Ti) + 1| < ε then
14:      Reflection detected: use Eq. 8 to find the reflection axis.
15:   end if
16: end for
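Steps 6-7 can be sketched as follows. This is our illustration, with the affinity matrix A, the noise level, and the function name all assumed rather than taken from the text; the small perturbation breaks eigenvalue ties between equivalent self-alignments, as called for in step 6.

```python
import numpy as np

def leading_eigenpairs(A, P, noise=1e-6, seed=0):
    """Steps 6-7 of Algorithm 1: jitter the affinity matrix A, then return
    its P leading eigenvalues and eigenvectors (the spectral step, Eq. 13)."""
    rng = np.random.default_rng(seed)
    A = A + noise * rng.standard_normal(A.shape)
    A = 0.5 * (A + A.T)                  # re-symmetrize after the perturbation
    vals, vecs = np.linalg.eigh(A)       # eigh: A is symmetric; ascending order
    order = np.argsort(vals)[::-1][:P]   # indices of the P largest eigenvalues
    return vals[order], vecs[:, order]
```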

Theorems 1 and 2 provide the foundation for inferring the complete set of symmetries, given a subset of them detected by the spectral analysis. Given two reflectional symmetry transforms {DK1, DK2}, the order of symmetry can be derived by computing the angle Δα between the reflection axes, and is then given by solving

$$\frac{2\pi}{\Delta\alpha} Z = K, \qquad Z, K \in \mathbb{Z}.$$

For instance, two reflection axes with a relative angle of Δα = π/2 imply that the object has at least two reflectional symmetry axes; there might, however, also be four or even eight symmetry axes, which would imply that the spectral scheme identified only a subset of the axes. Hence, the symmetry order can be estimated only up to the integer factor Z. Given more than two reflection axes, one can form a set of equations over the integer variables {Zi}:

$$\frac{2\pi}{\Delta\alpha_1} Z_1 = K, \quad \ldots, \quad \frac{2\pi}{\Delta\alpha_n} Z_n = K. \tag{16}$$

As K < 8 for most natural objects, Eq. 16 can be solved by iterating over K = 1, ..., 8 and looking for the value of K for which all of the {Zi} are integers.
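A brute-force search in this spirit fits in a few lines. The sketch below follows Eq. 16 as printed; the function name and the numerical tolerance are our own assumptions.

```python
import numpy as np

def symmetry_order(delta_alphas, k_max=8, tol=0.05):
    """Smallest K <= k_max for which every Z_i = K * dalpha_i / (2*pi)
    is (numerically) an integer, per Eq. 16; None if no such K exists."""
    for K in range(1, k_max + 1):
        Z = K * np.asarray(delta_alphas) / (2 * np.pi)
        if np.all(np.abs(Z - np.round(Z)) < tol):
            return K
    return None

# Two axes with relative angle pi/2: Eq. 16 is first satisfied at K = 4 (Z = 1).
print(symmetry_order([np.pi / 2]))
```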

5 Experimental Results

In this section we experimentally verify the proposed Spectral Symmetry Analysis scheme by applying it to real images and volumes. In Section 5.1 we apply the SSA to a set of real images in which detecting the symmetries becomes increasingly difficult. This allows us to exemplify the different aspects of the SSA: the simple examples illustrate its core, while the more difficult ones require its more elaborate components. In Section 5.2 we apply our scheme to the BioID face database [Bio01], a dataset of 1521 face images with ground-truth symmetry axes; this allows us to assess the scheme's accuracy and to compare it to Loy and Eklundh [LE06], whose results are considered state-of-the-art. Last, we detect symmetries in three-dimensional objects in Section 5.3.

5.1 Symmetry Analysis of Images

Figure 4 presents the analysis of a synthetic image with rotational symmetry of order four. Applying Algorithm 1 produces the sets of corresponding eigenvalues λi and eigenvectors ψi. The spectral gap in Fig. 4a is evident and allows us to estimate the order of symmetry. Note that the different self-alignments are manifested by non-equal eigenvalues, despite the image being synthetic. We attribute this to the imperfection of the feature point detectors, which detect the feature points at slightly different locations, and to the noise added to the feature point coordinates.
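One simple way to read the order of symmetry off the spectrum, sketched here under our own assumptions (the text does not spell out the gap criterion; counting eigenvalues before the largest drop is our choice), is:

```python
import numpy as np

def order_from_spectral_gap(eigvals):
    """Count the leading eigenvalues that precede the largest consecutive
    drop; this count serves as an estimate of the order of symmetry."""
    lam = np.sort(np.abs(np.asarray(eigvals)))[::-1]  # descending magnitudes
    gaps = lam[:-1] - lam[1:]                         # consecutive drops
    return int(np.argmax(gaps)) + 1
```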

Fig. 4 Rotational symmetry. (a) Eigenvalues λi. (b) The rotation corresponding to λ2. (c) The rotation corresponding to λ3. (d) The rotation corresponding to λ4.

Fig. 5 Reflectional symmetry. (a) Eigenvalues λi. (b) The reflection corresponding to λ2.

The transformation TC1, corresponding to the leading eigenvalue λ1, is found to correspond to the identity transform and is thus discarded. By estimating TC2, the symmetry is found to be a rotational symmetry, as det(TC2) ≈ 1. Applying Eq. 4 to TC2 recovers the center of symmetry, which is marked by a red dot in Fig. 4b. Eigenvalues λ3 and λ4 uncover the remaining symmetrical self-alignments, which are drawn in Figs. 4c and 4d, respectively. The analysis of {λi}i>4