Machine Understanding: Machine Perception and Machine Perception MU [1st ed. 2020] 978-3-030-24069-1, 978-3-030-24070-7


Studies in Computational Intelligence 842

Zbigniew Les Magdalena Les

Machine Understanding Machine Perception and Machine Perception MU

Studies in Computational Intelligence Volume 842

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted for indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and SpringerLink.

More information about this series at http://www.springer.com/series/7092

Zbigniew Les • Magdalena Les

Machine Understanding Machine Perception and Machine Perception MU


Prof. Zbigniew Les The St. Queen Jadwiga Research Institute of Understanding Toorak, Melbourne, VIC, Australia

Magdalena Les The St. Queen Jadwiga Research Institute of Understanding Toorak, Melbourne, VIC, Australia

ISSN 1860-949X ISSN 1860-9503 (electronic)
Studies in Computational Intelligence
ISBN 978-3-030-24069-1 ISBN 978-3-030-24070-7 (eBook)
https://doi.org/10.1007/978-3-030-24070-7

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

This book is dedicated to our Patron St. Jadwiga Queen of Poland

Introduction

This book presents selected research results in a newly established area of scientific research which we call machine perception MU. Machine perception MU research is carried out within the machine understanding framework. Machine understanding is based on further development of the research presented in our previous books, Shape Understanding System: The First Steps Toward the Visual Thinking Machines, Shape Understanding System: Knowledge Implementation and Learning, and Shape Understanding System: Machine Understanding and Human Understanding [1–3]. This is the fourth book presenting the research results in the area of thinking and understanding carried out by the authors at the newly founded St. Queen Jadwiga Research Institute of Understanding. Machine understanding is a term introduced by the authors to denote understanding by a machine (SUS); it refers to a new area of research whose aim is to investigate the possibility of building a machine with the ability to understand. SUS, as a machine designed to have the ability to think and understand, learns both knowledge and skills in a learning process called knowledge implementation. To be able to understand, a machine needs, to some extent, to mimic human understanding; for this reason, machine understanding is based on the assumption that the results of understanding by the machine (SUS) can be evaluated according to the rules applied in the evaluation of human understanding. An important part of the machine understanding approach is the investigation of different forms of explanation: how to solve a problem (a text problem), or explanations of the causes and context of an object or phenomenon. In the first book [1], a brief description of philosophical investigations of topics connected with understanding and thinking was presented. Shape, the main perceptual category of the thinking process and an important visual feature of the perceived world, was briefly described. In Chap. 2, the shape classes regarded as the basic perceptual categories were presented. Shape classes are represented by their symbolic names. The classes are related to one another and, based on them, it is relatively easy to establish the ‘perceptual similarity’ among perceived objects. In Chap. 3, the reasoning process that leads to assigning the perceived object to one of the shape classes was described.
In Chap. 4, the new hierarchical categorical structure of the different categories of visual objects was presented. In Chap. 5, examples of visual reasoning processes, which can be considered a special kind of thinking processes, were presented. The thinking process is regarded as a continuous computational activity triggered by the perception of a new object, an ‘inner object’ or a task given by a user. In Chap. 1 of the second book [2], some aspects of human learning related to the newly introduced concept of knowledge implementation were described. In Chap. 2, a short survey of the literature on the vast topic of learning by a machine was presented. In Chap. 3, knowledge implementation was defined in the context of both human learning and machine learning. In Chap. 4, selected issues connected with learning and understanding were described in the context of the newly introduced concept of knowledge implementation; the relations between understanding and learning were also discussed. In Chap. 5, the shape understanding method was presented. In Chap. 6, categories of visual objects were described. In Chap. 7, the theoretical framework of the knowledge implementation method was presented. In Chap. 8, knowledge implementation as a new method of learning the knowledge and skills of different categories of objects was presented: a short description of the shape understanding system was given in Sect. 8.1, and learning of new knowledge was presented in Sect. 8.2. Learning the knowledge of visual objects can be seen as learning the specific perceptual skills needed to acquire data from an image. In Sect. 8.2, learning and understanding of text that belongs to one of the text categories, such as the category of text-query, the category of text-task or the category of dictionary-text, was also presented. In the third book [3], the new term machine understanding was introduced to denote understanding by a machine (SUS), referring to the new area of research whose aim is to investigate the possibility of building a machine with the ability to understand. In Chap. 1, a presentation of the points of view of selected thinkers on human understanding was given, together with a discussion of some aspects of understanding considered to have implications for the material presented in that book. In Chap. 2, a short survey of the literature on the vast topic of the existing ‘understanding’ systems was given. In Chap. 3, machine understanding was defined in the context of both human understanding and existing systems that can be regarded as understanding systems. In Chap. 4, machine understanding based on the shape understanding method was presented. In Chap. 6, examples of selected problems used for testing whether they can be solved by the machine (SUS) were presented; the special classes of problems described there are called text-tasks and are used for testing the results of learning at school. In Chap. 7, visual understanding, regarded as problem solving that involves naming and recognizing visual objects, was presented, with examples of learning and understanding of objects from the leaf category and the butterfly category. Generalization, specialization, schematization, visual abstraction and imaginary transformations applied during knowledge implementation, which are essential parts of learning and understanding, were also described in that chapter. In Chap. 8, the naming of objects that are members of the sign category was presented.
Understanding of objects from the sign category means finding their meaning, which is given by an assumed conventional meaningful relation called the coding system. Understanding of objects from selected sign categories, such as the musical symbols category, the electronic symbols category or the road signs category, was also discussed in that chapter. In Chap. 9, understanding of objects from the text category was described. Understanding of objects from the text category, regarded as problem solving, means finding the meaning of the text and then interpreting this text in terms of knowledge of the world; understanding of set theory texts was presented as an example of understanding of the text category. The last chapter presented the understanding of the explanatory text that is generated while explaining how the solution of command-text-tasks such as ‘solve an algebraic equation’ is found.

This book presents selected research results in machine perception MU, a further development of the machine understanding research concerning perception as part of the understanding process. Machine perception is a term introduced by Nevatia [4] to denote the research area investigating the possibility of building systems endowed with human perceptual abilities. Although it shares the term ‘machine perception’, research in machine perception MU is based on a very different approach. Machine perception is based on the common belief of researchers from the engineering community that the meaning of concepts such as perception is founded on the solid basis of scientific, empirical and theoretical findings. As will be shown in Chap. 1, such an expectation is a mere illusion that is a matter of deep ignorance rather than of any rational justification. Over the years, many cognitive theories of perception have been proposed, evaluated, revised, and evolved within an impressive body of research. Perception has been one of the oldest research interests of both scientists and philosophers and, despite many efforts and a painstaking search for solutions to the problems of perception, the results are far from satisfactory. Although scientific theories of perception present a valuable stepping stone towards the goal of machine perception, to embody this unique human ability within a computational system, there is an urgent need to redefine machine perception, taking into account not only human or animal perception but, most importantly, a machine-oriented approach that can lead to some sort of autonomy in defining the basic categories of machine perception. Even among researchers in human perception there are tendencies to trivialize the meaning of perceptual processes, relegating them to the corner of meaningless tasks in the striving for survival. On the other hand, perception has been regarded as the inferior of the two cognitive powers because it supposedly lacks the distinctness that comes only from the superior faculty of reasoning, the higher cognitive functions of the mind responsible for creating concepts, accumulating knowledge, connecting, separating, and inferring. Although there are attempts to promote the view that perception is intelligent, in that it is based on operations similar to those that characterize thought, there is, however, a big difference between solving the most complex perceptual problems and solving even simple theoretical problems, such as the mathematical proof of basic theorems.
Machine perception research focused on building a machine endowed with human perceptual ability follows the way of thinking of those narrow-minded people of science who are preoccupied with topics such as object recognition, object detection or solving navigation problems. However, as the empiricists pointed out, perception is the source of knowledge acquired through the historically evolved process of collecting sensory data, knowledge that is stored in the form of scientific books and papers and that leads to understanding of the world. In this context, it is a truism to say that it is understanding that profits from perception, and perception that supplies the nutrition in the form of sensory data. For this reason, the new field of research called machine perception MU, which started with the authors’ research on machine understanding at the St. Queen Jadwiga Research Institute of Understanding, has emerged. Machine perception MU research, carried out within the machine understanding framework, differs in some important aspects from the classical machine perception approach, which is focused on object recognition, object detection or solving the navigation problem. A machine endowed with the understanding ability needs to rely on different perceptual strategies of gathering and transforming data. The analysis of any problem, not only of problems in machine perception, must be carried out within some framework that selects the fundamental assumptions and problem definitions on which the research will be founded. For this reason, machine perception MU research, carried out within the machine understanding framework, is deeply rooted in a research context that supplies the fundamental problem definitions. The main drawback of classical machine perception is that it focuses on the pragmatic side of the perceptual process without paying enough attention to its cognitive aspect. However, the pragmatic aspect of human perceptual gathering and processing of data is only a small part of the perceptual process, aimed at securing basic existential needs. Most human perceptual activities are directed at gathering data used to build the knowledge that makes it possible to understand the world. With the advent of empiricism, perceived data became the basis for nearly all scientific knowledge used in understanding the world. For this reason, a machine endowed with the understanding ability needs to rely on different perceptual strategies of gathering and transforming data, and the research area, part of machine understanding, that investigates these perceptual problems we call machine perception MU. Machine perception attempts to enable man-made machines to perceive their environment by sensory means as humans and animals do. Whereas classical machine perception is focused on gathering sensory information in order to transform it into some form that makes it possible to perform a required task, machine perception MU is aimed at building a machine that can think and understand. This new perspective places research in machine perception MU within the framework of machine understanding research, which supplies findings for exploring new possibilities of understanding and thinking abilities of the machine that can be very different from human abilities. In our approach, the main focus is on perception as subordinate to understanding, thereby directing the research effort not at the sensory devices but at the ability to utilize the transformed data in the understanding process.
Machine perception MU is concerned with the machine interpretation of any sensory data. In machine perception (robotics), the sensory devices play a big role in supplying the perceptual data, often imitating biological systems in their perceptual functioning (evolutionary psychology). Biological systems sense their environments through a variety of sources such as sight, sound, touch and smell. The sense of touch is actually many senses, including pressure, heat, cold, tickle, and pain. The sensory system is the part of the nervous system responsible for processing sensory information; it consists of sensory receptors, neural pathways, and parts of the brain involved in sensory perception. Commonly recognized sensory systems are those for vision, hearing, somatic sensation (touch), taste and olfaction (smell). Human beings usually utilize two types of sensory information, for example seeing and hearing, during understanding of the environment, whereas a machine can use more complex perceptual data that are not present in human perception. Until now, research in machine perception has utilized sensory data for decision making during object recognition or robot navigation. With the advent of sensor networks capable of perceiving many different kinds of data, new possibilities open up for exploring new methods of building knowledge based on the categories of sensory objects and sensory concepts. These sensory categories should be capable of grasping the essence of the multisensory data and building a new structure of multisensory knowledge. In machine perception MU, the categories of sensory objects and sensory concepts were introduced to deal with the problem of data that come from different sensory channels, such as the auditory channel or the tactile channel [1–3]. However, in comparison to the human sensory channels, the machine can use different sensory data that are interpreted in the context of the hierarchical structure of the sensory object categories. Although machine perception MU is concerned with the machine interpretation of any sensory data, in this book only visual perception is presented. The reason is that machine perception MU is based on a very different approach than classical machine perception, and in order to present the material concerning the problems solved within the machine perception MU research program, a very selective arrangement of the presented topics is needed. In this book, the focus is on visual machine perception, and in the following chapters problems concerning our new approach to machine perception will be presented. For the most practiced discriminations, perception seems to proceed in a specific way: when a percept, even of a very sophisticated nature, is highly practiced or very important, it appears that our minds build up a special-purpose mechanism solely for that purpose. Human and animal brains are structured in a modular way, with different areas processing different kinds of sensory information. Some of these modules take the form of sensory maps, mapping some aspect of the world across part of the brain’s surface. These different modules are interconnected and influence each other. For instance, taste is strongly influenced by smell.
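To make the idea of a hierarchical structure of sensory object categories more concrete, the following minimal sketch shows how categories tied to different sensory channels might be organized as a tree and queried for their place in the hierarchy. The sketch is our own illustration in Python, not code from the SUS system, and all class, attribute and category names in it are hypothetical.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SensoryCategory:
    # A node in a hypothetical hierarchy of sensory object categories.
    name: str
    channel: str                                   # e.g. "visual", "auditory", "tactile"
    parent: Optional["SensoryCategory"] = None
    children: List["SensoryCategory"] = field(default_factory=list)

    def subcategory(self, name: str, channel: Optional[str] = None) -> "SensoryCategory":
        # Create a child category; it inherits the channel unless one is given.
        child = SensoryCategory(name, channel or self.channel, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        # Full path from the root, used when interpreting incoming sensory data.
        return self.name if self.parent is None else self.parent.path() + "/" + self.name

root = SensoryCategory("sensory-object", "any")
visual = root.subcategory("visual-object", "visual")
auditory = root.subcategory("auditory-object", "auditory")
shape_2d = visual.subcategory("2D-shape")
print(shape_2d.path())   # sensory-object/visual-object/2D-shape
print(auditory.path())   # sensory-object/auditory-object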


Machine perception MU, similarly to Marr’s approach, regards perception as a series of progressively more sophisticated inferences, and it is assumed that there are separate, specialized mechanisms for primitive and sophisticated perception. Machine perception MU is based on learned knowledge and well-developed visual problem-solving skills. Under these assumptions, to test the perceptual ability of a machine is to test whether the machine can solve a given perceptual problem. Many visual problems that are solved during visual perception are formulated as the special tasks of visual intelligence tests. These problems are given in the form of diagrammatic representations that refer to one of the ontological line-drawing categories. The organization of the book follows the overview presented in this Introduction. The first chapter presents a short survey of the philosophical inquiries and psychological research into human visual perception. In Chap. 2, a short survey of the literature on research in machine perception is presented; in the second part of that chapter, the main points of the proposed new machine perception MU approach are outlined. In Chap. 3, the shape classes, introduced by the authors as the basic visual categories used during the visual thinking and visual reasoning process, are presented. Shape classes are represented by symbolic names, and the description of a class refers to visual objects such as geometrical figures. In Chap. 4, the 3D object classes, a natural extension of the shape classes, are presented. The 3D object classes, like the 2D shape classes, are established based on general object attributes such as homotopy, convexity and thickness. They play a similar role in learning the visual concept of a 3D object as the shape classes play in learning the visual concept of a 2D object. In Chap. 5, the picture classes, which play an important role in machine perception, are described. Perception in machine perception MU is related to the perceptual visual field, where the perceived object is regarded as an image. The picture classes were introduced based on the concept of the generic class of pictures (see Les [5]). The picture classes are divided into two categories, the structural and the ontological picture classes; the structural picture class refers to the structural organization of the picture plane, which in visual art is often called the picture composition. In Chap. 6, the perceptual transformations that play an important role in transforming visual data into symbolic form are explained. The perceptual transformations are regarded as special geometrical transformations applied in the domain of geometrical figures and are divided into: the problem solving perceptual transformation (PS-perceptual transformation), the interpretational perceptual transformation (IN-perceptual transformation), the class forming perceptual transformation (CF-perceptual transformation), the generating object perceptual transformation (GO-perceptual transformation) and the grouping perceptual transformation (GR-perceptual transformation). In Chap. 7, the perceptual transformations are described in terms of the reasoning process, which consists of consecutive stages of reasoning; at each stage, different processing transformations are applied to transform the visual data into symbolic form.
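The five perceptual transformations named above form a small, fixed taxonomy. The book presents these transformations visually rather than as code, so the following Python fragment is only our own minimal sketch of how the taxonomy could be represented in a program; the enum and the helper function are hypothetical, not part of the SUS system.

from enum import Enum

class PerceptualTransformation(Enum):
    # The five perceptual transformations named in Chap. 6; the string
    # values paraphrase their roles as described in the text.
    PS = "problem solving"       # solves a visual problem given with a command
    IN = "interpretational"      # interprets the perceived image
    CF = "class forming"         # forms a shape or picture class
    GO = "generating object"     # generates a new visual object
    GR = "grouping"              # groups perceived elements

def label(t: PerceptualTransformation) -> str:
    # Renders the book's naming convention, e.g. "PS-perceptual transformation".
    return t.name + "-perceptual transformation (" + t.value + ")"

for t in PerceptualTransformation:
    print(label(t))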


The image is assigned to the structural and ontological picture classes by application of a perceptual transformation that utilizes one or more processing transformations, which perform the perceptual operations during the reasoning process. In this book, processing transformations are presented in visual form rather than in the pseudo-mathematical notation that in much of the computer vision literature obscures the clarity of the presented material. The reasoning process is also presented in a new form, by showing only the stages of application of the perceptual transformation (processing transformation) and a visual illustration of the reasoning process. In Chap. 8, perception as the solving of perceptual problems is presented. In machine perception MU, the perceptual problems are divided into two categories: the interpretational perceptual problems and the visual problems. The interpretational problem, solved by application of the IN-perceptual transformation, is one of the problems connected with seeing and understanding. Visual problems, given by a sequence or set of objects (the visual representatives of the problem) and a command given in linguistic form, are solved by application of the PS-perceptual transformation. Depending on the level of processing, the perceptual problems are divided into low-level, middle-level and higher-level perceptual problems. The higher-level interpretation of the image is based on the notion of the visual concept. In this book, the term ‘concept’ is used for the internal representation of the perceived object, whereas the category is regarded as the model of the object that is part of the hierarchical structure of categories. In Chap. 9, the visual intelligence tests used to test the ability of a machine (SUS) to solve different perceptual problems, at any perceptual level, are presented. The following categories of visual intelligence tests were introduced: the category of visual analogy text-tasks, the category of ‘odd one out’ text-tasks, the category of matrix text-tasks, and the category of ‘what comes next’ text-tasks.

We are aware that this book could have been written in a different way, with some issues explained in more detail or presented differently. We would like to explain that this book was written under ‘special’ conditions. During the most crucial part of writing this book, we were distracted, and it was nearly impossible to continue the work connected with its preparation in Australia. We were forced to leave Australia to be able to continue our research and to finish all the work connected with the preparation of this manuscript. The discrimination and racism that we experienced while living in Melbourne caused us very great financial damage as well as damage to our health.

References

1. Les Z, Les M (2008) Shape understanding system: the first steps toward the visual thinking machines. Studies in Computational Intelligence. Springer-Verlag, Berlin
2. Les Z, Les M (2013) Shape understanding system: knowledge implementation and learning. Studies in Computational Intelligence. Springer-Verlag, Berlin
3. Les Z, Les M (2015) Shape understanding system: machine understanding and human understanding. Studies in Computational Intelligence. Springer-Verlag, Berlin
4. Nevatia R (1982) Machine perception. Prentice-Hall, Englewood Cliffs, NJ
5. Les Z (1996) An aesthetic evaluation method based on image understanding approach. In: The first international conference on visual information systems VISUAL’96, Melbourne, 5–6 February 1996. VUT, pp 317–327

Contents

1 Human Perception
   References

2 Machine Perception—Machine Perception MU
   2.1 Introduction
   2.2 Machine Perception
      2.2.1 Machine Perception—Line Drawing
      2.2.2 Machine Perception—Object Recognition and Scene Analysis
      2.2.3 Machine Perception—Concept Formation
   2.3 Specific Perceptual Problems
      2.3.1 Completion and Transparency Problem
      2.3.2 Illusory Contours
   2.4 Machine Perception MU
      2.4.1 Sensory Object
      2.4.2 Visual Machine Perception MU
   References

3 Machine Perception MU—Shape Classes
   3.1 Introduction
   3.2 Shape Classes—Perceptual Operators
   References

4 Machine Perception MU—3D Object Classes
   4.1 Introduction
   4.2 The Basic 3D Object Classes
   References

5 Machine Perception MU—Picture Classes
   5.1 Introduction
   5.2 The Perceptual Picture Classes
   5.3 The Structural Picture Classes
      5.3.1 The Tiling Picture Class
      5.3.2 The Objects on the Background Picture Class
   5.4 The Ontological Picture Classes
   5.5 The Picture Classes—The Reasoning Process
   References

6 Machine Perception MU—Perceptual Transformation
   6.1 Introduction
   6.2 PS-Perceptual Transformation
      6.2.1 Geometrical PS-Perceptual Transformation
      6.2.2 Arithmetical PS-Perceptual Transformation
      6.2.3 Symbolic PS-Perceptual Transformation
   6.3 CF-Perceptual Transformation
      6.3.1 Basic CF-Perceptual Transformation
      6.3.2 Complex CF-Perceptual Transformation
   6.4 IN-Perceptual Transformation
      6.4.1 Basic IN-Perceptual Transformation
      6.4.2 Complex IN-Perceptual Transformation
   6.5 Grouping Perceptual Transformation
   6.6 GO-Perceptual Transformation
   References

7 Machine Perception MU—Visual Reasoning
   7.1 Introduction
   7.2 Processing Transformations
   7.3 CT-Perceptual Transformation
   7.4 IN-Perceptual Transformation—Reasoning Process
   References

8 Machine Perception MU—Problem Solving
   8.1 Introduction
   8.2 Low-Level Perceptual and Visual Problems
   8.3 Middle-Level Perceptual and Visual Problems
      8.3.1 Middle-Level Interpretational Perceptual Problems
      8.3.2 Middle-Level Visual Problems
   8.4 Higher-Level Perceptual and Visual Problems
      8.4.1 Higher-Level Interpretational Perceptual Problems
      8.4.2 Higher-Level Visual Problems
   8.5 Learning Visual Knowledge
   References

9 Machine Perception MU—Visual Intelligence Tests
   9.1 Introduction
   9.2 The Visual ‘What Comes Next’ Test
   9.3 The Visual ‘the Odd One Out’ Test
   9.4 The Visual Analogy—Analogy Problems
      9.4.1 The Visual Analogy Test (VAT)
   9.5 The Matrix Test
      9.5.1 The Arithmetical Matrix Test
      9.5.2 The Geometrical Matrix Test
      9.5.3 The Relationships Matrix Test
   9.6 The Difficulty Level of the Test
   References

Chapter 1

Human Perception

In this book, the new field of research, namely machine perception MU, placed in the broader context of research in machine understanding, is presented. Research in machine understanding, initiated by the authors, supplies the theoretical framework and creates the proper perspective for research in machine perception MU, which opens new horizons for applying sensory data to build an understanding/thinking machine (robot). Designing systems that perform complex tasks, such as solving specific perceptual problems, is based on utilizing the results of research into human visual perception. In this chapter, a short survey of the philosophical inquiries and psychological research into human visual perception is presented. This chapter is not intended as a survey of the literature on the vast topic of human visual perception but rather as a presentation of the points of view of selected thinkers on this topic and a discussion of some aspects of perception considered to have implications for the material presented in other chapters. Over the years, many cognitive theories of perception have been proposed, evaluated, revised, and evolved within an impressive body of research. These theories present a valuable stepping stone towards the goal of machine perception: to embody this unique human ability within a computational system. Perception has been one of the oldest research interests of both scientists and philosophers. Parmenides (see [1]) had insisted that there was no change or movement in the world; this meant that sensory experience was a deceptive illusion. Parmenides called for a definite distinction between perceiving and reasoning, for it was reasoning that was responsible for the correction of the senses and the establishment of the truth. Emphasis on the unreliability of the senses served the Sophists to support their philosophical scepticism and, at the same time, to establish the notion of an undivided physical world, united by natural law and order. The Greek thinkers distinguished between the wise and the unwise use of sensory experience, seeing sensory perception and reasoning as contradictory processes, different in principle from each other. Democritus [1] distinguished the “dark” cognition of the senses from the “bright” or genuine cognition by reasoning.
Plato (see [2]) showed that perception was based on the stable entities of objective existence that were approached by logical operations. The wise man surveys and connects widely scattered forms (ideas) of things and discerns intuitively the generic character they have in common. After collecting these forms, they were distinguished from each other by defining the particular nature of each. Another of Plato’s views is expressed in the parable of the underground den. The prisoners, formerly limited to the sight of the passing shadows, are released and disabused of their error. They are made to look at the objects of true reality, dazzled by them as though by a strong light, and gradually become accustomed to facing and accepting them. According to Plato, the sensory images were dim reflections outside the system of reality and were excluded entirely from the hierarchy that leads from the broadest generalities to the tangible particulars. For Plato, gazing upon truth means beholding the very being with which true knowledge is concerned. The colourless, formless, intangible essences are visible only to the mind, the pilot of the soul. The grasp of reality by direct vision means that all enquiries and all learning are but recollection. In Plato’s version of the relation between ideas and sensible things, the immutable entities and sensory appearances coexisted rather statically; this static coexistence of the transcendental ideas and sensory appearances was a relation between prototype and image. Aristotle [3, 4] went beyond Plato in demanding a more active relation between ideas and sensible things, between universals and particulars, and asserted that for any perceivable object to come about, a universal had to impress itself upon the medium or substance. Substance in itself was shapeless and inert, except for its desire to be thus impressed. This generative process, by which the possible form acquired actual existence, was called entelechy; it gave a new vitality to the ontological status of the universals. The relation between prototypes and images was replaced by Aristotle’s genetic connection between universals and particulars, a connection which did not deny the image function of sensory appearance but made it less exclusive. Aristotle established the universal as the indispensable condition of the individual thing’s existence and as the very character of the perceivable object. He rejected and avoided the arbitrary choice of the attributes that can serve as the basis of a generalization when induction is intended in its strict, mechanical sense. He insisted that the qualities an object shared with others of its kind were not an incidental similarity but the very essence of the object. What was general in an individual was the form impressed upon it by its genus. Perception means that we always perceive, in the particulars, kinds of thing and general qualities rather than uniqueness; there is no such thing as the perception of the individual object in the modern sense. Although under certain conditions events can be understood only when their repeated experience leads to generalization by induction, there are also instances in which one act of vision suffices to terminate our enquiry, because we have elicited the universal from seeing. We see the reason of what we are trying to understand at the same time in each instance and intuit that it must be so in all instances. The wisdom of the universal in re is that the universal is given within the particular object itself. Aristotle saw this approach “from below” as only one side of the task, to be complemented symmetrically by the opposite approach “from above.”
According to Aristotle, through induction, which proceeds through an enumeration of all the cases, we arrive at the conception of the higher genera by means of abstraction. Abstraction removes the more particular attributes of the more specific instances and thereby arrives at the higher concepts, which are poorer in content but broader in range. Perception points to a different notion of abstraction, a much more sophisticated cognitive operation. The concept is obtained on the basis of abstraction and generalization. According to Aristotle, abstraction must be complemented with definition, which is the determination of a concept by deriving it deductively from the higher genus and pinpointing it through its distinguishing attribute (differentia). From medieval philosophers such as Duns Scotus (see [5]), the rationalists of the seventeenth and eighteenth centuries derived the notion that the messages of the senses are confused and indistinct and that it takes reasoning to clarify them. Baumgarten [6] continued the tradition of describing perception as the inferior of the two cognitive powers because it supposedly lacked the distinctness that comes only from the superior faculty of reasoning. Creating concepts, accumulating knowledge, connecting, separating, and inferring were reserved to the higher cognitive functions of the mind, which could do their work only by withdrawing from all perceivable particulars. Discoveries about light and the eye had a big influence on research in perception; the understanding of image formation by lenses, and Descartes’s observation of a retinal image formed on the back of a bull’s eye, led to the long-standing belief that the eye functions much like a camera and that images are flat, static and meaningless. Some perceptual theories, such as inference theory, Gestalt theory and Gibson’s theory, attempt to explain perception by utilizing these new scientific findings. The inference theory, associated with the empiricist view, argues that knowledge is acquired solely by sensory experience and the association of ideas. The Sensualist philosophers maintained that nothing is in the intellect which was not previously in the senses; the mind at birth is a tabula rasa upon which experience records sensations. The empiricists such as Locke [7] or Berkeley [8] argued that perception was constructed from more primitive sensations through a process of learning by association. von Helmholtz [9] postulated the existence of “primary” percepts and claimed that the primary percept contains all the distortions of projection but that judgment intervenes and corrects them. He assumed that these corrections are based mainly on previously acquired knowledge. Helmholtz described this process as one of unconscious inferences, such that the sensations of the senses are tokens for our consciousness, it being left to our intelligence to learn how to comprehend their meaning. According to Helmholtz [9], perception involves unconscious inference by the mechanisms of perception, in which percepts are constructed from sensations through inference from knowledge previously acquired through learning. This view is maintained by contemporary theorists (e.g. [10, 11]). Both Berkeley and Helmholtz later argued that we learn to interpret percepts through a process of association. The Gestalt view originated with Descartes and Kant, for whom the mind was far from being a tabula rasa (e.g. [12, 13]). Kant [14] argues that “the mind imposes its own internal conception of space and time upon the sensory information it receives”.
Gestalt theory refers to the laws of association: items will become connected when they have frequently appeared together, or when they resemble each other. These laws assume that relations connect piece by piece and that these pieces remain unchanged by being tied together. The simplest among the rules that govern these relations is the rule of similarity: things that resemble each other are tied together in vision. Homogeneity is the simplest product of perceptual relation. When a sprinkling of items is seen on a sufficiently different background and sufficiently distant from the next sprinkling, it will be seen as a unit; similarity of location provides the bond. The Gestaltists of the twentieth century believed in a holistic perceptual organization preordained by given laws that govern unit formation and the emergence of a figure on a background. Visual form is the most important property of a configuration. As opposed to the Gestalt school, Hebb argues that a visual form is not perceived as a whole but consists of parts [15]. The organization and mutual spatial relations of parts must be learned for successful recognition; this learning aspect of perception is the central point of Hebb’s theory. However, as suggested by Minsky [16], recognition is not based on a single instantaneous impression: we look at the objects in question and explore them with our eyes until we gather enough information to identify them. The European Gestalt psychologists reacted against the structuralist assumption that perception could be reduced to sensations, and transactional functionalism started to formulate a new theoretical framework. Transactional functionalism [17] rested on the demonstrations of Ames (see [18]), which were used to illustrate that an apparently infinite number of objects could give rise to any single retinal image, and to emphasize the probabilistic and inferential nature of perception. What one sees will be what one expects to see, given one’s lifetime of perceptual experience. While transactional functionalism stressed the individual’s history as important in determining perception, the “new look” (e.g. [19]) stressed the importance of individual differences in motivation, emotion and personality in influencing what might be seen. Gibson [20], who developed a comprehensive theory of visual perception, claims that sensory input is enough to explain our perceptions. The theory seeks to associate percepts with physical stimuli. Gibson points out that the Gestalt school has been occupied with the study of two-dimensional projections of the three-dimensional world and that its dynamism is no more than the ambiguity of the interpretation of projected images. The first principle of his theory is that space is not a geometric or abstract entity but a real visual one, characterized by the forms that are in it. Gibson’s theory is centred on perceiving real three-dimensional objects, not their two-dimensional projections. The second principle is that a real-world stimulus exists behind any simple or complex visual perception. This stimulus is in the form of a gradient, which is a property of the surface. The theory suggests that the “input” to a perceptual system is the structure in the entire optic array and the transformations in the array over time. Gibson denies that perception involves construction, interpretation or representation. Towards the end of the 19th century, the content of perception was commonly studied using the methods of analytic introspection.
Behaviourism [21, 22] replaced mentalistic notions such as “sensations” and “perceptions” with objectively observable discriminative responses. The behaviourists argued that we can never know how animals or other people experience the world, and therefore we should only observe their behaviour, to examine how their responses are related to variations in the stimuli presented. Classical behaviourism provided the methodological tools for the comparative study of perception, and the methods of contemporary psychology are still influenced by the behaviourist tradition. During the 1960s, associationist explanations of perceptual learning and discriminative responding gave way to a new “cognitive psychology” of perception, attention and memory. Attempts were made to describe the stages which intervene between stimulus and response. The revolution in information technology provided a new metaphor for models of perception, in which information from the senses was seen to be processed in ways not unlike the processing of information in a computer. Processes of sensory coding, storage and retrieval of information were all discussed, and the development of computer models of some of these processes came to be seen as a legitimate goal of psychological theories. It was a common belief among researchers that if a machine could be designed which could “see”, its computer program could constitute the implementation of a theory of how seeing is achieved by humans. Many theorists have argued that perceptual parsimony is achieved by making use of specific world knowledge to make inferences from sensory data. Such perceptual theories may be described as involving a strong “top-down” or “conceptually-driven” component, and many AI models of perception fall into this category. Other models (e.g. [23]) have been developed largely along “bottom-up” or “data-driven” lines. In such theories, very general constraints are incorporated within each stage of information processing, but specific world knowledge is recruited into the act of seeing only when the relatively low-level stages of information processing produce ambiguous results. Marr’s theory [24] represents the most sophisticated attempt to explain the information processing operations involved in vision within a framework which cuts across the boundaries between physiology, psychology and artificial intelligence, and it is the most compatible with our aim to account for animal as well as human perception. Marr made significant contributions to the study of the human visual perception system, and in his paradigm the focus of research shifted from applications to topics corresponding to modules of the human visual system. He developed the primal sketch paradigm for the early processing of visual information. One of his many achievements is the demonstration that a great deal of the processing of images can be achieved without recourse to specific world knowledge [23]. Marr’s theory sees vision as proceeding by the explicit computation of a set of symbolic descriptions of the image. Object recognition, for example, is achieved when one of the descriptions derived from an image matches one that has been stored as a representation of a particular, known object class. Marr described the different levels of computational explanation in vision. At the lowest level of explanation is a description of the hardware implementation, and at this level it would be inappropriate to make claims without specific physiological evidence. Just as there can be many possible hardware implementations of a single algorithm, there can be many algorithms for a single computational specification of a problem.
At the intermediate level is a description of the representations and algorithms which are used, and at this level psychophysical evidence can often be applied. At the highest level of explanation, a computational theory provides a specification of what is computed rather than how it is computed. During the late 1970s and early 1980s, the dominant view of human perception was that perception proceeds through successive levels of increasingly sophisticated representations until finally, at some point, information is transferred to general cognitive faculties. It seems that human perception proceeds from seemingly primitive inferences of shapes, textures and colours to the more sophisticated inferences of objects such as chairs and trees. Although, for example, Marr [23] regarded perception as a smooth series of progressively more sophisticated inferences, it is more likely that there are separate, specialized mechanisms for primitive and sophisticated perception. This leads to a conception of our perceptual apparatus as containing two distinct parts: the first, a special-purpose, perhaps innate mechanism that supports primitive perception, and the second, something that closely resembles general cognition. The sensory data are first examined by the mechanisms of primitive perception to discover instances of rigidity, parallelism, part-like groupings and other evidence of causal organization, thus providing an explanation of the image data in terms of generic formative processes. The mechanisms of sophisticated perception then use specific, learned knowledge about the world to refine this primitive, generic explanation into a detailed account of the environment. There is no principled reason to separate sophisticated perception from general-purpose reasoning. The characteristics of primitive perception, however, are quite different from those of cognition. Primitive perception proceeds without utilization of the full range of our world knowledge; the body of knowledge on which primitive perception draws is of quite limited extent, at least in comparison to our conscious world knowledge. Primitive perception is based on perceptual organization, that is, the preattentive organization of sensory data into primitives like texture, colour, and shape. For the most practiced discriminations, perception seems to proceed in a specific way. When a percept, even of a very sophisticated nature, is highly practiced or very important, it appears that our minds build up a special-purpose mechanism solely for that purpose. There may be, therefore, a sort of “compiler” for building specialized routines for these often-repeated, important or time-critical discriminations. How much of our day-to-day perception is handled by such special-purpose routines is very much an open question. Pentland’s [25] approach to visual perceptual organization is similar in manner to Gibson’s or Marr’s: he constructs a theory of how our perceptual concepts are lawfully related to the regularities (structure) in our environment. Like Marr, but unlike Gibson, his computational theory is focused on how the physical regularities of our world allow the computation of meaningful descriptions of our surroundings from image data. Further, these descriptions should match the perceptual organization that is imposed on the stimulus. Pentland [25] also investigated methods for the representation of natural forms by means of fractal geometry. He argued that fractal functions are appropriate for natural shape representation because many physical processes produce fractal surface shapes.
Koenderink and van Doorn [26] have also studied the psychological aspects of visual perception and proposed several interesting paradigms; some of these issues will be discussed in the next chapter, concerning machine perception. Alongside the general theories that aim to explain human perception as a whole, there is also research focused on explaining specific perceptual processes such as object recognition, transparency or occlusion. On the basis of studies showing selective impairment of object recognition in persons with unilateral cerebral lesions, Warrington [27] suggested that object recognition is a two-stage categorization process. Persons with right-hemisphere lesions have difficulty in deciding whether pictures representing different views of an object are in fact showing the same physical object. Persons with left-hemisphere lesions are able to classify different aspects of an object as belonging to the same physical stimulus, but their ability to attach meaning to their percept is impaired. This suggests that in the first stage of visual recognition the image is categorized on perceptual grounds only, whereas the perceptual category is given semantic content in the second stage; it implies that the concept of an object consists of two main ingredients, the visual concept and the non-visual concept. The popular atomistic view assumed a basic vocabulary of elementary sensations from which our perceptions are made. Atomism started with Aristotle, and the concept of a basic vocabulary was the subject of many controversies. Locke [7], in the theory of psychophysical dualism, tried to point out that perception is made up of sensations (input) and reflections. Attneave [28] performed psychological experiments to investigate the significance of corners for perception. In Attneave’s famous cat experiment, a drawing of a cat was used to locate points of high curvature, which were then connected to create a simplified drawing; it was suggested that such points have high information content. To approximate curves by straight lines, high-curvature points are the best places to break the lines, so that the resulting image retains the maximal amount of information necessary for successful shape recognition. Attneave’s work [29] initiated further research on the topic of curve partitioning and led to a vocabulary-based scheme made up of primitive shape descriptors called “codons” for describing 2D plane curves [30]. Hoffman [30, 31] argued that when the visual system decomposes objects, it does so at points of high negative curvature. Lowe [32] proposed a computer vision system that can recognize three-dimensional objects from unknown viewpoints and single two-dimensional images. The procedure is non-typical and uses three mechanisms of perceptual grouping to derive three-dimensional knowledge about the object, as opposed to a standard bottom-up approach. The disadvantage of bottom-up approaches is that they require an extensive amount of information to perform recognition of an object. Instead, the human visual system is able to perform recognition even with very sparse data or partially occluded objects.
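Attneave’s observation suggests a simple procedure: approximate the curvature at each interior vertex of a digitized curve by the turning angle between the incoming and outgoing segments, and break the curve at vertices where this angle is large. The following Python fragment is a minimal sketch of this idea under our own assumptions (a polyline input and a fixed angle threshold); it is not code from any of the cited works.

import math
from typing import List, Tuple

Point = Tuple[float, float]

def turning_angle(p: Point, q: Point, r: Point) -> float:
    # Absolute turning angle (radians) at vertex q of the path p -> q -> r;
    # a straight continuation gives 0, a sharp corner approaches pi.
    a1 = math.atan2(q[1] - p[1], q[0] - p[0])
    a2 = math.atan2(r[1] - q[1], r[0] - q[0])
    d = abs(a2 - a1)
    return min(d, 2 * math.pi - d)

def break_at_high_curvature(curve: List[Point], threshold: float) -> List[Point]:
    # Keep both endpoints and every interior vertex whose turning angle
    # exceeds the threshold; the kept points approximate the curve by
    # straight lines broken at high-curvature points, as Attneave suggested.
    kept = [curve[0]]
    for i in range(1, len(curve) - 1):
        if turning_angle(curve[i - 1], curve[i], curve[i + 1]) > threshold:
            kept.append(curve[i])
    kept.append(curve[-1])
    return kept

# An L-shaped path: only the 90-degree corner survives as a break point.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
print(break_at_high_curvature(path, math.radians(30)))
# [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)]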

References
1. Kirk GS, Raven JE (1957) The pre-Socratic philosophers. A critical history with a selection of texts. Cambridge University Press, London
2. Plato (1993) Republic (trans: Waterfield R). Oxford University Press, Oxford
3. Aristotle (1989) Metaphysics (trans: Tredennick H), vol 17, 18. Harvard University Press, Cambridge
4. Aristotle (2014) Categories (trans: Edghill E). The University of Adelaide, Adelaide
5. Gilson E (1991) The spirit of mediaeval philosophy. University of Notre Dame Press, London
6. Baumgarten A (1750) Aesthetica
7. Locke J (1961) An essay concerning human understanding. Dent, London
8. Berkeley G (1996) Principles of human knowledge and three dialogues. The world's classics. Oxford University Press, Oxford
9. von Helmholtz H (1878) The facts of perception. In: Kahl R (ed) Selected writings of Hermann von Helmholtz. Wesleyan University Press, 1971, Middletown, Connecticut
10. Gregory R (1974) Concepts and mechanisms of perception. Duckworth, London
11. Gregory RL (1970) The intelligent eye. McGraw-Hill Paperbacks, New York
12. Koehler W (1929) Gestalt psychology. Liveright
13. Koffka K (1935) Principles of gestalt psychology. Harcourt Brace Jovanovich
14. Kant I (1998) Critique of pure reason. Cambridge University Press, Cambridge
15. Hebb D (1949) The organization of behavior. Wiley, New York
16. Minsky M, Papert S (1969) Perceptrons. MIT Press, Cambridge
17. Kilpatrick WH (1952) The supposed conflict between moral freedom and scientific determinism. Educ Theory 2(1):11–19
18. Ittelson WH (1952) The Ames demonstrations in perception. Princeton University Press, Princeton
19. Bruner JS, Goodman CC (1947) Value and need as organizing factors in perception. J Abnorm Soc Psychol 42:33–44
20. Gibson J (1950) The perception of the visual world. Houghton Mifflin, Boston
21. Watson JB (1913) Psychology as the behaviorist views it. Psychol Rev 20:158–177
22. Watson JB (1924) Behaviorism. People's Institute, New York
23. Marr D (1976) Early processing of visual information. Philos Trans R Soc B: Biol Sci 275:483–524
24. Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. W. H. Freeman, San Francisco
25. Pentland AP (1986) Perceptual organization and the representation of natural form. Artif Intell 28(3):293–331
26. Koenderink J, van Doorn A (1986) Dynamic shape. Biol Cybernet 53:383–396
27. Warrington EK, Taylor AM (1978) Two categorical stages of object recognition. Perception 7(6):695–705
28. Attneave F, Arnoult MD (1956) The quantitative study of shape and pattern perception. Psychol Bull 53:452–471
29. Attneave F (1954) Some informational aspects of visual perception. Psychol Rev 61(3):183–193
30. Richards W, Hoffman D (1985) Codon constraints on closed 2-D shapes. Graph Image Process 31:265–281
31. Hoffman D, Richards W (1984) Parts of recognition. Cognition 18:65–96
32. Lowe EJ (1993) Perception: a causal representative theory. In: Wright E (ed) New representationalism: essays in the philosophy of perception. Avebury, Aldershot

Chapter 2

Machine Perception—Machine Perception MU

2.1 Introduction

In the previous chapter a short survey of the philosophical inquiries and psychological research on human visual perception was outlined. As was shown, visual perception is the long-researched ability of humans to see and understand the visual world. Machine perception research started with Nevatia [1]; however, many research topics that are concerned with solving machine perception problems are topics of research in similar areas such as computer vision, machine vision, machine understanding or robotics. In this chapter, a short survey of the literature on research in machine perception is presented. The literature on machine perception is growing rapidly and, in this review, only selected problems that are relevant to the issues presented in this book will be briefly described.

2.2 Machine Perception

Machine perception, as described by Nevatia [1], is the research area where researchers investigate the possibility of building systems endowed with human perceptual abilities. Machine perception is mostly concerned with object recognition, object detection or solving navigation problems. Research in machine perception is closely connected with research in other areas such as machine vision, computer vision, pattern recognition, robotics or image understanding, because perception is always connected with other processes such as the navigation of robots or the understanding of the content of images. Machines that perceive their environments and perform required tasks have an obvious usefulness for diverse application areas such as industrial assembly and inspection, planetary space exploration or automated medical x-ray screening. Another important application area is the interpretation of images taken from aircraft or satellites for
the monitoring of earth resources, weather patterns, and military surveillance. The most popular approach is to build a special-purpose system for a particular application, taking advantage of prior knowledge of the limited environment to simplify the processing. While classical machine perception is focused on gathering sensory information in order to transform it into some form useful for performing a required task, machine perception MU, as part of the machine understanding approach, is aimed at building a machine that can think and understand. In machine perception MU, the line-drawing category is the most important perceptual category, and for this reason this chapter includes a short review of research concerning problems connected with line-drawing interpretation. An important part of machine perception research is concerned with problems of concept formation, regarded as the search for the proper perceptual model (concept), and a short review of these issues is given in the context of the proposed complex conceptual representations of knowledge. In machine perception MU, specific perceptual problems such as the problems of illusory contours or of occlusion and transparency are solved by application of the interpretational perceptual mechanism, and a short review of perception research concerning these problems is also included.

2.2.1 Machine Perception—Line Drawing

In machine perception MU, the line-drawing perceptual category is the most important perceptual category; it refers to the visual representation which in the scientific literature is called a line drawing. In machine perception, the line drawing is often used in the context of research solving interpretational problems by applying the line-labelling method. Most of this research was carried out assuming ideal line-drawing representations of polyhedral objects, as part of object recognition and three-dimensional scene analysis. The field of three-dimensional scene analysis originated with the pioneering work of Roberts [2]. He considered the problem of recognizing simple 3D polyhedral objects from a single view. Objects were recognized by matching against computed projections of known objects. In the scientific literature the term line drawing refers to a broad range of pictures, from technical and engineering drawings to conventional illustrative drawings or drawings in artistic works. The expressive drawing of the visual arts is distinguished from the technical drawing that is aimed at precise communication in the preparation of a functional document. Technical drawing, a graphical language that communicates ideas and information, is used to fully and clearly define requirements for engineered items. Hochberg et al.'s [3] experiments demonstrate that the recognition of line drawings does not require any special form of learning and that it follows naturally from the ability to recognize three-dimensional objects. In this experiment, a child who was raised until the age of 19 months without being shown pictures of any kind had no difficulty in naming the contents of the first line drawings he saw. Some of
these objects are shown in Fig. 2.1a. Interpretation of the line drawing is a topic of research in perception where many experiments, measures and models have been used to express Gestalt ideas such as “good shape” more precisely. Hochberg [3] argued that as the complexity of figures as two-dimensional line drawings increases, there is a tendency for the figures to be perceived as two-dimensional representations of three-dimensional objects. The more angles the figure contains, the more complex it is in two dimensions, and the more likely it is to be perceived as a representation of a “simpler”, three-dimensional object. The number of differently sized angles and the number of continuous lines also influence the figure's complexity. Thus, the more complex, asymmetrical and discontinuous the 2D pattern, the more likely it is to be perceived as representing a projection of a 3D figure (see Fig. 2.1b). Machine perception research is focused on interpretation of the line drawing in terms of 2D objects, 3D objects or scenes. For interpretation of images of two-dimensional objects, pattern classification techniques are usually applied. Perceiving a three-dimensional scene from a single view, which gives a two-dimensional image, is a more difficult task. Interpretation of two-dimensional images is inherently ambiguous because the same image can be formed by an infinite number of three-dimensional scenes, and the image formed by a particular object changes with the viewing angle. However, the most difficult problem is caused by the fact that in scenes with multiple objects, parts of otherwise visible surfaces of some objects may be occluded by others. Research in machine perception focused on recognizing simple 3D polyhedral objects was initiated by Roberts [2], and many researchers have used different methods to cope with the problems of object recognition and scene interpretation. Roberts' three-dimensional scene analysis was restricted to scenes consisting of polyhedral solids with homogeneous surfaces against uniform backgrounds that can be characterized by the intersection lines of the objects. However, scene analysis techniques become more complex as the number of objects grows and large parts of objects are occluded by others. A major simplification occurs if the lines, vertices, and faces belonging to different objects can be separated (segmented). After parts of complete objects have been segmented, complex objects or structures can be described by relationships of these parts. Figure 2.1c shows an image of line drawings that can be interpreted as the outlines of a collection of objects and further as a scene of a collection of distinct objects. This scene is further interpreted as being made up of two blocks and a wedge, with one block partially occluding the other two objects. However, the object in Fig. 2.1c is just a collection of straight lines in a variety of orientations, and only spontaneous perception interprets this image as a scene that consists of a collection of distinct objects. Most classical work on line drawings concerns polyhedral objects, and machine vision systems often use the technique of line-labelling to reduce the possible interpretations of these line drawings to a processable number. Line-labelling attempts to identify each line in a line drawing as convex, concave or occluding, and the best-established method of labelling line drawings is by means of a junction catalogue.
Fig. 2.1 Examples of line-drawings presented in Hochberg's experiment (a), examples of the figures used by Hochberg and Brooks (b), line-drawings of the outlines of a collection of objects (c)

Guzman [4] was the first who attempted to identify the junctions which may appear in valid line drawings; however, the first systematic
catalogues were produced by Clowes [5] and Huffman [6]. By considering three perpendicular planes intersecting at the origin and analysing the line drawings produced in all 2^8 possible combinations of solid and empty regions, they showed that the twelve junction types are all of the possible views of trihedral vertices. In addition to the twelve junction types obtained by this procedure, Clowes and Huffman also listed the four occlusive T-junctions. Waltz [7] extended the types of line labels by including crack and shadow edges. The Clowes-Huffman catalogue for line drawings of trihedral polyhedra is well established, although the limitation to trihedral vertices is somewhat restrictive. Clowes-Huffman line-labelling has been used successfully in a number of applications, including the interpretation of natural line drawings [8] and of freehand sketches [9]. A system for interpreting freehand sketches as three-dimensional objects and constructing solid models (limited to polyhedra) of the objects sketched is presented in [10]. In [11] the labelling of line drawings of objects with tetrahedral as well as trihedral vertices was presented. The authors describe a method of obtaining a complete junction catalogue for drawings of such polyhedra and give example drawings illustrating each possible labelling. The junction labels, computed as part of the line-labelling, are themselves useful, and this may explain why other methods of line-labelling such as gradient space [12] and sidedness reasoning [13] have not superseded the junction catalogue method for trihedral line drawings. Although other methods may be more accurate, this method is relatively robust in handling misplaced junctions and incorrect projections. As described in [11], the process used to derive the Clowes-Huffman catalogue produces, as a side effect, the complete catalogues of junctions possible in drawings of extended trihedral objects. By adding six junction types it is possible to label drawings of any extended trihedral polyhedron [14]. As Sugihara [15] points out, if there is independent knowledge of the types of objects which may appear in line drawings, it is possible to produce a complete catalogue of the junction labellings of those objects. A full catalogue of valid tetrahedral junction labels is considerably larger than the equivalent Clowes-Huffman catalogue for trihedral junction labels. Its very size may make it less useful in practice, as may the fact that it is far harder to obtain an unambiguous
labelling for a line drawing explicitly or implicitly containing tetrahedral vertices. For this reason, cut-down catalogues of common tetrahedral junctions, rather than the full catalogue, have been used. The labelling technique is a powerful tool for the interpretation of perfect line drawings, and if a line drawing is assumed to be extracted correctly from a real scene, at least one of the interpretations has to correspond to the real scene. That is, the line drawing is realizable. However, if a line drawing is not extracted correctly, or a line drawing is given by a human, it may not be realizable or may not even be interpretable. In this context, Huffman [6] raised the problem of the realizability of line drawings by introducing the idea of dual space for this problem, and Mackworth [12] used the name gradient space in solving problems of the realizability of line drawings. Interpretation of line drawings is, in principle, solved by labelling methods if scenes are constrained to include only trihedral polyhedra or a class of curved objects. However, the extension of these methods to the interpretation of more general objects involves many unsolved problems: how to build huge junction dictionaries, or how to select an appropriate interpretation from the large number of interpretations obtained by labelling methods. Another important problem of the labelling methods is that they accept as input only perfect line drawings. On the other hand, extraction of perfect line drawings is very difficult, especially if the input is just a grey-level image. The other approach is to modify the imperfect line drawing so that it may have a consistent interpretation. The problem in this approach is that, even if some lines are added to or deleted from a consistent line drawing, the result may also be consistent. One solution is to introduce a suitable criterion for deciding which line drawing is the best. Another difficulty lies in the labelling problem itself. In labelling polyhedral line drawings, the constraint that the label of a line does not change in the middle can be used. For curved objects, however, the label of a line may change in the middle. The label change was examined for possible pairs of junctions, and the result was also stored in the dictionary. In machine perception MU, it would be possible to adopt the labelling techniques; however, because of their limited reliability and limited interpretational power, these methods were not used.
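The junction-catalogue technique can be made concrete with a small constraint-propagation sketch in the spirit of Waltz filtering. The three-junction "drawing" and the cut-down catalogue below ('+' convex, '-' concave, '>' occluding) are illustrative stand-ins for the full Clowes-Huffman catalogue, not a faithful reproduction of it; the compatibility rule is also simplified to requiring the same label at both ends of a shared line.

```python
# Toy junction catalogue: for each junction type, the labellings its
# ordered lines may take. Real Clowes-Huffman catalogues are larger.
CATALOGUE = {
    "L": [(">", ">")],
    "arrow": [("+", "-", "+"), (">", "+", ">")],
    "fork": [("+", "+", "+"), ("-", "-", "-")],
}

def waltz_filter(junctions, shared_lines):
    """junctions: name -> (type, ordered line names);
    shared_lines: (junction_a, junction_b, line) triples.
    Repeatedly discard junction labellings that no neighbour can match,
    until the candidate sets stop shrinking (arc consistency)."""
    domains = {j: list(CATALOGUE[t]) for j, (t, _) in junctions.items()}

    def label_at(j, labelling, line):
        return labelling[junctions[j][1].index(line)]

    changed = True
    while changed:
        changed = False
        for ja, jb, line in shared_lines:
            for j, other in ((ja, jb), (jb, ja)):
                keep = [lab for lab in domains[j]
                        if any(label_at(j, lab, line) ==
                               label_at(other, lab2, line)
                               for lab2 in domains[other])]
                if len(keep) < len(domains[j]):
                    domains[j], changed = keep, True
    return domains

junctions = {
    "J1": ("L", ("a", "e")),
    "J2": ("arrow", ("a", "l", "b")),
    "J3": ("fork", ("l", "c", "d")),
}
shared = [("J1", "J2", "a"), ("J2", "J3", "l")]
print(waltz_filter(junctions, shared))
# The L-junction forces J2 into its occluding reading, which in turn
# forces the fork J3 to the all-convex labelling.
```

The example shows the essential behaviour reported for the labelling methods: local junction constraints propagate through shared lines and prune the space of interpretations to the few globally consistent ones.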

2.2.2 Machine Perception—Object Recognition and Scene Analysis

Object recognition and scene analysis are only a small part of research in machine perception MU; however, in this chapter some of the object recognition issues that are relevant in the context of the material presented in this book are discussed. A theory of pattern (object) recognition specifies how structural descriptions suitable for recognition can be derived from retinal images and matched against stored structural models, which define the particular object categories. A major problem, however, is to account for the recognition of patterns whose orientations with
respect to the retina may vary (rotated versions), and whose projected shapes on the retina may vary due to changes in the slant of the pattern with respect to the viewer. In this context, the question of how we achieve shape constancy is the fundamental problem which must be solved by a theory of pattern recognition. For descriptions to be useful for recognition, they should be based on an external reference frame, not on an internal, viewer-centred frame. Marr [16] suggested that descriptions for shape recognition could be referred to a coordinate system based upon the major axis of an elongated symmetrical shape. There has been extensive research in visual perception examining which kinds of reference frames are used when viewing simple patterns (see e.g. [17, 18]), and one important finding which has emerged from such studies is that under some circumstances the perception of elongated shapes does seem to be based upon their intrinsic axes of elongation, which is consistent with Marr and Nishihara's suggestion. Even the simplest form of object recognition is not a unitary activity but makes available a number of different visual, semantic, and verbal descriptions. Normally, when a particular object of a familiar category is recognized, it is possible to describe how the object would look if seen from a different viewpoint. This requires having some understanding of the object's three-dimensional shape. For example, looking at a view of a horse in which one of its legs is occluded by another, we would expect to see the “invisible” leg if the horse were seen from a different viewpoint. We are also able to describe the functions or uses of a familiar object category (e.g. “a domesticated animal, previously used as a means of transport”) and know of other objects with which it may be associated (cart, hound). Additionally, we can label a familiar object with a particular name (horse). Such activities reveal that we have visual knowledge of the object's shape, semantic knowledge of the object's function and associates, and verbal knowledge of the object's name. Experimental evidence suggests that there are distinct stages in the processing of objects: perceptual classification, semantic classification, and name retrieval (e.g. see [19–21]). The perceptual classification stage involves matching a given view of an object to a stored representation of the appearance of that object. The semantic classification stage is the point where the functions and associates of a given object are retrieved from a semantic memory system. As described above, most work in machine perception is concerned with object recognition and scene understanding. Recognition implies that a correspondence has been found between elements of the image and the prior representation of objects in the world. According to Marr [16], object recognition is achieved when one of the descriptions derived from an image matches one that has been stored as a representation of a particular, known object class. The object recognition problem can be divided into the following sub-problems: object detection, if the goal is to look for a known object; instance recognition, if a specific rigid object is to be recognized; and general category (class) recognition, which may involve recognizing instances of extremely varied classes such as animals or furniture.
Recognizing an object (instance recognition) by finding its constituent parts and measuring their geometric relationships is one of the oldest approaches to object recognition [22–24]. Face recognition is an example
of part recognition [25–27], as is pedestrian detection [28]. Some of the central issues in part-based recognition are the representation of geometric relationships, the representation of individual parts, and algorithms for learning such descriptions and recognizing them at run time [29, 30]. Part-based models can have different topologies for the geometric connections between the parts. For example, Felzenszwalb [31] restricts the connections to a tree, which makes learning and inference more tractable. Context plays a very important role in human object recognition [32] and greatly improves the performance of object recognition algorithms [33], as well as providing useful semantic clues for general scene understanding [34]. The context information can be given in many different ways. A simple way to incorporate contextual spatial information into a recognition algorithm is to compute feature statistics over different regions [35]. Object recognition models have considered the use of context information at a “global” or “local” image level. Global context considers image statistics from the image as a whole. Local context, on the other hand, considers context information from neighbouring areas of the object. Some of the most recent works in scene understanding exploit the existence of large numbers of labelled images to perform matching directly against whole images, where the images themselves implicitly encode the expected relationships between objects [36, 37]. Global context exploits scene configuration (the image as a whole) as an extra source of global information across categories. Many object categorization frameworks have incorporated this prior information for their localization tasks [36, 38–41]. Murphy et al. [38] exploited context features using a scene “gist” which influences priors of object existence and global location within a scene. The “gist” of an image is a holistic, low-dimensional representation of the whole image. Local context information is derived from the area that surrounds the object to be detected, whether from other objects, pixels or patches. The role of local context has been studied in psychology for the tasks of object [42] and face detection [43]. Sinha and Torralba [43] found that inclusion of local contextual regions substantially improves face detection performance, indicating that the internal features for facial representations encode this contextual information. Many object categorization models have used local context from pixels [44–48], patches [49–51] and objects [52–57] that surround the target object, greatly improving the task of object categorization. As pointed out above, much of the work in machine perception is concerned with scene analysis. Scene analysis methods attempt to describe a pattern in terms of simpler primitives extracted from the input, and recognition is by matching of their descriptions. The task of recognizing and localizing objects in isolation is relatively easy in comparison to that of understanding the scene (context) in which the object occurs, which is the topic of image understanding research. In the literature there are many systems built in the area of image understanding. The primary objective of image understanding systems is to construct a symbolic description of the scene depicted in the image. The image understanding system (IUS) analyses an image in order to interpret the scene in terms of the object model given to the IUS as knowledge about the world.
Interpretation refers to the correspondence (mapping) between the description of the scene and the structure of the
image. It associates objects in the scene (houses, roads) with features in the image (e.g. points, lines, regions). Once the description of the scene has been constructed, the IUS can answer various queries about the scene (e.g. how many houses exist in the scene?) and can perform physical operations by controlling robot manipulators, e.g. picking up and moving a physical object. Image understanding systems interpret the image based on knowledge that is usually stored in the form of semantic networks. Examples of image understanding systems that utilize knowledge are VISIONS [53], ALVEN [58], ACRONYM [59], SCHEMA [60], SIGMA [61], CITE [62], SOO-PIN [63], and ERNEST [64].
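Before turning to concept formation, the region-statistics idea for context described above can be sketched in a few lines. The sketch below is a crude, gist-like global descriptor, assuming a grayscale image, a 4×4 grid and mean intensity as the per-cell statistic; it is an illustration of the general idea, not the descriptor of Murphy et al. [38].

```python
import numpy as np

def grid_context_features(image, grid=(4, 4)):
    """Split a grayscale image into a coarse grid and return per-cell mean
    intensities: a holistic, low-dimensional summary of scene layout."""
    h, w = image.shape
    gh, gw = grid
    feats = np.empty(grid)
    for r in range(gh):
        for c in range(gw):
            cell = image[r * h // gh:(r + 1) * h // gh,
                         c * w // gw:(c + 1) * w // gw]
            feats[r, c] = cell.mean()
    return feats.ravel()  # e.g. a 16-dimensional global context vector

# Usage: scene layouts can be compared by the distance between descriptors.
rng = np.random.default_rng(0)
scene = rng.random((64, 64))
print(grid_context_features(scene).shape)  # (16,)
```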

2.2.3 Machine Perception—Concept Formation

An important part of machine perception research is concerned with the problem of concept formation, regarded as the problem of finding the proper perceptual model (concept). The real world that we see and touch is primarily composed of three-dimensional solid objects. When an object is viewed for the first time, information about that object is gathered from many different viewpoints. The process of gathering detailed object information and storing that information is often referred to as model formation or concept formation. Concept formation is connected with the problems of visual representation, schematization and visual thinking. A schematic representation of a visual object is usually given in the form of an analogical representation that is often called a diagrammatic representation [65]. Diagrams are pictorial, analogical, abstract knowledge representations that are characterized by a parallel correspondence between the structure of the representation and the structure of the represented. Anderson et al. [66] provide a survey of the underlying theory of diagrammatic representations as well as numerous examples of diagrammatic reasoning (human and mechanical) that illustrate both its power and its limitations. Categorization and categories are related to concepts and play an important role in all aspects of cognition. Categories are mental representations, abstract objects or abilities that make up the fundamental building blocks of thoughts and beliefs. In contemporary philosophy the concept is often understood as a mental representation, where concepts are entities that exist in the mind (mental objects); as abilities, where concepts are abilities peculiar to cognitive agents (mental states); or as abstract objects, as opposed to mental objects and mental states [67]. A concept is often understood as an abstraction standing for a set of objects sharing some properties that differentiate them from other concepts [67]. In the real world, concepts are never isolated. Concepts that are organized into a hierarchy can be used for making generalizations. When the mind makes a generalization such as the concept of tree, it extracts similarities from numerous examples; the simplification enables higher-level thinking. At a given level of the hierarchy, concepts are usually disjoint, but sometimes they can overlap; the difference between them may be small or large. In a generalization hierarchy, a concept can be exemplified not
only by the objects at the bottom level, but also by sub-concepts at any level below the concept in question. A concept is instantiated by all of its actual or potential instances, whether these are things in the real world or other ideas. In the biological sciences the categories or concepts are often called taxonomies. Taxonomy is the science of defining and naming groups of biological organisms on the basis of shared characteristics. Concepts can be organized into a hierarchy, higher levels of which are termed “superordinate” and lower levels “subordinate” [68]. Additionally, there is the “basic” or “middle” level, at which people will most readily categorize a concept. There is some evidence showing that object recognition can be based on a quite general description of the object. For instance, Rosch et al. [68] showed that subjects apply a base-level name to objects faster than they can supply their specific names. The base level is a special level of abstraction, at which most people tend to identify an object initially [68]. The subordinate level is the level that refers to the more specific categories and thereby includes a more detailed description. Researchers have pointed to the importance of visual features in determining the categories that are formed at different levels [69–71]. Research results have shown that categorization at a subordinate level requires more visual processing than categorization at the basic level [72, 73]. Several studies (e.g. [73–75]) have used sets of line drawings with several exemplars from different basic-level categories. More atypical exemplars are identified at a more subordinate level than more typical exemplars. Op de Beeck's [76] experiments show that this effect also exists in a stimulus set containing rather typical exemplars without very salient subordinate names. In the computer vision literature, the concept that is strictly related to categorization is often described in terms of semantic context. Early studies in psychology and cognition show that semantic context aids visual recognition in human perception. Palmer [77] examined the influence of prior presentation of visual scenes on the identification of briefly presented drawings of real-world objects. Early computer vision systems adopted these findings and defined semantic context as pre-defined rules [22, 53, 55] in order to facilitate recognition of objects in real-world images. Hanson and Riseman [53] proposed the VISIONS schema system, where semantic context is defined by hand-coded rules. Sources of semantic context in early works were obtained from common expert knowledge [22, 53, 55], which constrained the recognition systems to a narrow domain and allowed just a limited number of methods to deal with the uncertainty of real-world scenes. On the other hand, annotated image databases [57] and external knowledge bases [54] can deal with more general cases of real-world images. A similar evolution happened in learning semantic relations from those sources: pre-defined rules [22] were replaced by methods that learn implicit semantic relations as pixel features [57] and co-occurrence matrices [54]. Spatial context is defined by the likelihood of finding an object in some positions and not others with respect to other objects in the scene. Bar et al. [78] examined the consequences of pairwise spatial relations, between objects that typically co-occur in the same scene, on human performance in recognition tasks.
Spatial context is incorporated from inter-pixel statistics [36, 38, 39, 41, 45–48, 57, 79], and from pairwise relations between regions in images
[44, 49–52]. Recently, the work of Shotton et al. [48] acquired spatial context from inter-pixel statistics, incorporating texture, layout and spatial information for object categorization of real-world images. Recent works in computer vision such as [48, 49] use strongly annotated training data as the main source of spatial context. Scale context is a contextual relation based on the scale of an object with respect to others. The CONDOR system by Strat and Fischler [55] was one of the first computer vision systems that added scale context as a feature to recognize objects. Lately, a handful of methods for object recognition have used this type of context [38–40, 50, 56, 79]. The concept in machine perception is understood as the specific representation of the visual data (model) used for matching the perceived concept (description) with the learned models stored in memory. There are many methods of object representation, such as canonical views, surface-based object representations, silhouette-based representations, geometric models, graph representations, the extended Gaussian image, fractal models, or generalized cylinders. The most often used representation schemes are concerned with representing solid objects. Techniques for representing solid objects [80] and vision [81] can be divided into three major categories: boundary representations, swept-volume representations, and volumetric representations. Boundary representations represent a polyhedral object by the faces that bound it. The representation called a wire-frame representation is the representation where the adjacency information stored in the data structure specifies only the endpoints of each line segment. A popular and commonly used representation for polyhedra is the winged-edge polyhedron representation [82], which stores all the adjacency information in space proportional to the number of edges of the polyhedron. Boundary representations can also be used to represent non-polyhedral objects by representing the surfaces with various types of surface patches. One common method for representing curved surface patches is with B-splines [83]. Skeleton representations use space-curve skeleton models. Skeleton geometry provides useful abstract information. If a radius function is specified at each point on the skeleton, this representation is capable of general-purpose object description. For recognition of three-dimensional objects with a small number of stable orientations on a flat light table, a library of two-dimensional silhouette projections that represent the three-dimensional objects is ideal if the object silhouettes are different enough. Recently, more powerful two-dimensional projection representation schemes, known as “aspect graphs” and “characteristic views”, have become popular. Aspect graphs and characteristic views were introduced for representing the shapes of three-dimensional objects. Koenderink [84] introduced the concept of aspect graphs for representing shapes of three-dimensional objects. An aspect graph is a graph structure whose nodes represent the various aspects of an object and the boundaries between adjacent regions. Most theories (e.g. [16, 85]) emphasize the recognition of objects from any viewpoint. However, there is compelling evidence that not all views of objects are equally easy to recognize. Palmer et al. [86] described how each of the different objects appears to have a “canonical” viewpoint, which is often something like a three-quarters view.
Chakravarty [87] proposed representing an object by a set of characteristic views. In Chakravarty's
system, all of the infinite two-dimensional perspective projection views of an object are grouped into a finite number of topologically equivalent classes. A representative member of an equivalence class is called a characteristic view. Because characteristic views specify the three-dimensional structure of an object, they provide a general-purpose representation of that object. Structural descriptions are symbolic descriptions of the features of a pattern and their spatial arrangements, which have proved useful in more recent approaches to pattern and object recognition. A structural description of a T-shape, for example, might specify that one part (a vertical line) supports and bisects a second part (a horizontal line). Novel patterns can be classified by comparing derived descriptions of them with stored structural models of different object classes, in which some aspects of the description are made obligatory. For something to be a “T” there must be a vertical line and a horizontal line, and the former must support the latter at a point about midway along its length [88]. The great advantage of this kind of approach is that we can leave certain parameters unspecified in the structural model which defines a concept, whereas others are made obligatory. A more complex attempt to form the concept (model) was the application of the semantic network representation as the basic conceptual structure in learning generalizations of concepts in a blocks world domain [88, 89]. Swept-volume representations represent a solid as an infinite union of cross sections, by means of a cross-sectional function and an axis or spine of the object. The cross section may be constant along the whole axis, it may be allowed to change linearly from one end of the axis to the other, or it may be stored as a number of samples or some other function. In the generalized-cone (or generalized-cylinder) representation, an object is represented by a three-dimensional space curve that acts as a spine or axis of the cone, a two-dimensional cross-sectional figure, and a sweeping rule that defines how the cross section is to be swept and possibly modified along the space curve. Generalized cones are well suited for representing many real-world shapes. However, certain objects such as a human face or an automobile body are almost impossible to represent as generalized cones. Despite its limitations, the generalized-cone representation is popular in computer vision. Learning of the concept can be approached by one of the machine learning methods such as Similarity Based Learning (SBL). SBL is one of the main disciplines in the field of machine learning, and it is especially suited for investigations in connection with computer vision. The first major breakthrough in the field was Winston's work [88], which showed some successful generalizations of concepts in a blocks world domain, using semantic networks as a basic structure. Another approach using semantic network representations was presented by Connell and Brady [89]. Because some basic restrictions of semantic network representations became evident, several approaches to overcoming the problems of semantic networks have been proposed. Dong et al. [90] proposed to employ graded predicates, such as very-small, medium-small, or medium, and Haar [91] went even further and proposed predicates that carry certain intervals of parameters. In a totally different
direction, Ueda and Suzuki [92] proposed to use contour information directly, without any encoding predicates or network structures. They obtained generalized contours by a multi-scale smoothing followed by matching over several scales.
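The structural-description account of the “T” given above can be phrased as a small matcher: the obligatory relations (the vertical stroke supports the horizontal bar, roughly at its midpoint) are checked, while lengths and absolute position are left unspecified, exactly as the structural-model idea requires. The segment encoding and the tolerance are illustrative assumptions, not a reconstruction of Winston's program.

```python
def is_T(vertical, horizontal, tol=0.25):
    """Each part is a segment ((x1, y1), (x2, y2)). Obligatory relations:
    the vertical stroke reaches the horizontal bar (support) and meets it
    near the bar's midpoint (bisection). Other parameters stay free."""
    (hx1, hy1), (hx2, hy2) = horizontal
    (vx1, vy1), (vx2, vy2) = vertical
    top = max(vy1, vy2)                  # upper end of the vertical stroke
    bar_y = (hy1 + hy2) / 2
    bar_len = abs(hx2 - hx1)
    vx = (vx1 + vx2) / 2
    mid_x = (hx1 + hx2) / 2
    supports = abs(top - bar_y) <= tol * bar_len   # stroke touches the bar
    bisects = abs(vx - mid_x) <= tol * bar_len     # near the bar's middle
    return supports and bisects

print(is_T(vertical=((2, 0), (2, 4)), horizontal=((0, 4), (4, 4))))  # True
print(is_T(vertical=((0, 0), (0, 4)), horizontal=((0, 4), (4, 4))))  # False: an "L"
```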

2.3 Specific Perceptual Problems

In machine perception MU, perceptual problems are divided into three categories: low-level perceptual problems, middle-level perceptual problems and higher-level perceptual problems. In the scientific literature these problems are presented as the completion and transparency problems or the illusory contours problems. In this chapter a short review of those topics is presented.

2.3.1 Completion and Transparency Problem

One of the middle-level perceptual problems presented in this book is called the problem of completion and transparency. Completion and transparency are strictly connected with the perception and interpretation of occluded (overlapping) objects and are part of the theoretical consideration of painted or shaded surfaces, thin lines, moving surfaces, transparent surfaces, and lighting effects such as shadows, as well as the perception of illusory surfaces or illusory contours. Completion is characterized by the overall appearance of goodness of the shape, where the information from the total visible configuration of the partly occluded object is used in the completion process to give rise to a complete object that is as simple as possible to describe, whereas transparency can be regarded as a special case of restructuring by completion in perception. Perception of the world can be regarded as the perception of an image that consists of a summation of partial views driven by focusing attention on selected areas of the image, where only part of the image is visible. In consecutive scanning of the image, by looking at one or another part of the image, the perceived detail (object) is confronted with the hypothesis that is made on the basis of the previously perceived parts of the image as well as previously learned knowledge [93]. The human visual system is remarkably adept at sorting out the various contrast edges found in images and inferring the surfaces that generated them, making explicit their overlap or depth relation. Often the physical configuration of objects is under-constrained by the limited information available in a single view, and additional constraints or assumptions must be brought into play. Completion as a perceptual mechanism is often regarded in terms of perceptual completion behind occlusion. The question of what principles of organization drive the interpretation of overlapping objects with visible and non-visible contours is often addressed in the context of the completion problem. Visible contours are called modal, occluded contours are called amodal, and both amodal and modal
continuations are the result of the same early visual process. Kellman and Shipley [94] claimed that the brain fills in gaps (interpolates) following the same rules, irrespective of whether a continuation is modal or amodal. Amodal perception is the perception of objects that are partially occluded by other objects. Amodal perception deals with amodal completion, which typically occurs when we look at an object that is partially hidden by another object. Briscoe [95] points out that there are two types of amodal completion: one stimulus-driven and not depending on background knowledge, and the other depending on stored information about the kind of objects we are perceiving and their individual properties. The topic of amodal completion has often been investigated by using partly occluded shapes that are regular or quasi-regular. Theories concerning visual completion can be categorized into three main groups: local theories, global theories, and integrated theories. The local theories derive from the Gestalt law of good continuation, which states that any curve will proceed in its own natural way [96] or, in other words, that a line will continue in the same direction [97–99]. Kanizsa [100] was an important advocate of this local account, pointing at the role of T-junctions as an index of this local completion process. Various theories formalized the law of good continuation, such as the theory of visual interpolation [101]. Boselie and Wouterlood [102, 103] proposed a local model, namely the good-continuation model. The global theories have their roots in the Gestalt law of Pragnanz, which states that perceptual organization will always be as “good” as the dominating conditions allow [97]. In these models the information from the total visible configuration of the partly occluded object is used in the completion process to give rise to a complete object that is as simple as possible to describe. Based on this idea, Hochberg and McAlister [104] proposed the global-minimum principle, which states that a figure will be interpreted such that the information that defines it is minimal. Leeuwenberg [105, 106] formalized this global-minimum principle by means of a descriptive system, the structural information theory, and in turn some other researchers [107, 108] applied it to occlusion phenomena. Sekuler [109] showed that global processes can also dominate the completion of occluded figures, a finding that was also reported by Bar and Ullman [110] and Parikh et al. [111]. Some authors (e.g. [100, 112, 113]) show, however, that the global-minimum principle does not always predict the preferred completion of an occluded figure. Local and global completion models give different results for some figures, and for this reason integrative models have been proposed. Sekuler [109] proposed a qualitative integrative model in which both local and global influences play a role in the amodal completion process. The model predicts different completions on the basis of factors such as good continuation, symmetry, repetition, familiarity and context. Another integrative model was proposed by Baumgart [114] and Coons [115], in which the nature of a completion is predicted on the basis of a quantification of both global and local aspects. In many experiments concerning amodal completion regular shapes were used, and only in some experiments were nonregular shapes used (see e.g. [116–118]).
The investigation of the presence of global influences in the completion process for partly occluded quasi-regular shapes was described in [119]. In the
experiment, quasi-regular shapes (shapes with a certain overall regularity but not based on metrical identities) were selected. Results from the experiments provided evidence for global completions and supported the notion that global influences on visual occlusion are apparent even when the partly occluded stimulus is outside the domain of regular shapes. However, the results of investigations on amodal completion show that there is no ground to believe that there are any general rules that govern the process of completion of more general shapes. In the case of regular shapes the local or global rules can give similar results, and the justification of these results can be given in relation to the assumption about the regularities of the shape. In the case of nonregular shapes nearly every reasonable interpretation can be justified. Completion and transparency are part of the perception of the partly hidden object and rely on the laws of perceptual organization. At the same time, the mind tunes out characteristics not relevant to the mental model, and in this way it corrects the stimulus to fit a perceptual expectancy. Albert [120] studied occlusion, transparency and lightness in the context of their interplay in the perceptual process. Many authors (e.g. [121]) suggested that perceptual decomposition of image luminance into multiple sources in different layers (e.g. perceptual transparency) is critical to their lightness illusions. In [120], it was shown that simple perceptual occlusion evoked by T-junctions can explain this effect. Adelson [122] and Ueda and Suzuki [123] devised a variety of new lightness illusions that were stronger than any previously known and also evoke percepts of transparency. The transparent layers perceived in Anderson and Winawer's displays [121] are inhomogeneous, whereas the transparent layers perceived in the displays used in most previous studies of perceptual transparency are homogeneous (e.g. [124]). Transparency can be regarded as a special case of restructuring by completion in perception [125]. For the pattern that consists of three shapes, a red one, a blue one, and between them a purple one (see Fig. 2.2), the interpretation connected with solving the transparency problem yields a simpler overall pattern: two mutually overlapping shapes (an oval and a square) rather than three adjacent ones. Interpretation of the colours as a particular relation between the three colours, namely P = B + R, is done by restructuring the unitary central colour in such a way that a superposition of two colours is seen where one colour would be seen otherwise. This solution adapts the order of colours to the interpretation of the subjects. In this case the perceptual solution of the problem tends to present itself with great immediacy, and there can be no question but that the intelligent
rearrangement of an unsatisfactory stimulus organization occurs in the act of perception itself and not in some secondary elaboration of the perceptual product.

Fig. 2.2 The pattern consists of three shapes: a red one, a blue one and between them a purple one, interpreted as the result of solving a transparency problem

In this book amodal completion is presented in the context of the interpretation of an object by applying the IN-perceptual transformation. Contrary to the problems presented in the literature, where the main focus is on explaining the transparency and completion phenomena, the research in machine perception MU is concerned with the application of the IN-perceptual transformation to solve visual problems such as intelligence tests that are given as a set of occluded or transparent objects.
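The colour relation P = B + R used in the transparency interpretation above can be checked numerically. The sketch below accepts the two-overlapping-shapes reading when the central region's colour is (approximately) an additive mixture of its neighbours; the RGB encoding, the clipping to 1.0, and the tolerance are illustrative assumptions rather than a model of human colour vision.

```python
def supports_transparency(red, blue, purple, tol=0.05):
    """Accept the 'two overlapping shapes' interpretation when the middle
    colour is close to the additive mixture P = B + R, channel by channel."""
    mixed = tuple(min(r + b, 1.0) for r, b in zip(red, blue))
    return all(abs(m - p) <= tol for m, p in zip(mixed, purple))

red = (0.8, 0.0, 0.0)
blue = (0.0, 0.0, 0.8)
purple = (0.8, 0.0, 0.8)   # matches B + R: the transparency reading wins
print(supports_transparency(red, blue, purple))            # True
print(supports_transparency(red, blue, (0.3, 0.9, 0.1)))   # False: three adjacent shapes
```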

2.3.2 Illusory Contours

The middle-level perceptual problem that utilizes the specific interpretational mechanism is called the problem of illusory contours. The problem of the perception of an illusory surface or contour is strictly related to the completion and transparency problem. The perception of an illusory surface, a subjectively perceived surface that is not given in the image, is one of the most intriguing phenomena in vision. The phenomenal appearance of contours in the absence of abrupt stimulus gradients was first discovered by Schumann [126], who concluded that subjective contours were the result of some organizational process, similar in nature to closure. Schumann's work on subjective contours was extended by Kanizsa [127], who demonstrated that curvilinear as well as straight contours could be produced. When the given configuration of objects is observed, a white triangle in the centre of the configuration is perceived, although no such figure physically exists. The perceived triangle, in addition to being whiter than the surround, is described as a plane surface which appears to be in front of the other elements in the pattern. Other contours have been produced in Julesz patterns [128] using binocular disparity. Coren's [129] analysis of the experimental data indicates that both monocular and binocular subjective contours result from the presence of depth cues in the stimulus array. Ever since the pioneering proposals of Coren [129], there has been a great deal of experimental investigation [130] as well as theoretical speculation [131] concerning the relationship between these two aspects of illusory figures. Explanations of illusory contours can be categorized into two classes. One class of explanation incorporates mechanisms whereby certain types of visible image events, such as contrast edges, contour junctions, and thin lines and their endpoints, cause the assertion of additional events in the image representation, such as illusory contours and surface brightness variations [132]. The second class of explanation maintains that illusory contours are only a by-product of computations whose primary goal is to infer 2½-dimensional properties of the physical world such as surfaces, their colours, their boundaries, and the occlusion relations [133]. Saund [134] proposed a new formulation of the perceptual organization of occluding contours by application of a computational theory that incorporates a richer ontology of image junction interpretations than had previously been proposed. In another explanation it is assumed that the visual system is converting 2D retinal
images into 3D interpretations by using available cues such as occlusion, perspective, and stereo disparity. In the context of the depth computation as the essential factor in creating the illusion, computational principles of how illusory lightness and illusory contours emerge in this phenomenon were investigated by Kogo et al. [135]. They showed, with an implemented model of Differentiation–Integration for Surface Completion, that the computation of border-ownership, which reflects the context of images, can explain the emergence of illusory surfaces [136]. They suggested that the context-sensitive mechanism of depth computation plays a key role in creating the illusion, that the illusory lightness perception can be explained by an influence of depth perception on the lightness computation, and that the perception of variations of the Kanizsa figure can be well reproduced by implementing these principles [136].

2.4 Machine Perception MU

The short survey of the philosophical inquiries and psychological research in human visual perception has shown how differently perception has been understood. Perception was regarded as the inferior of the two cognitive powers because it supposedly lacked the distinctness that comes only from the superior faculty of reasoning. Creating concepts, accumulating knowledge, connecting, separating, and inferring were reserved for the higher cognitive functions of the mind. Arnheim [137] and Rock [112] suggested that perception is intelligent in that it is based on operations similar to those that characterize thought. However, due to the dependence of perception on sensory information, there is a difference between perception and ‘higher’ cognitive functions such as imagination or thinking. Although Arnheim extended the meaning of the term ‘perception’ to include ‘cognitive’ and ‘cognition’, there is still a lack of agreement on which cognitive processes need to be included in defining perception. Perception is strictly related to understanding, and there is no strict demarcation line that clearly differentiates the perceptual processes from the so-called higher-level understanding processes. In machine perception MU, perception is regarded as part of the understanding process and there is no clear demarcation line between perceptual reasoning and the reasoning that leads to understanding. For this reason, the selected perceptual problems presented in this book that are strictly related to those problems regarded as higher-level understanding problems are given only in the form of a short description.

2.4.1 Sensory Object

Machine perception is a field of research attempting to enable man-made machines to perceive their environment by sensory means, as humans and animals do. While classical machine perception is focused on gathering sensory information in order
to transform it into a form that makes it possible to perform a required task, machine perception MU is part of the machine understanding area of research that is aimed at building a machine that can think and understand. This new perspective places the research in machine perception MU within the framework of machine understanding research, which supplies findings for exploring the understanding and thinking abilities of a machine. In our approach the main focus is on perception as subordinate to understanding, thereby directing the research effort not at the sensory devices but at the ability to utilize the transformed data in the understanding process. Machine perception MU research is deeply rooted in the research concerning human perception; however, the sensory devices which are now accessible offer more powerful possibilities to obtain different kinds of sensory data that can be used in the understanding process. For this reason, there is a need to explore new forms of understanding that can apply many different types of sensory data in the understanding process. In machine perception (robotics) the sensory devices play a big role in supplying the perceptual data, and often imitate biological systems in their perceptual functioning (e.g. evolutionary psychology). Biological systems sense their environments through a variety of sources such as sight, sound, touch and smell. The sensory abilities of different organisms coevolve, as is the case with the hearing of the prey of echolocating bats, which has evolved to respond to the sounds that the bats make. Sound waves provide useful information about the sources of and distances to objects, with larger animals making and hearing lower-frequency sounds and smaller animals making and hearing higher-frequency sounds. Taste and smell respond to chemicals in the environment that were significant for fitness in the environment of evolutionary adaptedness. The sense of touch includes pressure, heat, cold, tickle, and pain. The sensory system is the part of the nervous system responsible for processing sensory information. It consists of sensory receptors, neural pathways, and parts of the brain involved in sensory perception. Commonly recognized sensory systems are vision, hearing, somatic sensation (touch), taste and olfaction (smell). Human beings usually utilize two types of sensory information, e.g. seeing and hearing, during understanding of the environment, whereas a machine can use more complex perceptual data that are not present in human perception. Until now, research in machine perception has utilized sensory data in making decisions during object recognition or robot navigation. Machine perception MU is concerned with the machine interpretation of any sensory data. The advent of sensor networks capable of perceiving many different kinds of data opens new possibilities for exploring new methods of building knowledge based on the categories of sensory objects and sensory concepts. These sensory categories should be capable of grasping the essence of the multisensory data and building the new structure of multisensory knowledge. In machine perception MU the categories of sensory objects and sensory concepts were introduced to deal with the problem of data that come from different sensory channels such as the auditory channel or the tactile channel [138–140]. A sensory object is an object that is named based on a set of measurements that refer to attributes of the category to which the object is assigned. Naming an object from the category of
sensory objects is to classify the object into one of the categories of sensory objects. For example, a perceived object is assigned to the mineral sensory object category based on the measurement of the characteristic features of minerals. The category of minerals, described in [140], is derived from the category of non-man-made objects given by the following categorical chain: $\ldots \prec m^{ReO} \prec m^{Ear} \prec m^{NLiv} \prec m^{NMan} \prec m^{Min}$. The aim of naming (recognition or classification) is to assign the examined object $o_i$ to one of the mineral categories $m^M_i$ based on a set of measurements $m(a_j)$, finding the mineral category $m^M_i$ for which the measured attributes $m(a_j)$ of the object match the values of the attributes $u(a_{ij})$ of the mineral category. The naming of the sensory object is described in more detail in [140]. However, in comparison to the human sensory channels, the machine can use different sensory data that are interpreted in the context of the hierarchical structure of the sensory object categories. While machine perception MU is concerned with the machine interpretation of any sensory data, in this book only visual perception is presented. The reason is that machine perception MU is based on a very different approach than classical machine perception, and in order to present the material concerning the problems solved within the machine perception MU research program, a very selective arrangement of the presented topics is needed. In this book the focus is on visual machine perception, and in the following chapters problems concerning our new approach to machine perception will be presented.
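As an illustration of this naming step, the following minimal sketch matches the measured attributes $m(a_j)$ of a perceived object against stored attribute values of candidate categories. The mineral names, attributes and values here are hypothetical placeholders for this example, not the attribute set actually used in [140].

```python
# A minimal sketch of naming a sensory object by attribute matching.
# Category names and attribute values below are illustrative assumptions.

MINERAL_CATEGORIES = {
    "quartz":  {"hardness": 7.0, "density": 2.65},
    "calcite": {"hardness": 3.0, "density": 2.71},
    "galena":  {"hardness": 2.5, "density": 7.58},
}

def name_sensory_object(measurements, categories):
    """Assign the examined object to the category whose stored attribute
    values u(a_ij) best match the measured attributes m(a_j)."""
    def mismatch(attrs):
        return sum((measurements[a] - v) ** 2 for a, v in attrs.items())
    return min(categories, key=lambda c: mismatch(categories[c]))

# Measured attributes of a perceived object o_i:
print(name_sensory_object({"hardness": 6.8, "density": 2.6},
                          MINERAL_CATEGORIES))  # -> quartz
```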

2.4.2 Visual Machine Perception MU

Machine perception research started with the work of Nevatia [1]; however, many research topics concerned with solving machine perception problems are subjects of research in related areas such as computer vision, machine vision, machine understanding or robotics. As shown in the previous sections of this chapter, machine perception research is focused on visual perception, which is mostly concerned with object recognition, object detection or solving navigation problems. The perceptual problems that are solved within this machine perception framework concern the perception of brightness, colour, texture, connectivity, context, motion, shading and depth, as well as low-level segmentation techniques. Despite early successes within narrow domains analysing data of a single modality (e.g. facial recognition), a general solution to machine perception remains elusive. This problem stems from difficulties such as the lack of an ability to model the process of perception in order to efficiently and effectively interpret the growing stream of multimodal (and incomplete) sensor data. While instance recognition techniques are relatively mature and are used in commercial applications, general category (class) recognition is still a largely unsolved problem. Visual category recognition is an extremely challenging problem, and no one has yet constructed a system that approaches the performance level of a two-year-old child.


The main drawback of classical machine perception is that it is focused on the pragmatic side of the perceptual process without paying enough attention to the cognitive aspect of this process. The pragmatic aspect of human perceptual gathering and processing of data is only a small part of the perceptual process and is aimed at securing basic existential needs. Most human perceptual activities are directed at collecting data that are useful in building the knowledge that is used in understanding the world. With the advent of empiricism, perceived data became the basis for nearly all scientific knowledge that is used during understanding. While empiricists rely on data that are perceived through the application of measurement apparatus, phenomenologists have tried to build knowledge by applying data gathered through a special kind of phenomenological insight. However, the empiricist approach to building knowledge about the world is the most useful for a machine, and for this reason it is adopted in machine understanding MU. Machine perception MU research is carried out within the machine understanding framework, and in this important aspect it differs from the classical machine perception approach. Machine understanding MU regards understanding as the ability to solve problems and to supply explanations of different aspects of the world. For this reason, a machine that is endowed with the understanding ability needs to rely on different perceptual strategies of gathering and transforming data. The analysis of any problem must be carried out within some framework that selects the fundamental assumptions and problem definitions on which the research will be founded. These issues will be discussed in this section, in the context of presenting the main points of machine perception MU. Sophisticated mechanisms have evolved to perceive the environment efficiently, including the use of background knowledge to determine which aspects of the environment to focus attention on. Perceptual experience is built not only on utilizing learned knowledge, but mostly on the constant exploration of the place where we are staying. Being in a room is not only to analyse some pictures taken occasionally from selected points, but to constantly concentrate on some significant elements that can supply evidence for a nearly infinite stream of hypotheses concerning a given place. In this context, perception can be regarded as a process that is connected with the exploration of the environment and can often be understood (recognized) as part of this explorative process. For instance, a man entering a house has some expectations concerning the interior and furniture. These expectations drive and modify the perception of objects that he will encounter. We can say that a perceived object is always matched with some expected object, and by this the learned visual aspects of the perceived object interfere with the visual shape (object) that is presently perceived. These expectations were often formulated in the form of laws of perceptual organization. Machine perception research utilizes some theoretical findings from the Gestalt psychologists' work on spontaneous perceptual organization. Perceptual organization is governed by the principles of Prägnanz [141]; however, the Gestalt psychologists' lack of modern notions of computation limited their success. In machine perception the complex perceptual processes and their relations to
cognition are not given enough attention. Most researchers are focused on translating the Gestalt laws of perceptual organization into a form that can be approached by a computer system [142–144]. One reason for this is that there is a lack of forms of representation that can capture some of the intricacies of perceptual organization. The machine perception MU approach differs in some important aspects from the dominant tradition of research in machine perception. Research is carried out within the machine understanding framework, and research on perception is rooted in all theoretical and experimental findings of this area of research. The psychological findings concerning perception, such as the laws of perceptual organization, are formulated in a form that is compatible with the definitions and theoretical formulations of the machine understanding framework. Research in machine perception MU, contrary to research in classical machine perception, is focused on perception that leads to understanding regarded as problem solving. Interpretation of the visual data (perceived object) in terms of a real-world scene or object is only one of many problems that a machine needs to solve to be able to understand the world. However, contrary to classical machine perception approaches, where the interpretation of images is the only aim of image understanding systems, in the machine perception MU approach interpretation of images is only one of the visual understanding problems that needs to be solved, and it cannot be solved in separation from other visual and non-visual problems. In this book, no attempt is made to relate machine perception MU research findings to psychological models. At this stage of research, only some analogies with human perception will be pointed out, without explanatory purpose. However, research on machine perception MU can provide a rigorous way to test the predictions of psychological theories, and it may suggest and stimulate further psychological experiments. The common view within the machine perception community is that if a machine could be designed which could “see”, its computer program could constitute the implementation of a theory of how seeing is achieved by humans. In machine understanding MU a different view is promoted: if a machine could be designed in such a way that it could “see”, its computer program could constitute the implementation of a theory of how seeing is achieved by a machine. This seeing could be based on a totally different mechanism than the one responsible for human seeing. In machine perception (computer vision) and other fields of research concerned with perceiving the visual world (data), there is a large body of research concerning 3D real-world objects or scenes. However, realistic images of real-world objects or scenes are only a small part of the images that are used in understanding the world. Most images used in solving visual problems are schematic representations of an object or phenomenon, showing only some aspects of the world that are normally not accessible to perception. For example, the cross-section of a machine part or the scientific visualization of the behaviour of a dynamical system refer to parts or objects that are not visible. Also, in the literature concerning the use of images in problem solving there is a tendency to reduce this problem to the problem of training neural networks to perform some sort of reasoning based on so-called natural
images [145]. However, most of the problems humans solve require a schematic representation that shows only some selected functional aspects of the object represented by its image. For this reason, in machine perception MU, a member of each ontological category is represented by all possible realistic, conventional or schematic representations. As described in Chap. 1, perception has been defined and understood in many different ways. In defining the machine perception MU framework, the formulation of some theoretical findings was utilized. For example, the view that there are separate, specialized mechanisms for primitive and sophisticated perception was adopted in formulating the machine perception MU research program. This view leads to a conception of the perceptual apparatus as containing two distinct parts: the first, a special-purpose innate mechanism that supports primitive perception, and the second, something that closely resembles general cognition. The sensory data is first examined by the mechanisms of primitive perception to discover instances of rigidity, parallelism, part-like groupings and other evidence of causal organization, thus providing an explanation of the image data in terms of generic formative processes. The mechanisms of sophisticated perception then use specific, learned knowledge about the world to refine this primitive, generic explanation into a detailed account of the environment. Similarly, machine perception MU is based on the assumption that perceptual interpretation consists of specialized mechanisms for primitive and sophisticated perception. In machine perception MU, solving perceptual problems at the low perceptual level does not require learned knowledge about the real world or scene. Warrington and Taylor [146] suggested that solving a higher-level perceptual problem such as recognition is a two-stage categorization process. The view that in the first stage of visual recognition the image is categorized on perceptual grounds only, whereas a perceptual category is given semantic content in the second stage, is adopted in the machine perception MU approach, where the concept of the object consists of two main ingredients, the visual concept and the non-visual concept. The visual concept consists of the different visual aspects of the object. The non-visual concept, which refers to the meaning of the object and is the result of the naming process, consists of non-visual knowledge concerning the object. According to Arnheim, visual perception can be described in terms of the problem-solving ability of the human perceptual (vision) system. Human perception spontaneously interprets the visual data utilizing learned knowledge. Similarly, solving perceptual problems in machine perception MU by means of spontaneous interpretation is based on the application of IN-perceptual transformations that, at the higher perceptual level, utilize a large amount of learned knowledge to interpret the image. Perception in the machine perception MU approach is regarded as problem-solving activity concerned with solving different perceptual problems. Machine perception MU is strictly connected with learning new knowledge, and learning is connected with understanding. The machine's ability to understand depends on the effectiveness of the learning process, and learning new knowledge depends on the machine's ability to understand. Understanding is based on the learned knowledge acquired during the knowledge implementation process.
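The two-part perceptual apparatus described above can be sketched schematically as a pipeline. The following toy code is an assumed illustration of the idea only; the stage names, data fields and knowledge table are invented for this sketch and are not the authors' implementation.

```python
# A schematic sketch of a two-part perceptual apparatus: a fixed
# primitive stage that detects generic structure, followed by a
# knowledge-driven sophisticated stage. All names are placeholders.

def primitive_perception(sensory_data):
    """Innate, special-purpose stage: look for generic evidence of
    causal organization (simplified here to rigidity/parallelism flags)."""
    return {
        "rigid_groups": [g for g in sensory_data if g.get("rigid")],
        "parallel_pairs": [g for g in sensory_data if g.get("parallel")],
    }

def sophisticated_perception(generic_explanation, learned_knowledge):
    """Cognition-like stage: refine the generic explanation into a
    detailed account using specific, learned knowledge about the world."""
    account = []
    for group in generic_explanation["rigid_groups"]:
        label = learned_knowledge.get(group["shape"], "unknown object")
        account.append(label)
    return account

knowledge = {"cuboid": "a box-like man-made object"}  # learned, not innate
data = [{"shape": "cuboid", "rigid": True}]
print(sophisticated_perception(primitive_perception(data), knowledge))
```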


The visual knowledge is represented in the form of a categorical structure of different categories and in the form of visual concepts. The visual concept is part of the hierarchical structure of categories, where the general categories (classes) are at the higher levels and the specific classes derived from these classes inhabit the lower levels of the hierarchical categorical structure. Only the existence of the hierarchical structure of categories makes generalization possible. In machine perception MU, the perceptual problems are divided into two categories: the interpretational perceptual problems and the visual problems. The interpretational perceptual problems, which are solved by application of the IN-perceptual transformations, are perceptual problems associated with seeing and understanding. Visual problems are problems that are usually given by a sequence or a set of objects, the visual representatives of the problem, and commands presented in linguistic form. Visual problems are solved by application of the PS-perceptual transformations. The image, as the visual representative of the visual problem, is the main visual perceptual element. The image is related to the perceptual categories. The interpretational perceptual problem, given by the perceived image, is solved first by assigning this image to one of the perceptual categories. In machine perception MU, the main focus is on solving problems that are given in the form of a diagrammatic representation that refers to the line-drawing perceptual category or the silhouette perceptual category. An image from the line-drawing perceptual category or the silhouette category is most often used to represent a schematic view of a process or object and is interpreted by application of the IN-perceptual transformation, by assigning it to one of the picture classes. The perceived image that is assigned to one of the picture classes, for example the class $P^n_h[nU^m[H]]$, can be interpreted as an ontological picture class such as the technical line drawing picture class. Interpreting an image from a specific technical line drawing picture class, such as the cross-section picture class or the multiview technical drawing class, requires learning both visual and non-visual knowledge. Interpreting an image from a picture class in terms of understanding its content is a problem that is solved during understanding. During spontaneous perception the perceived image is assigned only to one of these ontological picture classes, without detailed interpretation of the image content. These issues were discussed in the authors' other books [139, 140]. An image from the perceptual line-drawing category that is assigned to the picture class $P^n_h[nU^m[H]]$ is interpreted as an image from the objects-on-the-background picture class, and objects extracted from this image are members of the shape class $U^m[H]$. An object extracted from the picture class $P^n_h[nU^m[H]]$ can be assigned to a 3D object class and further interpreted as a real-world object from one of the visual object ontological categories. The 3D object, or the real-world object, is usually represented by many visual aspects that are 2D objects (images), whereas the 3D object is perceived as a 2D object that represents only one of the visual aspects of the 3D object. Depending on the level of processing, the perceptual problems are divided into the low-level perceptual problems, the middle-level perceptual problems or the
higher-level perceptual problems. The low-level interpretational perceptual problems interpret the image in terms of the symbolic names of the basic shape classes by applying the IN-perceptual transformations. This interpretation assigns the perceived object to one of the basic shape classes, such as the convex, concave or cyclic class. The specific shape classes derived from one of the basic shape classes are characterized by parameters that describe the orientation, size and placement of the object and can be used to solve visual problems, such as discriminative problems that require finding the one convex object that is different from the other rotated convex objects. The low-level visual problems, such as discriminative tests, are solved by identifying the known object among a set of objects, usually by applying the geometrical PS-transformation. The middle-level interpretational perceptual problem interprets the image in terms of the symbolic names of the complex shape classes by applying the middle-level IN-perceptual transformations. One of these problems is the completion and transparency problem, which interprets the perceived objects in terms of the occlusion or transparency class. The middle-level visual problem, such as the visual intelligence test, is solved first by interpreting the objects, the representatives of this problem, as members of the complex shape classes and next, by applying the PS-perceptual transformation, the solution is found. The 3D occlusion and transparency problems are much more complex; however, finding the solution is based on a similar higher-level interpretational mechanism. In this book, the occlusion and transparency problems are restricted to the line-drawing perceptual category. The perceived object from the line-drawing perceptual category, in order to be interpreted as the occlusion or the transparent object, needs first to be assigned to the class $U^n[H]$. The middle-level interpretational perceptual problem is usually solved based on previously learned knowledge. The higher-level interpretational perceptual problem interprets the image in terms of the 3D object classes, the ontological picture classes or one of the ontological categories, given by the category name. Interpretational problems concerned with the spontaneous interpretation of the image as a 3D object are usually connected with focusing attention and naming. After naming, the problem is solved by referring to the name of the object and the knowledge that is associated with the ontological category to which the name refers, such as the sign category or the real-world objects category. The higher-level visual problem, such as finding the solution to a test, is solved by applying the PS-perceptual transformation that first interprets the object, the representative of this problem, as a member of a 3D object class or names one of the ontological categories, and next the solution is found. Research in traditional machine perception is focused on solving problems of interpretation of 2D visual data (images) as 3D objects. Normally, when a particular object of a familiar category is recognized, it is possible to describe how the object would look if seen from a different viewpoint.
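For instance, a low-level discriminative test of the kind just mentioned can be solved without any learned knowledge, using a rotation-invariant comparison. The sketch below uses sorted edge lengths as a simple stand-in for a geometrical PS-perceptual transformation; the polygon representation and the tolerance are assumptions of this example, not the book's notation.

```python
# Find the one convex polygon that differs from rotated copies of another.
# Sorted edge lengths serve as a rotation-invariant signature here.
import math

def edge_lengths(polygon):
    n = len(polygon)
    return sorted(
        math.dist(polygon[i], polygon[(i + 1) % n]) for i in range(n)
    )

def odd_one_out(polygons, tol=1e-3):
    """Return the index of the polygon whose rotation-invariant
    signature matches no other polygon in the set."""
    signatures = [edge_lengths(p) for p in polygons]
    for i, sig in enumerate(signatures):
        matches = sum(
            all(abs(a - b) < tol for a, b in zip(sig, other))
            for other in signatures
        )
        if matches == 1:  # matches only itself
            return i
    return None

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
rotated = [(0.5, -0.2071), (1.2071, 0.5),
           (0.5, 1.2071), (-0.2071, 0.5)]     # the square rotated 45 degrees
oblong = [(0, 0), (2, 0), (2, 1), (0, 1)]
print(odd_one_out([square, rotated, oblong]))  # -> 2
```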
Perceiving a three-dimensional scene from a single point of view, which gives a two-dimensional image, is a more difficult task, and the interpretation of two-dimensional images is inherently ambiguous because the same image can be formed by an infinite number of three-dimensional scenes; moreover, the image formed by a
particular object changes with the viewing angle (perspective change). However, the most difficult problem is caused by the fact that in scenes with multiple objects, parts of otherwise visible surfaces of some objects may be occluded by others. A perceptual system needs to separate the objects in the image and recognize them from partial information. Interpretation of the object at the higher perceptual level, such as interpretation of the image in terms of 3D objects, is a complex process and is performed in two stages, the perceptual classification stage and the semantic classification stage. As shown in [19, 21], the perceptual classification stage involves matching a given view of an object to a stored representation of the appearance of that object; it is the stage at which different views of the same object can be seen as belonging together, distinct from other object categories. The semantic classification stage is the point where the functions and associates of a given object are retrieved from a semantic memory system. In machine perception MU the perceptual classification stage is associated with the naming process, whereas the semantic classification stage is associated with solving 3D problems such as finding the solution to the 3D visual analogy test. In machine perception MU, solving problems of interpretation of the image in terms of a 3D object or scene is only a small part of the research interest. Solving the higher-level interpretational perceptual problems is usually connected with naming. After naming, the problem is solved by referring to the name of the object and the knowledge that is associated with the ontological category to which the name refers. Contrary to interpretation of the object at the lower or middle level of interpretation, where the object is usually interpreted in terms of the general symbolic name $\tilde{g}_i$, the higher-level interpretation interprets the object based on the notion of the visual concept and requires proper knowledge that needs to be learned. The perceived object o, given by its 2D representation r, is extracted from the perceived image, which is a member of one of the structural or ontological picture classes. We will assume that the structural picture class is the picture class $P^n_h[nU^n[H]]$ and the ontological picture category is the technical drawing category of a simple tool (3D solid). The interpretation of the 2D object o extracted from an image, at the higher interpretational level, involves transforming the perceived object into the general symbolic name $\tilde{g}^O$ and next matching this general symbolic name with the visual concept, represented as the set of symbolic names that is part of the learned hierarchical categorical knowledge. At first, the perceived object o is transformed into the 3D object symbolic name $[D]\tilde{g}^O$ by interpreting it as a member of the 3D object shape class. Next, the 2D representation $r(o)$ of the perceived object o, given by the symbolic name $\tilde{g}^O$, is matched with the proper partial visual concepts $C^M(v_{ij})$, $C^P(v_{ij})$, $C^V(v_{ij})$ and $C^{Pt}(v_{ij})$ of each i-th category at the j-th categorical level $v_{ij}$. The visual concept of the i-th category at the j-th categorical level $v_{ij}$ is given by the multiview concept $C^M(v_i) = \tilde{g}^M$, the different projections concept $C^P(v_i) = \{g^P_1, \ldots, g^P_{N_P}\}$, the characteristic views concept $C^V(v_i) = \{g^V_1, \ldots, g^V_{N_V}\}$ and the characteristic parts concept $C^{Pt}(v_i) = \{g^{Pt}_1, \ldots, g^{Pt}_{N_{Pt}}\}$.
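Before continuing with the result of this matching, the step can be illustrated by a minimal sketch of how a general symbolic name might be looked up in the partial visual concepts of stored categories. The category names and symbolic names below are invented for the example and do not reproduce the authors' notation.

```python
# A simplified sketch of the higher-level naming step. The partial
# visual concepts of each category are modelled as sets of symbolic
# names; all names here are illustrative placeholders.

CATEGORIES = {
    "hammer": {  # a category v_ij at the basic level
        "multiview": {"g_M_hammer"},
        "projections": {"g_P_side", "g_P_top"},
        "characteristic_views": {"g_V_head", "g_V_handle"},
    },
    "screwdriver": {
        "multiview": {"g_M_screwdriver"},
        "projections": {"g_P_side_sd"},
        "characteristic_views": {"g_V_blade"},
    },
}

def name_object(symbolic_name):
    """Assign the perceived object, already transformed into a general
    symbolic name, to the first category whose partial visual concepts
    contain that name; naming then gives access to the category's
    knowledge."""
    for category, concepts in CATEGORIES.items():
        if any(symbolic_name in names for names in concepts.values()):
            return category
    return None

print(name_object("g_V_handle"))  # -> hammer
```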
After matching, the object o, given by the 2D representation r, is assigned to the i-th category at the j-th categorical level $v_{ij}$, $r(o) \triangleright v_{ij}$. As a result, the
perceived object is interpreted as a member of the i-th category at the j-th categorical level $v_{ij}$ and given the name $n_{ij}$. When the name $n_{ij}$ is assigned to the perceived object, the knowledge of the $v_{ij}$ category is accessible and given as the object concept $C^O[v_{ij}](o)$. The visual concept $C[v_{ij}](o)$, which is now accessible, is representative of the i-th category at the j-th categorical level $v_{ij}$ and is given as a set of symbolic names of the 3D object class $C^D[v_{ij}](o) = \{g^D_1, \ldots, g^D_{N_D}\}$ and all partial visual concepts such as $C^M(v_i)$, $C^P(v_i)$, $C^V(v_i)$ and $C^{Pt}(v_i)$. When the name $n_{ij}$ is assigned to the perceived object, the visual and non-visual knowledge of the $v_{ij}$ category is accessible and given as the concept $C^O[v_{ij}](o)$. The perceived object is usually assigned to the category $v_k$ at the basic level of the categorical chain $\ldots \prec v_m \prec \cdots \prec v_k$. The basic level is the most abstract level at which members of the category share a common overall shape. However, in some cases, when the perceived object is not well visible, the object can be assigned to the category $v_m$ that is at a higher level than the category $v_k$. The perceptual ability to see, which is strictly connected with understanding, is based on the ability called visual intelligence, which has typically meant the ability to see relations in, make generalizations from, and relate and organize ideas represented in a symbolic form. In machine perception MU the ability of a machine to see is defined as the ability to solve perceptual problems by application of the notion of the perceptual transformation. In order to test the ability of the machine (SUS) to understand and solve different perceptual problems at any perceptual level, visual intelligence tests of the kind used to test general intelligence were employed. The visual intelligence tests are usually given in the form of a diagrammatic representation that refers to one of the ontological line-drawing picture categories. The following categories of visual intelligence tests were introduced: the category of the visual analogy text-tasks $T^{VA}_{VQ}$, the category of the ‘odd one out’ text-tasks, the category of the matrix text-tasks $T^M_{VQ}$ and the category of ‘what comes next’ text-tasks $T^{WN}_{VQ}$. Each test $T \in T_{VQ}$ is given by the objects $o_1, \ldots, o_m$, the representatives of the test T, and usually a set of answer objects $m_1, \ldots, m_n$. Each test $T \in T_{VQ}$ is characterized by the shape class to which all objects $o_1, \ldots, o_m$ belong, where the objects are the visual representations of the test. If all objects $o_1, \ldots, o_m$ are members of the shape class C, $o_1, \ldots, o_m \in C$, the test is denoted as $T[C]$. For example, if all objects $o_1, \ldots, o_m$ are members of the occlusion class, the test is denoted as $T[?]$. However, not all objects $m_1, \ldots, m_n$ need to be members of the occlusion class. The type of the task and the type of the visual representation of the test determine the way of finding the solution, by selecting the proper PS-perceptual transformation. However, the generation of the visual objects $o_1, \ldots, o_m$, the representatives of the test, is based on application of the GO-perceptual transformation, which depends on the shape class to which the objects $o_1, \ldots, o_m$ belong. The higher-level interpretation is based on the notion of the visual concept. A concept is related to categorization and categories, which play an important role in all aspects of cognition. Categorization implies that objects are grouped into
categories, usually for some specific purpose. Categories are regarded as mental representations, abstract objects or abilities that make up the fundamental building blocks of thoughts and beliefs. A concept is often understood as an abstraction standing for a set of objects sharing some properties that differentiate them from other concepts [67]. While no serious problems are posed by concepts such as ‘dog’ or ‘prime number’, any attempt to define precisely what is meant by a ‘despotic ruler’ will turn out to be rather difficult, because this concept is subjective and context-dependent. Other concepts, such as ‘philosophy’ or ‘galaxy’, have fuzzy boundaries. Even when concepts can be defined precisely (e.g. leukaemia), a correct classification of an object (e.g. a patient) based on the available data may present a difficult problem. In the real world, concepts are never isolated. Concepts can be organized into a hierarchy, higher levels of which are termed “superordinate” and lower levels “subordinate”. The notions of category and concept are vital to machine perception MU research as important ingredients of the thinking and understanding process. In this book, the term “concept” is used to represent the internal representation of the perceived object, whereas the category is regarded as the model of the object that is part of the hierarchical structure of categories. Both the visual concept and the visual category have very similar structures and consist of symbolic names. The visual concept is always concerned with only one particular instance of the category, whereas the visual category, at a given categorical level, is the result of learning from many images that are representatives of this category. The visual concept is related to the visual object categories, established based on the existing knowledge. During perception, based on the sensory data, the concept is formed and used in problem solving. The visual category, however, can be thought of as the visual concept that is stored in memory and is part of the complex hierarchical categorical structure. The prototype of each category consists of parts for each main symbolic ingredient, such as the 3D object symbolic name, the symbolic names of the main 2D aspects or the symbolic names of the parts (2D representation). Visual categories, organized into a hierarchy, can be used for making generalizations. At a given level of the hierarchy, categories are usually disjoint, but sometimes they can overlap; the difference between them may be small or large. As described in previous sections, in machine perception the representation of visual knowledge has been a topic of research concerning learning and recognizing objects or scenes. The concept is often regarded as an internal representation of the 3D object suitable for matching its features to an image description, and there are many methods of object representation, such as the surface-based object representation, the graph representation, or generalized cylinders (see references in Sect. 2.2.3). Some of these concepts, such as the internal representation of the 3D object, can be suitable for solving specific visual problems. However, within the framework of machine understanding MU, they are, in general, not well suited for solving the perceptual problems that are related to visual understanding.
The notion of the visual concept introduced by the authors is part of the hierarchical categorical structure and fits well into the framework of image understanding.
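A hierarchical categorical structure of this kind can be sketched as a simple child-to-parent map; the categories below are illustrative placeholders. Walking upward from a specific category yields successively more general categories, which is what makes generalization possible.

```python
# A toy sketch of a hierarchical categorical structure.
# Subordinate categories map to their superordinate categories.

HIERARCHY = {
    "claw hammer": "hammer",
    "hammer": "tool",
    "screwdriver": "tool",
    "tool": "man-made object",
}

def generalize(category):
    """Walk up the hierarchy, yielding ever more general categories."""
    while category in HIERARCHY:
        category = HIERARCHY[category]
        yield category

print(list(generalize("claw hammer")))
# -> ['hammer', 'tool', 'man-made object']
```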


The categories and categorization are very important elements of visual object recognition and image (scene) understanding. In machine perception MU, visual understanding usually refers to the different categories of visual objects. In machine understanding, categories of visual objects are established based on the assumption that a visual object exists and can be perceived by the accessible technical tools. Categories of visual objects supply knowledge about the visual aspects of the world. The notation of categories is based on a categorical chain [138]. This visual context, given by the hierarchical categorical structure of the learned knowledge, becomes the nucleus of the visual understanding process that can trigger the reasoning or the imaginary generative process. By influencing further expectations that are spawned from this contextual knowledge, it can guide the attention that will be focused only on some image details. Naming, which assigns the perceived object to one of the ontological categories, places the object inside the universe of the meaningful categorical structure of the visual object categories and the categorical structure of the ontological picture classes, and supplies contextual knowledge of the scale, placement and relational dependence among objects. The interpretation of the perceived object at the low perceptual level does not need learned knowledge, the interpretation at the middle perceptual level often needs a small part of learned knowledge, whereas the interpretation at the higher perceptual level usually requires a relatively large amount of learned knowledge that is acquired during the complex learning process. One of the most important machine perception MU problems is the problem of learning and forming the concept of a selected category at a selected categorical level. The question of how to extract essential knowledge and represent it in a formal way, such that a machine can learn a concept of a category (class), identify objects or discriminate between them, has intrigued and provoked many researchers. The growing interest led to the establishment of the areas of pattern recognition, machine learning and artificial intelligence. Researchers in these disciplines try to provide mathematical foundations and develop models and methods that automate the process of recognition by learning from a set of examples. This attempt is inspired by the human ability to recognize, for example, what a tree is, given just a few examples of trees. The idea is that a few examples of objects (and possible relations between them) might be sufficient for extracting suitable knowledge to characterize their class. However, in machine perception MU, learning is based on a different assumption. Learning knowledge of the visual object is strictly connected with learning to understand and to imagine an object. Understanding of the perceived object (an object which was recognized) requires the ability to solve the problems of understanding its three-dimensional shape, describing (imagining) how the object would look if seen from a different viewpoint, describing the functions or use of a familiar object category, finding other objects with which it may be associated (e.g. a screwdriver and a screw) or labelling the object with a proper name (naming). Learning knowledge of the selected category includes learning of the visual knowledge and the non-visual knowledge.
Knowledge of the specific ontological category is learned at the prototype level of the categorical chain $\ldots \prec v_{Sig} \prec \cdots \prec v$.


The prototype is defined during the learning process at the level for which training exemplars are available and is represented by all visual representatives of the visual domain prototype. The visual domain prototype refers to the visual knowledge that makes it possible to recognize all visual representatives of the prototype. The human visual system has a highly developed capability for interpretation of visual data, detecting many classes of patterns (shapes). Perception consists in fitting the stimulus material with templates of relatively simple shapes (visual concepts), and in the perception of shape lie the beginnings of concept formation. Machine perception MU, similarly to human perception, operates by fitting the stimulus material with templates of relatively simple shapes, members of shape classes. Interpretation by applying the IN-perceptual transformation transforms the perceived object into a symbolic name that represents one of the shape classes. Shape classes were introduced by Les [147] as the basic visual categories used during the visual thinking and visual reasoning process. The description of the classes refers to visual objects such as geometrical figures. For example, the convex polygon class consists of the elements (shapes) that are called the convex polygons and is denoted as $L_n$, where n refers to the number of the polygon's sides. Shape classes are defined based on the notion of the perceptual operator and application of the CF-perceptual transformation, which is used to derive the specific classes during learning of the visual knowledge. The natural extension of the shape classes, represented by symbolic names, is the 3D object classes, also represented by symbolic names. The 3D object classes play a similar role in learning the visual concept of 3D objects as the shape classes play in learning the visual concept of 2D objects. The 3D object classes, similarly to the 2D shape classes, are established based on general object attributes such as homotopy, convexity and thickness. The 3D object class is the class whose members are 3D objects. The 3D object, a real-world object or an ideal geometrical solid, is usually represented by many visual aspects that are 2D objects (images). The 3D object classes refer to 3D objects such as geometrical solids and, similarly to the shape classes, the convex solid class with n faces is denoted by $D_n$. The 3D object class is also given in the normal form as $@[D][g_1, g^2_2, g^3_3]$, or in the operator form as $g_1 @ g^2_2 @ g^3_3$, where $g_1, g^2_2, g^3_3$ are symbolic names of visual aspects of the multiview representation and $@$ is the multiview operator. The 3D concave class and the 3D cyclic class follow the shape class convention and are represented by the notation of the 3D concave class or the 3D cyclic class. Shape classes, 3D object classes and picture classes play an important role in machine perception MU. Picture classes reflect the important perceptual property that perception is related to the perceptual visual field, where the object is perceived as an image. In this book, “image” is the term denoting the perceived object (image), whereas the term “picture” is used in the context of the picture classes. The picture classes are introduced to fit the structural properties of an image into one of the picture classes during solving of the interpretational problem.
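Before continuing with the picture classes, the operator form just introduced can be sketched in code. The class name and the aspect names below are assumptions of this illustration, standing in for the symbolic names $g_1, g^2_2, g^3_3$ combined by the multiview operator $@$.

```python
# A minimal sketch of a 3D object class given by the multiview operator
# over symbolic names of its 2D visual aspects. Names are placeholders.

class MultiviewClass:
    """A 3D object class represented by symbolic names of 2D aspects."""

    def __init__(self, *aspect_names):
        self.aspects = aspect_names  # e.g. ("g1", "g2", "g3")

    def operator_form(self):
        """Render the class in operator form, e.g. g1 @ g2 @ g3."""
        return " @ ".join(self.aspects)

    def matches(self, perceived_aspect):
        """A perceived 2D object matches the 3D class if its symbolic
        name is one of the class's visual aspects."""
        return perceived_aspect in self.aspects

cube_like = MultiviewClass("g_square", "g_square", "g_square")
print(cube_like.operator_form())      # g_square @ g_square @ g_square
print(cube_like.matches("g_square"))  # True
```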
The significant arrangements of image elements and the spatial structure of the image make it possible to group images, based on an assumed similarity criterion, into
the picture classes. The picture classes, introduced based on the concept of the generic class of pictures [148], contrary to object categorization [149], are meant to supply a visual knowledge representation framework based on the hierarchical structure of the shape categories. The picture classes are divided into two categories: the structural and the ontological picture classes. The structural picture classes refer to the structural organization of the picture plane, which in visual art is often called the picture composition. Being in the world imposes the perceptual framework in which all perceptual information is projected into the normalized perceptual field (image). For this reason, it is assumed that an image is the basic perceptual element and all visual data are perceived as an image. The picture classes are basic contextual categories used by the perceptual interpretational mechanism to form prior expectations. Without the constraining influence of prior expectations, many perceptual problems would be underconstrained to the extent that they could never be solved. In machine perception MU, the knowledge that is needed during solving of perceptual problems is supplied by the contextual information that is coded in the categorical structure of the shape classes, the 3D object classes, the picture classes and the perceptual and ontological categories of visual objects. Perceptual transformations play an important role in machine perception MU, both during interpretation of the image and during solving of a visual problem. Finding the solution to a visual problem requires, among others, identifying the visual objects, comparing two or more objects and identifying the transformations responsible for the generation of these objects. Transformations may include scaling, rotations and reflections, in order to distinguish between a big triangle and a small triangle, recognize that one object is a rotation of another, and recognize that one object is a reflection of another. In machine perception MU the transformations that are applied to transform the perceived object (image) during solving of perceptual problems are called the perceptual transformations. The perceptual transformations, which are part of the interpretational mechanisms and the mechanism of solving visual perceptual problems, are based on application of the shape classes and the 3D object classes. The perceptual transformation was introduced by the authors [150] for the purpose of solving perceptual problems (visual intelligence tests) and was applied to test the perceptual ability of a machine. The perceptual transformation introduced in [150] was regarded as a special geometrical transformation applied in the domain of geometrical figures. The perceptual transformations are divided into the PS-perceptual transformation, the IN-perceptual transformation, the CF-perceptual transformation, the GR-perceptual transformation and the GO-perceptual transformation. The PS-perceptual transformations solve the different perceptual problems and are divided into three groups: the geometrical PS-perceptual transformation, the arithmetical PS-perceptual transformation and the symbolic PS-perceptual transformation. The geometrical PS-perceptual transformation does not interpret the object and is based on the application of the processing transformation $I_3 = \Psi(I_1, I_2)$ that transforms two images (objects) $I_1$ and $I_2$ into the image $I_3$ according to the processing rules (geometrical generic form) given
by the general geometric operator. The arithmetical PS-perceptual transformation interprets the object in terms of numbers and is based on application of an arithmetical perceptual operator, such as the addition operator +. The symbolic PS-perceptual transformation first transforms the object into the symbolic form (symbolic name) and next solves the problem. The IN-perceptual transformation spontaneously interprets the perceived image $I^B$ as a set of objects or as a complex object by transforming it into the general symbolic name $\tilde{g}_i \in \{g, [D]g, [P]g, n\}$ at the low or middle perceptual level and next, at the higher perceptual level, into the name of one of the ontological categories. The spontaneous interpretation of the image is the basic perceptual task (perceptual problem) that can be regarded as seeing the image or understanding the perceived image. Seeing means understanding, and it makes it possible to describe an image in terms of the real-world object or scene, to count the objects and to attach names to these objects. The CF-perceptual transformations are divided into the basic class-forming perceptual transformation and the complex class-forming perceptual transformation. The basic CF-perceptual transformation is the perceptual transformation that is used to define members of the basic shape classes. The complex CF-perceptual transformation is used to define the members of the complex shape classes, such as the transparent or occlusion class. The CF-perceptual transformation is very similar to the GO-perceptual transformation, which is used to generate objects, a sequence of objects or a set of objects. In machine understanding MU the perceived visual data are formed as an image. Both real-world objects or scenes and images (photographs) of real-world objects have the same perceptual representation as the perceived image. The perceived object is seen in the frame of the rectangular normalized image that becomes the frame of reference. During interpretation of the perceived image, the image is first assigned to one of the perceptual categories and next to one of the structural and ontological picture classes. At the final stage, the image or an object extracted from the image is assigned to one of the ontological object categories and used in solving complex visual problems. The image is assigned to the structural and ontological picture class by application of the IN-perceptual transformation, which utilizes one or more processing transformations. In this book, processing transformations are presented in visual form rather than using the pseudo-mathematical notation that in much of the computer vision literature obscures the clarity of the presented material. The reasoning process is also presented in a new form, by showing only the stages of application of the perceptual transformations (processing transformations) and a visual illustration of the reasoning process. The material is presented by describing, at first, the basic perceptual transformations, next the CF-perceptual transformations and their applications in assigning an image to one of the perceptual categories during the visual reasoning process. Assigning the perceived object from one of the line-drawing perceptual categories to the picture classes such as $P^n_h[r^m]$, $P^n_h[X^m]$ or $P^n_h[D^m]$ is based on application of the middle-level and higher-level IN-perceptual transformations and a complex reasoning process. Each stage of reasoning can be represented as $I^R_i = \Psi_i(I_i)$: $h(I^R_i) \geq a \Rightarrow I^B \triangleright P_i$, where $I_i$ and $I^R_i$ are the images before and after application of the processing transformation $\Psi_i$, $h(I^R_i) \geq a$ is a required condition that needs to be fulfilled, and $I^B \triangleright P_i$ is the assigning operation that assigns the image $I^B$ to the picture class $P_i$. The reasoning process can be seen as hypothesis formulation based on all the knowledge of the previous stages of processing and hypothesis testing during the given stage of reasoning.
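A staged reasoning process of this shape can be sketched as a small pipeline in which each stage applies a processing transformation $\Psi_i$, tests the condition $h(I^R_i) \geq a$ and, if it holds, performs the assignment to a picture class. The transformations, condition function, threshold and picture class name below are placeholders invented for this sketch.

```python
# A schematic sketch of staged reasoning: I_R = Psi_i(I_i), then test
# h(I_R) >= a; if the hypothesis is confirmed, assign the image to a
# picture class (I_B |> P_i). All concrete details are placeholders.

def run_reasoning(image, stages):
    """stages: list of (psi, h, a, picture_class) tuples. Returns the
    picture class of the first stage whose hypothesis is confirmed."""
    for psi, h, a, picture_class in stages:
        transformed = psi(image)        # I_R = Psi_i(I_i)
        if h(transformed) >= a:         # required condition holds
            return picture_class        # assigning operation
        image = transformed             # knowledge carried to next stage
    return None

# Toy example: an "image" is just a list of object descriptions.
stages = [
    (lambda img: [o for o in img if o["closed"]],   # keep closed contours
     lambda img: len(img), 2, "objects-on-background picture class"),
]
image = [{"closed": True}, {"closed": True}, {"closed": False}]
print(run_reasoning(image, stages))
```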

References

1. Nevatia R (1982) Machine perception. Prentice-Hall, Englewood Cliffs, NJ
2. Roberts LG (1963) Machine perception of three-dimensional solids. In: Tippett JT et al (eds) Optical and electro-optical information processing. MIT Press, Cambridge, pp 159–197
3. Hochberg J, Brooks V (1962) Pictorial recognition as an unlearned ability: a study of one child's performance. Am J Psychol 75(4):624–628
4. Guzman A (1968) Computer recognition of three dimensional objects in a visual scene. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA
5. Clowes MB (1971) On seeing things. Artif Intell 2:79–116
6. Huffman DA (1971) Impossible objects as nonsense sentences. Mach Intell 6:295–323
7. Waltz D (1975) Understanding line drawings of a scene with shadows. In: Winston PH (ed) The psychology of computer vision. McGraw-Hill, New York
8. Shimshoni I, Ponce J (1994) Recovering the shape of polyhedra using line-drawing analysis and complex reflectance models. In: CVPR, pp 514–519
9. Grimstead IJ (1997) Interactive sketch input of boundary representation solid models. Cardiff University
10. Varley PAC, Martin RR (2000) A system for constructing boundary representation solid models from a two-dimensional sketch. In: GMP. IEEE Press
11. Varley PAC, Martin RR (2001) The junction catalogue for labelling line drawings of polyhedra with tetrahedral vertices. Int J Shape Modell 7(1):23–44
12. Mackworth AK (1973) Interpreting pictures of polyhedral scenes. Artif Intell 4:121–137
13. Draper SW (1980) Reasoning about depth in line-drawing interpretation. Sussex University
14. Parodi P, Lancewicki R, Vijh A, Tsotsos JK (1998) Empirically-derived estimates of the complexity of labelling line drawings of polyhedral scenes. Artif Intell 105:47–75
15. Sugihara K (1986) Machine interpretation of line drawings. MIT Press, Cambridge
16. Marr D, Nishihara K (1978) Representation and recognition of the spatial organization of three dimensional shapes. Proc R Soc Lond Ser B 200:269–294
17. Humphreys GW, Quinlan PT (eds) (1987) Normal and pathological processes in visual object constancy. In: Visual object processing: a cognitive neuropsychological approach. Erlbaum, London
18. Rock I (1973) Orientation and form. Academic Press, New York
19. Ratcliff G, Newcombe F (1982) Object recognition: some deductions from the clinical evidence. In: Ellis AW (ed) Normality and pathology in cognitive functions. Academic Press, London
20. Riddoch MJ, Humphreys GW (1987) A case of integrative visual agnosia. Brain 110:1431–1462
21. Warren CEJ, Morton J (1982) The effects of priming on picture naming. Br J Psychol 73:117–130
22. Fischler M, Elschlager R (1973) The representation and matching of pictorial structures. IEEE Trans Comput 100(22):67–92
23. Kanade T (1977) Computer recognition of human faces. Birkhäuser, Basel
24. Yuille A (1991) Deformable templates for face recognition. J Cogn Neurosci 3(1):59–70
25. Moghaddam B, Pentland A (1997) Probabilistic visual learning for object representation. IEEE Trans Pattern Anal Mach Intell 19(7):696–710
26. Heisele B, Ho P, Wu J, Poggio T (2003) Face recognition: component-based versus global approaches. Comput Vis Image Underst 91(1–2):6–21
27. Heisele B, Serre T, Poggio T (2007) A component-based framework for face detection and identification. Int J Comput Vis 74(2):167–181
28. Felzenszwalb P, McAllester D, Ramanan D (2008) A discriminatively trained, multiscale, deformable part model. Paper presented at the IEEE computer society conference on computer vision and pattern recognition (CVPR 2008), Anchorage, AK
29. Fergus R (2007) Part-based models (CVPR 2008)
30. Fergus R (2009) Classical methods for object recognition. ICCV, Kyoto, Japan
31. Felzenszwalb PF, Huttenlocher DP (2006) Efficient belief propagation for early vision. Int J Comput Vis 70(1):41–54
32. Oliva A, Torralba A (2005) Building the gist of a scene: the role of global image features in recognition. Prog Brain Res 155:23–36
33. Divvala SK, Hoiem DW, Hays JH, Efros AA, Hebert M (2009) An empirical study of context in object detection. In: IEEE computer society conference on computer vision and pattern recognition workshop (CVPR), Miami, FL. IEEE Computer Society, pp 1271–1278
34. Torralba A, Freeman WT, Fergus R (2008) 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE Trans Pattern Anal Mach Intell 30(11):1958–1970
35. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR'06). IEEE Computer Society, pp 2169–2178
36. Russell BC, Torralba A, Liu C, Fergus R, Freeman WT (2007) Object recognition by scene alignment. Paper presented at the conference on neural information processing systems (NIPS)
37. Malisiewicz T, Efros AA (2008) Recognition by association via learning per-exemplar distances. CVPR
38. Murphy K, Torralba A, Freeman W (2003) Using the forest to see the trees: a graphical model relating features, objects and the scenes. In: Conference on neural information processing systems (NIPS)
39. Torralba A (2003) Contextual priming for object detection. Int J Comput Vis (IJCV) 53(2):169–191
40. Torralba A, Murphy K, Freeman W, Rubin M (2003) Context-based vision system for place and object recognition. Paper presented at the international conference on computer vision (ICCV 2003), Nice, France
41. Verbeek J, Triggs B (2008) Scene segmentation with CRFs learned from partially labelled images. In: Conference on neural information processing systems (NIPS)
42. Palmer S (1975) Visual perception and world knowledge: notes on a model of sensory-cognitive interaction. In: Norman D, Rumelhart D (eds) Explorations in cognition. Freeman, San Francisco, pp 279–307
43. Sinha P, Torralba A (2002) Detecting faces in impoverished images. J Vis 2(7):601
44. Carbonetto P, de Freitas N, Barnard K (2004) A statistical model for general contextual object recognition. In: European conference on computer vision (ECCV), 2004, pp 350–362
45. Fink M, Perona P (2003) Mutual boosting for contextual inference. In: Conference on neural information processing systems (NIPS)
46. He X, Zemel RS, Carreira-Perpinan MA (2004) Multiscale conditional random fields for image labelling. In: Conference on computer vision and pattern recognition (CVPR), 2004, pp 695–702
47. Kruppa H, Schiele B (2003) Using local context to improve face detection. In: British machine vision conference (BMVC)
48. Shotton J, Winn J, Rother C, Criminisi A (2007) TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling appearance, shape and context. Int J Comput Vis (IJCV)
49. Kumar S, Hebert M (2005) A hierarchical field framework for unified context-based classification. In: International conference on computer vision (ICCV), 2005, pp 1284–1291
50. Lipson P, Grimson E, Sinha P (1997) Configuration based scene classification and image indexing. In: Conference on computer vision and pattern recognition (CVPR), 1997, p 1007
51. Singhal A, Luo J, Zhu W (2003) Probabilistic spatial context models for scene content understanding. In: Conference on computer vision and pattern recognition (CVPR), 2003, p 235
52. Galleguillos C, Rabinovich A, Belongie S (2008) Object categorization using co-occurrence, location and appearance. Paper presented at the conference on computer vision and pattern recognition (CVPR)
53. Hanson A, Riseman E (1978) Visions: a computer system for interpreting scenes. In: Hanson A, Riseman E (eds) Computer vision systems. Academic Press, New York, pp 303–333
54. Rabinovich A, Vedaldi A, Galleguillos C, Wiewiora E, Belongie S (2007) Objects in context. In: Proceedings of the international conference on computer vision (ICCV)
55. Strat T, Fischler M (1991) Context-based vision: recognizing objects using information from both 2-d and 3-d imagery. Pattern Anal Mach Vis 13(10):1050–1065
56. Torralba A, Murphy KP, Freeman WT (2004) Contextual models for object detection using boosted random fields. Paper presented at the conference on neural information processing systems (NIPS)
57. Wolf L, Bileschi S (2006) A critical view of context. Int J Comput Vis (IJCV)
58. Tsotsos JK, Mylopoulos J, Covvey HD, Zucker SW (1980) A framework for visual motion understanding. IEEE Trans Pattern Anal Mach Intell 2(6):563–573
59. Brooks R (1981) Symbolic reasoning among 3-dimensional models and 2-dimensional images. Artif Intell 17:285–349
60. Draper BA, Collins RT, Brolio J, Hanson AR, Riseman EM (1989) The schema system. Int J Comput Vis 2:209–250
61. Matsuyama T, Hwang V (1990) SIGMA: a knowledge-based aerial image understanding system. Plenum Press, New York
62. Dillon C, Caelli T (1997) Cite: scene understanding and object recognition. In: Caelli T, Bishof WF (eds) Machine learning and scene interpretation. Plenum Publishing Cooperation, New York, pp 119–187
63. Dance S, Caelli T (1997) Soo-pin: picture interpretation networks. In: Caelli T, Bishof WF (eds) Advances in computer vision and machine intelligence. Plenum Publishing Cooperation, New York, pp 225–254
64. Salzbrunn R (1997) Wissensbasierte Erkennung und Lokalisierung von Objekten. Shaker Verlag, Aachen
65. Glasgow J, Narayanan NH, Chandrasekaran B (eds) (1995) Diagrammatic reasoning: cognitive and computational perspectives. The MIT Press, Cambridge
66. Anderson M, Meyer B, Olivier P (eds) (2002) Diagrammatic representation and reasoning. Springer, London
67. Michalski RS (1983) A theory and methodology of inductive learning. In: Michalski RS, Carbonell JG, Mitchel TM (eds) Machine learning: an artificial intelligence approach. Tioga Publishing, Palo Alto, pp 83–133
68. Rosch E, Braem PB (1976) Basic objects in natural categories. Cogn Psychol 8:382–439
69. Behl-Chadha G (1996) Basic-level and superordinate-like categorical representations in early infancy. Cognition 60:105–141
70. Markman EM (1989) Categorization and naming in children: problems of induction. MIT Press, Cambridge
71. Wasserman EA, Astley SL (1994) A behavioral analysis of concepts: its application to pigeons and children. In: Medin DL (ed) Psychology of learning and motivation. Academic Press, San Diego, pp 73–132
72. Gauthier I, Tarr MJ, Moylan J, Skudlarski P, Gore JC, Anderson AW (2000) The fusiform “face area” is part of a network that processes faces at the individual level. J Cogn Neurosci 12:495–504
73. Jolicoeur P, Landau MJ (1984) Effects of orientation on the identification of simple visual patterns. Can J Psychol 38:80–93
74. Farah MJ, Tanaka JW, Drain HM (1995) What causes the face inversion effect? J Exp Psychol: Hum Percept Perform 21:628–634
75. Marsolek CJ (1999) Dissociable neural subsystems underlie abstract and specific object recognition. Psychol Sci 10(2):111–118
76. Op de Beeck H, Béatse E, Wagemans J, Sunaert S, Van Hecke P (2000) The representation of shape in the context of visual object categorization tasks. Neuroimage 12:28–40
77. Palmer SE (1975) The effects of contextual scenes on the identification of objects. Memory and Cognition
78. Bar M, Ullman S (1993) Spatial context in recognition. Perception 25:343–352
79. Parikh D, Zitnick C, Chen T (2008) From appearance to context-based recognition: dense labelling in small images. Paper presented at the conference on computer vision and pattern recognition (CVPR)
80. Requicha AG (1980) Representations for rigid solids: theory, methods, and systems. ACM Comput Surv 12(4):437–464
81. Badler N, Bajscy R (1978) Three-dimensional representation for computer graphics and vision. ACM SIGGRAPH Comput Graph
82. Baumgart BG (1972) Winged edge polyhedron representation. Technical report, Stanford University
83. Coons SA (1974) Surface patches and B-spline curves. In: Computer aided geometric design. Academic Press, pp 1–16
84. Koenderink J, van Doorn A (1986) Dynamic shape. Biol Cybern 53:383–396
85. Biederman I (1987) Recognition by components: a theory of human image understanding. Psychol Rev 94:115–147
86. Palmer S, Nelson R (2000) Late influences on perceptual grouping: illusory figures. Percept Psychophys 62(7):1321–1331
87. Chakravarty I, Freeman H (1982) Characteristic views as a basis for three-dimensional object recognition. In: Conference of the International Society for Optical Engineering, 1982
88. Winston P (1975) Learning structural descriptions from examples. In: Winston PH (ed) Psychology of computer vision. McGraw-Hill, New York
89. Connell J, Brady M (1987) Generating and generalizing models of visual objects. Artif Intell 3:159–183
90. Dong G, Yanaguchi T, Yagi Y, Yachida M (1994) Shape concept learning from examples and explanation. Comput Vis 87–8(8):57–64
91. Haar RL (1982) Sketching: estimating object positions from relational descriptions. Comput Graph Image Process 19:227–247
92. Ueda N, Suzuki S (1993) Learning visual models from shape contours using multiscale convex/concave structure matching. IEEE Trans Pattern Anal Mach Intell PAMI-15(4):337–352
93. Arnheim R (1970) Visual thinking. Faber and Faber, London
94. Kellman P, Shipley T (1990) Visual interpolation in object perception: a computational theory (manuscript). New York
95. Briscoe RE (2011) Mental imagery and the varieties of amodal perception. Pac Philos Q
96. Wertheimer M (1923) Untersuchungen zur Lehre von der Gestalt. Psychologische Forschung 4:301–350
97. Koffka K (1962) Principles of gestalt psychology. Routledge & Kegan Paul, London
98. Kanizsa G (1979) Organization in vision. New York
99. Kanizsa G (1979) Organization in vision: essays on gestalt perception. Praeger, New York
100. Kanizsa G (1985) Seeing and thinking. Acta Psychol 59:23–33

References

43

101. Kellman PJ, Shipley TF (1991) A theory of visual interpolation in object perception. Cogn Psychol 23:141–221 102. Boselie F, Wouterlood D (1992) A critical discussion of Kellman and Shipley’s (1991) theory of occlusion phenomena. Psychol Res 54:278–285 103. Wouterlood D, Boselie F (1992) A good-continuation model of some occlusion phenomena. Psychol Res 54:267–277 104. Hochberg J, McAlister E (1953) A quantitative approach to figural ‘goodness’. J Exp Psychol 46:361–364 105. Leeuwenberg E (1969) Quantitative specification of information in sequential patterns. Psychol Rev 16:216–220 106. Leeuwenberg E (1971) A perceptual coding language for visual and auditory patterns. Psychol Rev 76:307–349 107. Buffart H, Leeuwenberg E, Restle F (1981) Coding theory of visual pattern completion. J Exp Psychol Hunan Percept Perform 7:241–274 108. Buffart H, Leeuwenberg E, Restlem F (1983) Analysis of ambiguity in visual pattern completion. J Exp Psychol Hunan Percept Perform 9:980–1000 109. Sekuler AB (1994) Local and global minima in visual completion: effects of symmetry and orientation. Perception 23:529–545 110. van Lier RL, van der Helm P (1995a) Multiple completions primed by occlusion patterns. Perception 24:727–740 111. van Lier R, Wagemans J (1999) From images to objects: global and local completions of self-occluded parts. J Exp Psychol Hum Percept Perform 25:1721–1741 112. Rock I (1983) The logic of perception. The MIT Press, Cambridge 113. Boselie F (1988) Local versus global minima in visual pattern completion. Percept Psychophys 43:431–445 114. Van Lier R, Van der Helm, Leeuwenberg E (1994) Integrating global and local aspects of visual occlusion. Perception 23:883–903 115. Lier RHP, Leeuwenberg E (1995) ) Competing global and local completions in visual occlusion. J Experiential Psychol Hum Percept Perform 21:571–583 116. Shipley TF, Kellman PJ (1992) Perception of partly occluded objects and illusory figures: evidence for an identity hypothesis. J Experimental Psychol Hum Percept Perform 18:106– 120 117. Yin C, Kellman PJ, Shipley TF (1997) Surface completion complements boundary interpolation in the visual integration of partly occluded objects. Perception 26:1459–1479 118. Rubin N (2001) The role of junctions in surface completion and contour matching. Perception 30:339–366 119. de Wit CJ, van Lier RJ (2002) Global visual completion of quasi-regular shapes. Perception 31:969–984 120. Albert MK (2007) Occlusion, transparency, and lightness. Vis Res 47:3061–3069 121. Anderson B, Winawer J (2005) Image segmentation and lightness perception. Nature 434:79–83 122. Adelson EH (1993) Perceptual organization and the judgement of brightness. Science 262:2042–2044 123. Adelson EH (2000) Lightness perception and lightness illusions. In: Gazzaniga M (ed) The new cognitive neuroscience. MIT Press, Cambridge, pp 339–351 124. Metelli F (1974) The perception of transparency. Sci Am 230:90–98 125. Arnheim R (1974) Art and visual perception. A psychology of the creative eye. University of California Press, Berkeley 126. Schumann F (1904) Einige Beobachtungen uber die Zusammenfassung von Gesichtseindriicken zu Einheiten. Psychologische Studien 1:1–3 127. Kanizsa G (1955) Marzini quasi-percettivi in campi con stimolazione omogenea. Riv di psicologia 49:7–30 128. Julesz B (1964) Binocular depth perception without familiarity cues. Science 145:356–362 129. Coren S (1972) Subjective contours and apparent depth. Psychol Rev 79(4):3S9–367

44

2

Machine Perception—Machine Perception MU

130. Parks TE (1984) Illusory figures: a (mostly) theoretical review. Psychol Bull 95:282–300 131. Pritchard WS, Warm JS (1983) Attentional processing and the subjective contour illusion. J Exp Psychol 112:145–175 132. Grossberg S, Mingolla E (1985) Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychol Rev 92:173–211 133. Marr D (1982) Vision: a computational investigation into the human representation and processing of visual information. W. H. Freeman, San Francisco 134. Saund E (1999) Perceptual organization of occluding contours of opaque surfaces. Comput Vis Image Underst 76(1):70–82 135. Kogo N, Drozdzewska A, Zaenen P, Alp N, Wagemans J (2014) Depth perception of illusory surfaces. Vis Res 96:58–64 136. Kogo N, Strecha C, Van Gool L, Wagemans J (2010) Surface construction by a 2-D differentiation–integration process: a neurocomputational model for perceived border ownership, depth, and lightness in Kanizsa figures. Psychol Rev 117(2):406–439 137. Arnheim R (1954) Art and visual perception: a psychology of the creative eye 138. Les Z, Les M (2008) Shape understanding system. The first steps toward the visual thinking machines. Stud Comput Intell. Springer, Berlin 139. Les Z, Les M (2013) Shape understanding system—knowledge implementation and learning. Studies in computational intelligence. Springer, Berlin 140. Les Z, Les M (2015) Shape understanding system—machine understanding and human understanding. Studies in computational intelligence. Springer, Berlin 141. Koffka K (1935) Principles of gestalt psychology. Harcourt Brace, Jovanovic 142. Weisberg R, Alba J (1982) Problem solving is not like perception: more on Gestalt theory. J Exp Psychol 111(3):326–330 143. Schwering A, Krumnack U, Kühnberger KW, Gust H (2007) Using gestalt principles to compute analogies of geometric figures. In: 2007. Nashville, TN 144. Lowe D (1985) Perceptual organization and visual recognition. Kluwer Academic, Dordrecht 145. Krizhevsky A, Sutskever I, Hinton GE (2012) Image net classification with deep convolutional neural networks. Paper presented at the NIPS 146. Warrington EK, Taylor AM (1978) Two categorical stages of object recognition. Perception 7(6):695–705 147. Les Z (2001) Shape understanding. Possible classes of shapes. Int J Shape Modell 7(1): 75–109 148. Les Z (1996) An aesthetic evaluation method based on image understanding approach. In: The first international conference on visual information systems VISUAL’96, Melbourne, 5–6 Feb 1996. VUT, pp 317–327 149. Biederman I, Mezzanotte RJ, Rabinowitz JC (1982) Scene perception: detecting and judging objects undergoing relational violations. Cogn Psychol 14(2):143–177 150. Les Z, Les M (2018) Machine understanding—testing visual understanding ability of machine: the visual intelligence test. Int J Underst 7

Chapter 3

Machine Perception MU—Shape Classes

3.1 Introduction

The primary objective of machine perception MU is to construct the symbolic description of the visual content of an image and to use this symbolic representation to solve perceptual problems such as the interpretation of perceived images. Symbolically represented visual knowledge provides a level of abstraction at which two otherwise dissimilar domains may look more alike. For example, the concepts of a planet and a ball are quite different, but if both are represented as a circle, this may facilitate analogical retrieval, mapping and transfer. The problem of perception and interpretation of images by application of the IN-perceptual transformation (described in Chap. 6), in order to find the solution to a perceptual problem, is solved within the framework of machine understanding. The machine understanding framework refers to the human visual system, which has a highly developed capability for interpreting visual data and detecting many classes of patterns based on statistically significant arrangements of image elements. These classes of patterns and statistically significant arrangements of image elements are called shapes. In the perception of shape, that is, in the grasping of structural features found in, or imposed upon, the stimulus material, lie the beginnings of concept formation [1]. Perception consists in fitting the stimulus material with templates of relatively simple shapes, which are called the visual concepts. The simplicity of these visual concepts is relative, in that a complex stimulus pattern viewed by refined vision may produce a rather intricate shape that is nevertheless the simplest attainable under the circumstances. Perception in machine perception MU, similarly to human perception, operates by fitting the perceived object with templates of relatively simple shapes, which are called the shape classes. In comparison to human perception, in machine perception MU the object extracted from an image is fitted with relatively simple shapes that can be thought of as the basic perceptual categories, whereas the visual concept is a set of symbolic names representing the shape classes and is applied to learn the visual object categories.


Fig. 3.1 Examples of objects from the rectangular shape classes

In this Chapter, the shape classes, introduced by Les [2] as the basic perceptual categories used during the visual thinking and visual reasoning processes, are presented. The previously defined shape classes were represented in both the normal and the operator form, by utilizing the notion of the perceptual operator. The shape classes can also be defined during the derivation process by application of the CF-perceptual transformation, described in Chap. 6. The shape classes, represented by symbolic names, are used to study the perceptual processes in the context of the meaning of perception as part of the visual thinking process. The description of a class refers to visual objects such as geometrical figures. For example, the convex polygon class consists of elements (shapes) that are called the convex polygons; the class is denoted as $L^n$, where n refers to the number of the polygon's sides. Detailed descriptions of the shape classes, such as the thin class, the convex polygon class, the curve polygon class and the cyclic class, are given in Refs. [2–8]. In order to overcome problems connected with the notation of the shape classes in the case when the objects are not well visible, the notation based on the rectangular shape classes was introduced by the authors. These classes are defined in the context of the natural bounding box. Examples of objects from these classes are shown in Fig. 3.1. In this book, however, these classes are not presented. The previously defined shape classes presented in [2–8] comprise only a small subset of the shape classes and were represented mostly in the normal form. The operator form, introduced in [9], was used to define the perceptual transformations. The shape classes described in this Chapter are used to define the perceptual transformations and are represented both in the normal and the operator form. The selected classes are relatively simple, in order to illustrate the application of the perceptual transformations in solving the interpretational perceptual and visual problems; however, the perceptual transformations used to solve the different perceptual problems require a broad range of well-defined shape classes.
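To make the two forms concrete, the following minimal Python sketch (an illustration of ours, not code from any described system; the names and the ASCII operator spellings are hypothetical) stores a shape class in both its normal form and its operator form:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ShapeClass:
        normal_form: str    # symbolic class name, e.g. "Q[L4(L3)]"
        operator_form: str  # placement-rule expression, e.g. "L4 (-) L3"

    # A square with one triangular part subtracted, written in both forms;
    # "(-)" is an ASCII stand-in for the subtraction operator.
    notched_square = ShapeClass(normal_form="Q[L4(L3)]",
                                operator_form="L4 (-) L3")
    print(notched_square.normal_form, "<->", notched_square.operator_form)

Keeping both forms side by side mirrors the convention used throughout this Chapter: the normal form names the class, while the operator form records how its members are constructed.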

3.2 Shape Classes—Perceptual Operators



The shape classes are defined by specifying the general placement operator $\circledast$ and the general placement rules. The general placement operator $\circledast$ denotes the set of perceptual operators $\circledast = \{\ominus, \Diamond, \bar{\Diamond}, \odot, \varrho, \pitchfork, \#, \perp, \wedge, \hat{\wedge}, \int, \cup\}$ used to define the placement rules derived from the generic form. These operators, introduced in [9], were used to define the shape classes by specifying the placement rules.

The general placement rules are given by the general placement operator $\circledast$ in the form $X \circledast x_1, \ldots, x_n$ or $X_1 \circledast \cdots \circledast X_n$, where $X, x_1, \ldots, x_n, X_1, \ldots, X_n \in K$ are members of the convex class $K$. The set of general placement operators $\circledast = \{\ominus, \Diamond, \bar{\Diamond}, \odot, \varrho, \pitchfork, \#, \perp, \wedge, \hat{\wedge}, \int, \cup\}$ presents only selected operators. Classes that need to be defined to capture the intricacies of all objects used to design and solve the different perceptual problems include other operators, not presented in this book. Based on visual similarities of objects, the following specific operators $+$, $\otimes$, $\boxtimes$, $@$ were introduced. The specific operators are very useful during generalization, when the proper perceptual transformation is to be selected. Each specific placement operator consists of sub-specific operators represented by subsets of the set $\circledast = \{+, \otimes, \boxtimes, @\}$, namely $+ = \{\ominus, \Diamond, \bar{\Diamond}, \odot, \varrho\}$, $\otimes = \{\wedge, \hat{\wedge}, \#\}$, $@ = \{\int, \cup\}$, and $\boxtimes = \{\pitchfork, \perp\}$. The specific placement rules $X_1 \circledast \cdots \circledast X_n$ can be given by the application of one of the specific operators $+$, $\otimes$, $\boxtimes$ or $@$; for example, as $X + Y$, $X \otimes Y$ or $X \boxtimes Y$. The specific placement rules are employed during the generalization or specialization process. Each symbol from the operator set $\circledast$ has its visual interpretation, pointing to the member of the class that it represents. For example, for the placement rule $X_1 \circledast X_2$ the placement operator has the visual representations shown in Fig. 3.2: the concave class $X_1 \ominus X_2$ (a), the class $X_1 \Diamond X_2$ (b), the class $X_1 \bar{\Diamond} X_2$ (c), the cyclic class $X_1 \odot X_2$ (d), the class $X_1 \wedge X_2$ (e), the class $X_1 \hat{\wedge} X_2$ (f), the class $X_1 \varrho X_2$ (g), the class $X_1 \perp X_2$ (h), the class $X_1 \pitchfork X_2$ (i), the class $X_1 \# X_2$ (j), the class $X_1 \cup X_2$ (k), and the class $X_1 \int X_2$ (l). For the placement rule $X_1 \ominus X_2$, by selecting the specific operator $\ominus$ and assuming that the objects are convex polygons $X_1, X_2 \in L^n$, the specific shape classes are given as $L^{n+1} \ominus L^n$ or $L^{2n} \ominus L^n$. Assuming that the objects $X_1$ and $X_2$ are members of the same class $X$, the placement rule is given as $X \ominus X$; in the case when $X = L^n$ the specific shape class is given as $L^n \ominus L^n$. Figure 3.3a, b shows examples of members of the class $L^n \ominus L^n$, where n = 3 and n = 5. For the specific operator $+$, which consists of the set of sub-specific operators $+ = \{\ominus, \Diamond, \bar{\Diamond}, \odot, \varrho\}$, the specific placement rule can be given in the form $[X] + X$, when both objects are members of the same class, or $[X] + Y$, when the objects are from different classes. In the case when $X, Y \in L^n$, the specific class is given as $[L^n] + L^n$, $[L^{n+1}] + L^n$ or $[L^{2n}] + L^n$. Figure 3.3c shows examples of members of the class $[L^n] + L^n$, where n = 3. For the specific operator $@$, which consists of the set of sub-specific operators $@ = \{\int, \cup\}$, the specific placement rule can be given in the form $X @ X$, when both objects are members of the same class, or $X @ Y$, when the objects are from different classes.
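A small sketch may clarify the relation between the general operator set and the specific operators. The grouping below is a plausible reading of the sets given above (the ASCII names are ours, and the exact membership of each subset is an assumption):

    # General set of perceptual operators, under illustrative ASCII names.
    GENERAL = {"subtract", "insert", "para_subtract", "cyclic", "rho",
               "pitchfork", "hash", "perp", "wedge", "hat_wedge",
               "place_beside", "place_on"}

    # Each specific operator is a subset of the general set.
    SPECIFIC = {"+":        {"subtract", "insert", "para_subtract", "cyclic", "rho"},
                "otimes":   {"wedge", "hat_wedge", "hash"},
                "@":        {"place_beside", "place_on"},
                "boxtimes": {"pitchfork", "perp"}}

    assert set().union(*SPECIFIC.values()) <= GENERAL

    def generalize(op):
        """Map a perceptual operator to the specific operator covering it."""
        for name, subset in SPECIFIC.items():
            if op in subset:
                return name
        raise ValueError(op)

    print(generalize("cyclic"))   # -> '+'

During generalization, replacing a concrete operator by the specific operator that covers it is exactly this kind of lookup.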

Fig. 3.2 Different visual representations of the general placement operator


Fig. 3.3 Examples of members of the classes $L^n \ominus L^n$ (a–b), $[L^{2n}] + L^n$ (c), $L^n \cup L^n$ (d), and $L^n \int L^n$ (e)

Fig. 3.4 Examples of attribute operators



In the case when $X, Y \in L^n$, the class can be given as $L^n @ L^n$, $L^{n+1} @ L^n$ or $L^{2n} @ L^n$. Similarly, the general placement rules can be defined for the other specific operators. Figure 3.3 shows examples of members of the class $L^n \cup L^n$ (d), where n = 3, n = 4 and n = 5 respectively, and of the class $L^{n+1} \int L^n$ (e), where n = 3 and n = 4 respectively. As was described, the general placement rules are given by the general placement operator $\circledast$ in the form $X \circledast x_1, \ldots, x_n$ or $X_1 \circledast \cdots \circledast X_n$, where $X, x_1, \ldots, x_n, X_1, \ldots, X_n \in K$ are members of the convex class $K$. The convex class $K$ is the class whose members cannot be produced by application of a perceptual operator; it will be called the elementary shape class. For preserving consistency of notation with the general form of the placement rule $X_1 \circledast \cdots \circledast X_n$, the null operator $|$ is introduced. The null operator $|$ makes it possible to define the convex class in a similar way to the other classes, by application of the placement rule $X \,|\, X$. Also, to derive new specific classes, the attribute operators are introduced, such as the rotate perceptual operator that acts on the convex object $X$ by rotating it by a given angle, $X \,\alpha\, X$ (see Fig. 3.4a), or the perceptual attribute operator that changes an attribute such as colour, $X \,|\, X(c)$ (see Fig. 3.4b). For complex objects such as cyclic objects, the rotation can be given by rotating the object, its parts, or both the object and its parts. The classes whose members are produced by application of a perceptual operator will be called the non-elementary shape classes. In the case when n = 2, the generic form is given as $X_1 \circledast X_2$ and is used to define a broad range of shape classes. By selecting an operator from the operator set $\circledast$ and specifying $X_1$ and $X_2$, new shape classes can be derived by inserting into the generic form $X_1 \circledast X_2$ a symbol from the set $\circledast = \{\ominus, \Diamond, \bar{\Diamond}, \odot, \varrho, \pitchfork, \#, \perp, \wedge, \hat{\wedge}, \int, \cup\}$. The placement rule obtained, represented by the given placement operator, is verified by the visual constraints that are precisely specified by the algorithm or formula for generating an exemplar (an object of the given shape class). The specific class derived by applying the generic placement rule is also represented by the normal form of the shape classes, which specifies the attributes of the given specific class in more detail.
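The derivation of new classes from the generic form can be sketched as a simple substitution (illustrative only; the ASCII operator spellings are ours, and the validity check is a placeholder for the visual constraints just mentioned):

    OPERATORS = ["(-)", "<>", "<->", "(.)", "int", "(u)"]   # ASCII stand-ins

    def derive(x1, op, x2):
        """Instantiate the generic placement rule X1 (*) X2 with one operator."""
        return f"{x1} {op} {x2}"

    # The null operator '|' lets the elementary (convex) class fit the
    # same template as every derived, non-elementary class:
    elementary = derive("L4", "|", "L4")
    derived = [derive("L4", op, "L3") for op in OPERATORS]
    print(elementary)
    print(derived)

In a full system each derived rule would still have to pass the exemplar-generation check before it counts as a well-defined shape class.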


However, the normal form does not have the same power of generalization, and for the composite classes the normal form is quite complex and not easy for a human subject to use. The basic shape classes, such as the concave or cyclic classes, are classes whose members are the result of applying the placement rules $[X_1] \circledast X_2, \ldots, X_n$ to two or more objects, members of the convex class $X_1, X_2, \ldots, X_n \in K$. For the perceptual operator $\odot$, the placement rule has the form $[X_1] \odot X_2, \ldots, X_n$ and defines the n-cyclic classes. Figure 3.5 shows examples of the specific n-cyclic classes derived from the generic form $[X_1] \odot X_2, \ldots, X_n$: $[L^4_R] \odot 2L^3_E$ (a–c), $[L^4_R] \odot 3L^4_R$ (d), $[K^1_E] \odot 2K^1, L^3_E$ (e), $[K^1_E] \odot 2L^3_E, K^1$ (f), $[L^4] \odot 4L^4_R$ (g–i). For n = 3, the shape class is given by the placement rule $[X_1] \odot X_2, X_3$ and the configuration indexes, and when $X_2 = X_3 = X$ the placement rule is given as $[X_1] \odot 2X$. In the case when the size of both objects is small, the configuration indexes are related to the object $[X_1]$ and are usually given by the set of nine numbers $\{0, \ldots, 8\}$, which indicate the eight main directions, with 0 as the middle of the object $[X_1]$. The objects shown in Fig. 3.5, $[L^4_R] \odot L^3_E(1), L^3_E(5)$ (a), $[L^4_R] \odot L^3_E(2), L^3_E(6)$ (b), and $[L^4_R] \odot L^3_E(0), L^3_E(2)$ (c), illustrate the meaning of the configuration indexes. In the case when the number of objects grows, the placement rules can be represented by introducing a new structural description; in many cases, however, the configuration indexes suffice to represent the selected shape classes. By applying the generic form $Y_1 \circledast \cdots \circledast Y_n$, where $Y_1 = X_1 \circledast \cdots \circledast X_m$, …, $Y_n = X_1 \circledast \cdots \circledast X_m$, and where $n > 2$ and $m \le n$, the composite class $Y_1 \circledast \cdots \circledast Y_n = (X_1 \circledast \cdots \circledast X_m) \circledast \cdots \circledast (X_1 \circledast \cdots \circledast X_m)$ can be obtained by specifying $n$, $X_1, \ldots, X_m$ and the operator $\circledast$. For n = 2 and m = 2 the composite class can be obtained by using the generic form $Y_1 \circledast Y_2$ and assuming that $Y_1 = X_3 \circledast X_4$ and $Y_2 = X_5 \circledast X_6$; by inserting these formulas into the generic form $Y_1 \circledast Y_2$, the composite class can be written as $Y_1 \circledast Y_2 = (X_3 \circledast X_4) \circledast (X_5 \circledast X_6)$. For the specific operators $+$, $\otimes$, $\boxtimes$, $@$ the specific placement rules can be given in one of the following forms: $(X_3 \otimes X_4) \circledast (X_3 \otimes X_4)$, $(X_3 + X_4) \circledast (X_3 + X_4)$, $(X_3 \otimes X_4) \circledast (X_3 + X_4)$ or $(X_3 \otimes X_4) \circledast (X_3 @ X_4)$. The objects $X_1, X_2$ can be members of any shape class, constrained only by the visual interpretation of the given operator. The composite class given in the generic form $Y_1 \circledast Y_2$ can also be defined by a previously defined shape class. For example, by inserting the operator $\odot$ into the generic form, the composite class is defined as $Y_1 \odot Y_2$. A previously defined shape class, given by the rule of placement $[V + W]$, can then be used to define the composite class by inserting this class into the generic form $Y_1 \odot Y_2$.
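The configuration indexes can be read as a tiny coordinate code. The sketch below renders a rule such as $[L^4_R] \odot L^3_E(1), L^3_E(5)$; the compass assignment of the digits 1–8 is our assumption, since only 0 = middle is stated above:

    # Hypothetical mapping of configuration indexes to positions.
    POSITIONS = {0: "middle", 1: "N", 2: "NE", 3: "E", 4: "SE",
                 5: "S", 6: "SW", 7: "W", 8: "NW"}

    def cyclic_rule(base, *parts):
        """Render an n-cyclic rule such as [L4_R] (.) L3_E(1), L3_E(5)."""
        inner = ", ".join(f"{cls}({idx})" for cls, idx in parts)
        return f"[{base}] (.) {inner}"

    rule = cyclic_rule("L4_R", ("L3_E", 1), ("L3_E", 5))
    print(rule)                        # -> [L4_R] (.) L3_E(1), L3_E(5)
    print(POSITIONS[1], POSITIONS[5])  # -> N S
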

Fig. 3.5 Examples of objects, members of the n-cyclic class


As a result, the composite class can be defined as $[V + W] \odot Y_2$, $Y_1 \odot [V + W]$ or $[V + W] \odot [V + W]$. Examples of objects, members of these classes, are shown in Fig. 3.6: $[V \odot 2W] \circledast [V \odot 2W]$ (a), $[[V_1] \odot W_1] \circledast [[V_2] \odot W_2]$ (b), (c). When the generic form of the composite class is given as $X @ Y$ and the previously defined shape class is given by the rule of placement $[V + W]$, inserting this rule of placement into the generic rule yields the following composite classes: $[V + W] @ X_2$, $X_1 @ [V + W]$ or $[V + W] @ [V + W]$. Examples of objects, members of these classes, are shown in Fig. 3.6: $[V_1 \odot W_1] @ X$ (d), $X @ [V_2 \odot W_2]$ (e) and $[V_1 \odot W_1] @ [V_2 \odot W_2]$ (f). The process of defining composite classes is based on the generic form $X_1 \circledast \cdots \circledast X_n$, or $X_1 \circledast X_2$ for n = 2. This process can be repeated by applying this generic form and assuming that $Z_1 = Y_3 \circledast Y_4 = (X_7 \circledast X_8) \circledast (X_9 \circledast X_{10})$ and $Z_2 = Y_5 \circledast Y_6 = (X_{11} \circledast X_{12}) \circledast (X_{13} \circledast X_{14})$. After inserting these into the generic form $Z_1 \circledast Z_2 = (Y_3 \circledast Y_4) \circledast (Y_5 \circledast Y_6)$, the following composite class is obtained: $Z_1 \circledast Z_2 = ((X_7 \circledast X_8) \circledast (X_9 \circledast X_{10})) \circledast ((X_{11} \circledast X_{12}) \circledast (X_{13} \circledast X_{14}))$. Examples of objects generated from the specific class derived from the $Z_1 \circledast Z_2$ class, given as $([L^5_V] \odot 2L^3_R) \,\hat{\wedge}\, (L^4_R \int L^5_M)$, are shown in Fig. 3.7a, and from the specific class $(([L^5_V] \odot 2L^3_R) \odot K^1) \,\hat{\wedge}\, (L^4_R \int (L^5_M \odot K^1))$ in Fig. 3.7b. For n = 3 and the operator $\ominus$, the general placement rule is given in the form $X \ominus X \ominus X$ when all objects are members of the same class ($X_1 = X_2 = X_3 = X$), or in the form $X \ominus Y \ominus Z$ when the objects are from different classes. When $X, Y, Z \in L^n$ the class can be given as $L^n \ominus L^n \ominus L^n$, $L^{n+1} \ominus L^n \ominus L^n$ or $L^{2n} \ominus L^n \ominus L^{n+1}$.
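The recursive character of composite classes is easiest to see as an expression tree. The sketch below (an illustrative encoding with ASCII operator names, not the book's notation) nests the generic form one extra level, as in $Z_1 \circledast Z_2 = (Y_3 \circledast Y_4) \circledast (Y_5 \circledast Y_6)$:

    from dataclasses import dataclass
    from typing import Union

    @dataclass(frozen=True)
    class Rule:
        op: str
        left: "Expr"
        right: "Expr"

    Expr = Union[str, Rule]

    def show(e):
        return e if isinstance(e, str) else f"({show(e.left)} {e.op} {show(e.right)})"

    y3 = Rule("<>", "X7", "X8")     # X7 insert X8
    y4 = Rule("(.)", "X9", "X10")   # X9 cyclic X10
    z1 = Rule("(-)", y3, y4)        # one more level of composition
    print(show(z1))                 # -> ((X7 <> X8) (-) (X9 (.) X10))

Repeating the construction on $Z_1$ and $Z_2$ gives the doubly nested class shown above.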

Fig. 3.6 Examples of objects, members of the complex classes

Fig. 3.7 Examples of objects, members of the classes $([L^5_V] \odot 2L^3_R) \,\hat{\wedge}\, (L^4_R \int L^5_M)$ (a) and $(([L^5_V] \odot 2L^3_R) \odot K^1) \,\hat{\wedge}\, (L^4_R \int (L^5_M \odot K^1))$ (b)


Figure 3.8 shows examples of members of the classes $L^n \ominus L^n \ominus L^n$ (a), $L^n \ominus L^n \ominus L^{n+2}$ (b), $L^n \ominus L^{n+2} \ominus L^{n+2}$ (c) and $L^{n+1} \ominus L^n \ominus L^{n+2}$ (d), where n = 3. The process of defining the shape classes, based on the generic form $X_1 \circledast \cdots \circledast X_n$, or $X_1 \circledast X_2$ for n = 2, can be seen as the application of the class-forming CF-perceptual transformation, described in Chap. 6. The CF-perceptual transformations, such as $N[\ominus](X, x) = X \ominus x$, $N[\Diamond](X, x) = X \Diamond x$ or $N[\odot](X, x) = X \odot x$, act on objects from the convex class ($X, x \in K$) in order to produce an object, or a sequence of objects, members of the given shape class. For example, the class-forming perceptual transformation $N[\odot](X, x) = X \odot x$, for $X \in L^n$ and $x \in L^n$, defines objects, members of the general shape classes shown in Fig. 3.9. The analogy CF-perceptual transformation can be used to define the composite shape classes given by the analogy generic form, denoted as $A_1 \Rightarrow A_2$, as described in Chap. 6. The analogy generic form transforms the analogy object $A_1$ (a member of the composite class) into the analogy object $A_2$. Each of the objects $A_1$ and $A_2$ can be a member of any class. By assuming that each object $A_1$ and $A_2$ is a composite object, where the object $A_2$ is the result of the transformation $A_2 = F(A_1)$, and inserting the object given in the generic form $X_1 \circledast X_2$ into $A_2 = F(A_1)$, the form $A_2 = F(X_1 \circledast X_2)$ is obtained, where the transformation $F(X_1 \circledast X_2)$ denotes the application of the operator to the objects $X_1$ and $X_2$; the analogy CF-perceptual transformation can thus be written as $N[\Rightarrow](X_1, X_2) = \{X_1 \circledast X_2\} \Rightarrow \{X_1 \circledast X_2\}$, where the operators $\circledast$ on the two sides of the analogy may differ. Figure 3.10 shows examples of the objects generated from the analogy CF-perceptual transformation $N[\Rightarrow](n) = \{L^{n+1}_R \int L^n_E\} \Rightarrow \{L^{n+1}_R \odot L^n_E\}$, for n = 3.
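A class-forming transformation can be sketched as a function factory: fixing the operator yields a map from pairs of convex-class names to a placement rule, and ranging over n yields a family of class members (the names and ASCII operators below are illustrative):

    def N(op):
        """CF-perceptual transformation N[op]: (X, x) -> placement rule."""
        return lambda X, x: f"{X} {op} {x}"

    subtract, insert, cyclic = N("(-)"), N("<>"), N("(.)")
    print(subtract("L4", "L3"), insert("L4", "L3"), cyclic("L4", "L3"))

    # A family of members of L^n (.) L^n for growing n:
    print([cyclic(f"L{n}", f"L{n}") for n in (3, 4, 5)])
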

Fig. 3.8 Examples of members of the classes $L^n \ominus L^n \ominus L^n$ (a), $L^n \ominus L^n \ominus L^{n+2}$ (b), $L^n \ominus L^{n+2} \ominus L^{n+2}$ (c) and $L^{n+1} \ominus L^n \ominus L^{n+2}$ (d)

Fig. 3.9 Examples of objects, members of the complex composite class, generated by application of the CF-perceptual transformation

Fig. 3.10 Examples of objects generated from the CF-perceptual transformation $N[\Rightarrow](n) = \{L^{n+1}_R \int L^n_E\} \Rightarrow \{L^{n+1}_R \odot L^n_E\}$


The analogy CF-perceptual transformation is used to define the visual analogy test (VAT) perceptual transformation. The VAT CF-perceptual transformation $T_1 \Rightarrow T_2$ transforms the analogy test object $T_1$, given by the generic form $A_1 \Rightarrow A_2$, into the analogy test object $T_2$, given by the generic form $A_3 \Rightarrow A_4$, according to the formula $T_2 = G(T_1)$. By inserting the object given in the generic form $A_1 \Rightarrow A_2$ into $T_2 = G(T_1)$, the form $T_2 = G(A_1 \Rightarrow A_2)$ is obtained, where the transformation $G(A_1 \Rightarrow A_2)$ denotes the application of the operator to the objects $A_1$ and $A_2$; the VAT perceptual transformation $T_1 \Rightarrow T_2$ can thus be written as $T[\Rightarrow](A, B, C, D) = \{A \Rightarrow B\} : \{C \Rightarrow D\}$. Examples of objects generated from the VAT CF-perceptual transformation are shown in Fig. 3.11. As was shown, the shape classes can be given in different forms; the most important are the normal form and the operator form. Shape classes are applied to solve many perceptual problems. In this Chapter, only the selected shape classes that are used in solving the perceptual problems presented in this book will be briefly described. The shape classes will be presented both in the normal form and in the operator form. The convex class $K$ is the class whose members are convex objects, whereas the concave class $Q$ is the class whose members are concave objects. The convex classes are described in [2], whereas selected concave classes are briefly described in the following section in the context of the introduced notion of the placement rules and the perceptual operator (the operator form). A member of the concave class $Q^n[X(x_1, \ldots, x_n)]$ is the concave object produced by a geometrical transformation that, for the given convex object $X \in K$, produces the concave object $Q^n[X(x_1, \ldots, x_n)]$ by subtracting the objects $x_1, \ldots, x_n$ from the base object $X$ (the base $X$ is the convex part of the concave class). The concave class $Q^n[X(x_1, \ldots, x_n)]$ can be given in the operator form as $X \ominus x_1, \ldots, x_n$, where $\ominus$ denotes the subtraction operator. The specific classes can be derived from the class $X \ominus x_1, \ldots, x_n$ by selecting the rule of placement and specifying the object $X$ and the objects $x_1, \ldots, x_n$. From the concave class $X \ominus x_1, \ldots, x_n$, for n = 1, the specific class $X \ominus x$ is derived. By subtracting objects that are members of the convex classes such as $L^n$, $M$ or $K$, the following concave classes, given in the operator form, can be derived: $M \ominus M$, $L^n \ominus L^m$, $K \ominus L^m$ or $M \ominus K$; the very specific classes shown in Fig. 3.12, $L^3 \ominus L^3$ (a), $L^3 \ominus L^4$ (b), $L^3 \ominus L^5$ (c), $L^3 \ominus L^6$ (d), $L^3 \ominus M^1$ (e), can also be obtained.
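The structure of a VAT item, $\{A \Rightarrow B\} : \{C \Rightarrow D\}$, can be mimicked with a toy rewrite: the transformation that turns A into B is re-applied to C to obtain the missing D. Everything here, including the string encoding, is an illustration of ours rather than the book's procedure:

    def vat_item(A, B, C):
        """Solve A : B :: C : ? by reapplying the A -> B rewrite to C."""
        op = B.split()[1]                 # e.g. "L4 (.) L3" -> "(.)"
        left, _, right = C.partition(" | ")
        return f"{left} {op} {right}"

    # "L4 | L3" => "L4 (.) L3"  ::  "L5 | K1" => ?
    print(vat_item("L4 | L3", "L4 (.) L3", "L5 | K1"))   # -> "L5 (.) K1"
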

Fig. 3.11 Examples of objects generated from the VAT CF-perceptual transformation


Fig. 3.12 Examples of concave objects, members of the concave class $X \ominus x$

However, for $X \in L^n$ or $X \in M^m[L^n]$, $n > m$, the subtraction of the parts $x_1, \ldots, x_n \in K$ can yield different concave objects depending on the side from which these parts are subtracted. For example, for $X \in L^4_R$ and a given n, concave objects represented as $Q^n[n_1 n_2 n_3 n_4](n_1 L^{m_1}, n_2 L^{m_2}, n_3 L^{m_3}, n_4 L^{m_4})$ can be obtained, where $n = n_1 + n_2 + n_3 + n_4$ and $[n_1 n_2 n_3 n_4]$ denotes that $n_1$ objects are placed on the first side, $n_2$ objects on the second side, $n_3$ objects on the third side, and $n_4$ objects on the fourth side. For n = 4 the following objects (see Fig. 3.13a–h), members of the specific class $L^4 \ominus L^4[n_1 n_2 n_3 n_4]$ given in the operator form, can be obtained: $L^4 \ominus L^4[1111]$ (a), $L^4 \ominus L^4[1210]$ (b), $L^4 \ominus L^4[1120]$ (c), $L^4 \ominus L^4[2020]$ (d), $L^4 \ominus L^4[1030]$ (e), $L^4 \ominus L^4[4000]$ (f), $L^4 \ominus L^4[1020]$ (g), $L^4 \ominus L^4[1100]$ (h). A member of the insertion class $U/Q^n[X(x_1, \ldots, x_n)]$ is very similar to a member of the concave class $Q$, with no side of the base object $X$ removed ($X$ is the convex part of the concave class $Q^n[X(x_1, \ldots, x_n)]$). The insertion class $U/Q^n[X(x_1, \ldots, x_n)]$ can be described in the operator form as $X \Diamond x_1, \ldots, x_n$, where $\Diamond$ denotes the insertion operator. The specific classes can be derived from the class $U/Q^n[X(x_1, \ldots, x_n)]$ by specifying the base object $X$ and the objects $x_1, \ldots, x_n$. For n = 1 the specific class, given in the normal form as $U/Q[X(x)]$ or in the operator form as $X \Diamond x$, is derived. By inserting objects that are members of the convex classes such as $L^n$, $M$ or $K$ into the formula $X \Diamond x$, the following classes can be derived: $M \Diamond M$, $L^n \Diamond L^m$, $K \Diamond L^m$ or $M \Diamond K$. Examples of objects from the very specific insertion classes are shown in Fig. 3.14: $L^3 \Diamond L^3$ (a), $L^3 \Diamond L^4$ (b), $L^3 \Diamond L^5$ (c), $L^3 \Diamond L^6$ (d), $L^3 \Diamond M^1$ (e).
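The side signature $[n_1 n_2 n_3 n_4]$ is just a per-side count whose entries sum to the number of subtracted parts; a minimal check and renderer (our encoding, not the book's) looks like this:

    def concave_signature(per_side):
        """Render L4 (-) L4 [n1 n2 n3 n4] and check the side counts."""
        assert len(per_side) == 4, "one count per side of the base square"
        n = sum(per_side)
        sig = "".join(str(k) for k in per_side)
        return f"L4 (-) {n}L4 [{sig}]"

    print(concave_signature([1, 1, 1, 1]))   # one notch on every side
    print(concave_signature([2, 0, 2, 0]))   # two notches on opposite sides
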

Fig. 3.13 Examples of objects, members of the specific concave polygon class $L^4 \ominus L^4[n_1 n_2 n_3 n_4]$



Fig. 3.14 Examples of objects, members of the insertion class $X \Diamond x$ (a–e) and examples of objects, members of the para-subtraction class $X \bar{\Diamond} x$ (A–D)

Fig. 3.15 Examples of objects, members of the cyclic class $X \odot x$
A member of the para-subtraction class $A/Q^n[X(x_1, \ldots, x_n)]$ is produced by inserting $x_1, \ldots, x_n$ into $X$ in such a way that each object $x_1, \ldots, x_n$ has only one point in common with the base object $X$ ($X$ is the convex part of the concave class $Q^n[X(x_1, \ldots, x_n)]$). The class $A/Q^n[X(x_1, \ldots, x_n)]$ can be described in the operator form as $X \bar{\Diamond} x_1, \ldots, x_n$, where $\bar{\Diamond}$ denotes the para-subtraction operator. For n = 1 the derived specific class is given as $X \bar{\Diamond} x$, or in the normal form as $A/Q[X(x)]$. By inserting objects that are members of the convex classes such as $L^n$, $M$ or $K$ into the formula $X_1 \bar{\Diamond} X_2$, the following concave classes can be derived: $M \bar{\Diamond} M$, $L^n \bar{\Diamond} L^m$, $K \bar{\Diamond} L^m$ or $M \bar{\Diamond} K$. Examples of objects from the very specific para-subtraction classes are shown in Fig. 3.14: $L^3 \bar{\Diamond} L^3$ (A), $L^3 \bar{\Diamond} L^4$ (B), $L^3 \bar{\Diamond} L^5$ (C), $L^3 \bar{\Diamond} L^6$ (D). A member of the cyclic class $A^n[X(x_1, \ldots, x_n)]$ is produced by inserting $x_1, \ldots, x_n$ into $X$ in such a way that each object $x_1, \ldots, x_n$ does not have any point in common with the base object $X$. The class $A^n[X(x_1, \ldots, x_n)]$ can be given in the operator form as $X \odot x_1, \ldots, x_n$, where $\odot$ denotes the cyclic operator. For n = 1 the derived specific class is given as $X \odot x$, or in the normal form as $A[X(x)]$. By inserting objects that are members of the convex classes such as $L^n$, $M$ or $K$ into the formula $X \odot x$, the following cyclic classes can be derived: $M \odot M$, $L^n \odot L^m$, $K \odot L^m$ or $M \odot K$. Examples of objects from the specific cyclic classes are shown in Fig. 3.15: $L^3 \odot L^3$ (a), $L^3 \odot L^4$ (b), $L^3 \odot L^5$ (c), $L^3 \odot L^6$ (d). A cyclic class such as $X \odot x$ does not uniquely define the placement of the inserted elements. For example, an object from the specific class derived from the cyclic triangle class $L^3 \odot L^3$ can have its inserted elements placed in many different ways, and they can have very different sizes. The placement can be regarded as the rotation of the object inside the triangle, given by the angle of rotation $\alpha$. For example, the triangles in Fig. 3.16 are given by the following placement rules: $L^3_E\{\uparrow\} \odot L^3_E\{\uparrow\}$ (a), $L^3_E\{\uparrow\} \odot L^3_E\{\rightarrow\}$ (b), $L^3_E\{\uparrow\} \odot L^3_E\{\downarrow\}$ (c), $L^3_E\{\uparrow\} \odot L^3_E\{\leftarrow\}$ (d), where the angle $\alpha$ can be represented by the symbols $\{\uparrow, \downarrow, \leftarrow, \rightarrow\}$ denoting rotation by $\alpha = \frac{\pi}{2}$. As was described, a sequence of objects can represent a process, and Fig. 3.16e–h shows the process represented by the rotation of the objects $X_1$ and $X_2$ by the angle $\alpha$: $L^3_E\{\uparrow\} \odot L^3_E\{\downarrow\}$ (e), $L^3_E\{\downarrow\} \odot L^3_E\{\leftarrow\}$ (f), $L^3_E\{\uparrow\} \odot L^3_E\{\uparrow\}$ (g), $L^3_E\{\downarrow\} \odot L^3_E\{\rightarrow\}$ (h). For n > 1 the cyclic class can be given in the operator form as $X \odot x_1, \ldots, x_n$, and the specific classes can be derived from the class $X \odot x_1, \ldots, x_n$ by defining the rules of placement and specifying the objects $X$, $x_1, \ldots, x_n$, and the number n. By inserting objects from the specific classes such as $L^n$, $M$ or $K$, the following classes, represented by the placement rules, can be obtained: $M \odot M, M, M$; $L^n \odot M, L^m, M$; $K \odot M, L^m, K, L^m, K$; or $M \odot K, L^m, L^m, M, K$. The very specific classes, derived from one of the specific classes, are shown in Fig. 3.17: $L^4 \odot L^5, L^4, L^4$ (a) and $L^5 \odot L^5, L^3, L^5, L^4$ (b).
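The rotation attribute on a cyclic class, and a process as a sequence of rotated rules, can be encoded directly. The arrow names and degree values below are our illustration of the quarter-turn symbols:

    QUARTER_TURNS = {"up": 0, "right": 90, "down": 180, "left": 270}

    def rotated_rule(outer, a_outer, inner, a_inner):
        """Render e.g. L3_E{up} (.) L3_E{down}."""
        return f"{outer}{{{a_outer}}} (.) {inner}{{{a_inner}}}"

    # A process: the inner triangle rotates by a quarter turn at each step.
    process = [rotated_rule("L3_E", "up", "L3_E", a)
               for a in ("up", "right", "down", "left")]
    print(*process, sep="\n")
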


Fig. 3.16 Examples of objects, members of the class $[L^3] \odot L^3$

Fig. 3.17 Examples of objects from the class $X \odot x_1, \ldots, x_n$ (a–d) and examples of objects from the cyclic class $X_1 \odot X_2 \odot x$ (A–D)

In the case when all $x_1, \ldots, x_n$ are members of the same class, $x_1 = \cdots = x_n = x$, the n-cyclic class is given as $X \odot nx$, and the very specific classes, derived from one of the specific classes, are shown in Fig. 3.17: $L^5 \odot 5K^1$ (c) and $L^4 \odot 6K^1$ (d). The cyclic class $A[X_1(A[X_{11}] \ldots A[X_{1m}](x))]$ is the specific cyclic class derived from the class $A^n[X(x_1, \ldots, x_n)]$, where the number m denotes the number of levels of iteration. For two levels of iteration, m = 2 and n = 1, the class can be given as $A[X_1(A[X_2(x)])]$, or in the operator form as $X_1 \odot X_2 \odot x$ or $[[X_1] \odot X_2] \odot x$. The specific classes can be derived from the class $X_1 \odot X_2 \odot x$ by defining the rule of placement and specifying the objects $X_1$, $X_2$ and $x$. By inserting objects from the specific classes such as $L^n$, $M$ or $K$, the following classes can be derived: $[[M] \odot M] \odot M$, $[[L^n] \odot L^m] \odot L^m$, $[[K] \odot L^m] \odot L^m$ or $[[M] \odot K] \odot K$. The sub-specific classes, derived from one of the specific classes, are shown in Fig. 3.17: $[[L^4] \odot L^4] \odot L^4$ (A), $[[L^3] \odot L^3] \odot L^3$ (B), $[[L^5_R] \odot L^4] \odot L^3$ (C) and $[[L^6] \odot L^4] \odot L^3$ (D). A member of the class $P[X(X_1 \int X_2)]$ is produced by placing the object $X_2$ to the right of the object $X_1$, where the two objects $X_1$ and $X_2$ do not have any common parts. The class $P[X(X_1 \int X_2)]$ can be given in the operator form as $X_1 \int X_2$, where $\int$ denotes the placement operator. By inserting objects that are members of the convex classes such as $L^n$, $M$ or $K$ into the formula $X_1 \int X_2$, the following classes, represented by the placement rules, can be obtained: $M \int M$, $L^n \int L^m$, $K \int L^m$ or $M \int K$. Examples of objects from the very specific transparency classes are shown in Fig. 3.18: $L^3 \int L^4$ (a), $L^5 \int L^6$ (b). A member of the class $@[X(X_1 \cup X_2)]$ is produced by placing the object $X_2$ on the object $X_1$, where the two objects $X_1$ and $X_2$ do not have any common parts. The class $@[X(X_1 \cup X_2)]$ can be described by using the placement rule $X_1 \cup X_2$. By inserting objects that are members of the convex classes such as $L^n$, $M$ or $K$ into the formula $X_1 \cup X_2$, the following classes can be derived: $M \cup M$, $L^n \cup L^m$, $K \cup L^m$ or $M \cup K$.
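Iterated cyclic classes nest the same construction m times; a small recursive builder (an illustrative encoding) produces the operator form for any number of levels:

    def iterated_cyclic(levels, x):
        """Build [[X1] (.) X2] ... (.) x for the given list of levels."""
        rule = f"[{levels[0]}]"
        for nxt in levels[1:]:
            rule = f"[{rule} (.) {nxt}]"
        return f"{rule} (.) {x}"

    # Two levels of iteration (m = 2): [[L4] (.) L4] (.) L4
    print(iterated_cyclic(["L4", "L4"], "L4"))
    # Three levels (m = 3): [[[L5] (.) L4] (.) L4] (.) L3
    print(iterated_cyclic(["L5", "L4", "L4"], "L3"))
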


Examples of objects from the sub-specific $X_1 \cup X_2$ classes are shown in Fig. 3.18: $L^3 \cup L^3$ (A), $L^4 \cup L^4$ (B), $L^5 \cup L^5$ (C). A member of the class given by the placement rule $X_1 \hat{\wedge} X_2$ is produced by placing the object $X_1$ on