
TEUBNER-TEXTE zur Informatik

R. Zhao Handsketch-Based Diagram Editing

Band 5

TEUBNER-TEXTE zur Informatik

Edited by
Prof. Dr. Johannes Buchmann, Saarbrücken
Prof. Dr. Udo Lipeck, Hannover
Prof. Dr. Franz J. Rammig, Paderborn
Prof. Dr. Gerd Wechsung, Jena

As a relatively young science, computer science lives essentially on current contributions. Many ideas and concepts are treated in original papers, lecture notes, and conference proceedings, and are thus accessible only to a limited circle of readers. Textbooks are available, but because of the rapid development of the field they often cannot reflect the latest state of the art. The series "TEUBNER-TEXTE zur Informatik" is intended as a forum for individual and collected contributions on current topics from the entire field of computer science. It is aimed in particular at outstanding doctoral and habilitation theses, special lecture notes, and scientifically edited final reports of important research projects. Particular emphasis is placed on a comprehensible presentation of the theoretical foundations and of the perspectives for applications. The program of the series ranges from classical topics seen from new points of view to the description of novel, not yet established approaches. A certain provisionality and incompleteness in the selection and presentation of the material is deliberately accepted, because in this way the liveliness and originality of lectures and research seminars can be preserved, and further studies can be stimulated and facilitated. TEUBNER-TEXTE appear in German or English.

Handsketch-Based Diagram Editing

By Rui Zhao, Universität-Gesamthochschule Paderborn

B. G. Teubner Verlagsgesellschaft, Stuttgart · Leipzig 1993

Dr. rer. nat. Rui Zhao was born in 1962 in Shandong, China. He studied computer science and electrical engineering at the University of Dortmund from 1982 to 1988. Since 1988 he has been a computer scientist at Cadlab, a joint venture of the University of Paderborn and Siemens Nixdorf Informationssysteme AG. He received his Dr. rer. nat. in 1992 from the University of Paderborn. He is interested in user interface technology, computer-aided design, graphical editors, and pen-based computers.

Dissertation at the Universität-Gesamthochschule Paderborn, Department of Mathematics/Computer Science

Die Deutsche Bibliothek - CIP-Einheitsaufnahme:

Zhao, Rui:

Handsketch-based diagram editing / von Rui Zhao. Stuttgart; Leipzig: Teubner, 1993 (Teubner-Texte zur Informatik; Bd. 5) Zugl.: Paderborn, Univ., Diss., 1992

ISBN 978-3-322-95369-8 DOI 10.1007/978-3-322-95368-1

ISBN 978-3-322-95368-1 (eBook)

NE: GT

This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the consent of the publisher is inadmissible and punishable. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic systems.

© B. G. Teubner Verlagsgesellschaft Leipzig 1993

Cover design: E. Kretschmer, Leipzig

Preface

This thesis concerns concepts and techniques of handsketch-based diagram editors. Diagram editing is an attractive application of gestural interfaces and pen-based computers, which promise a new input paradigm where users communicate with computers in diagram languages by using gestures. Despite recent advances in pen-based computer technology and pattern recognition methodology, developing gesture-based diagram editors is difficult. The key problem is on-line gesture recognition, which can be divided into two levels: one recognizes on-line sketched gestures, given as x-y coordinates, as symbols; the other transforms these symbols into editing commands.

In the thesis, I discuss the key ideas of incremental and cooperative recognition, gesture specification and structure recognition, as well as decoupling the recognition interface from the command interface. To reduce the development effort for handsketch-based diagram editors, an editor framework and two experimental applications are designed, implemented, and evaluated. The results indicate that the implementation effort for such editors is drastically reduced by using the editor framework Handi. Handi-based editors follow the so-called WYDIWYG input principle; they are easy to use and appropriate for conceptual sketching.

This dissertation was carried out during my research activities at Cadlab, a cooperation between the University of Paderborn and Siemens Nixdorf Informationssysteme AG. While a dissertation is an individual effort, its magnitude and duration ensure that many others have assisted in its production. There is much for which I have to thank my two advisors, Franz Rammig and Gerd Szwillus. Franz was the first to suggest to me that pen-based user interfaces might be an interesting problem. I thank him for inspiring me and for showing me how to do this research. I was lucky that Gerd came to the University of Paderborn while I was beginning this research. The close relationship and common research interests enabled many fruitful discussions. Gerd helped me to concentrate on the structure recognition and structure editing aspects. I thank Jürgen Strauß for many discussions about a better structure of this thesis; Declan Mulcahy, Frank Buijs, and Peter Hennige for reviewing the dissertation; as well as Bernd Steinmüller, Hermann-Josef Kaufmann, Thomas Kern, Wolfgang Müller, and other "Cadlabers" for their support of this research. Further, I would like to thank Michael Tauber for introducing me to the domain of visual languages with many useful references. Finally, I want to thank my wife Linfang for the support I needed to persevere, for her love and encouragement. I dedicate this work to her and our daughter Anja.

Paderborn, April 1993

Rui Zhao

Contents

1 Introduction
  1.1 Basic Concepts
  1.2 Results and Contributions

2 Related Work
  2.1 Gestural Interfaces
    2.1.1 Notepad Computers
    2.1.2 Applications
  2.2 Pattern Recognition
    2.2.1 Character Recognition Systems
    2.2.2 Handsketched Figure Recognition Systems
    2.2.3 Gesture Recognition Systems
  2.3 Visual Languages
    2.3.1 Specification and Parsing
    2.3.2 Visual Programming Systems
  2.4 Graphical Structure Editors

3 Low-Level Recognition
  3.1 Problem Analysis
    3.1.1 Requirements
    3.1.2 Input of Handsketches
    3.1.3 Specific Properties and Problems
  3.2 Related Problems
    3.2.1 Overview
    3.2.2 Character Recognition
    3.2.3 Irregular Gesture Recognition
  3.3 Fundamental Concepts
    3.3.1 Hierarchical Classification
    3.3.2 Object-Oriented Design
    3.3.3 Incremental Recognition
  3.4 System Design
    3.4.1 Overview
    3.4.2 Symbol Database
    3.4.3 Single-Stroke Analyzer
    3.4.4 Incremental Updater
    3.4.5 Selective Matcher
  3.5 Summary

4 High-Level Recognition
  4.1 Formal Basis
    4.1.1 HiNet Diagrams
    4.1.2 Handsketch-based Editing
  4.2 Fundamental Concepts
    4.2.1 Compound Specification
    4.2.2 Object-Oriented System Design
  4.3 Structure Recognition
    4.3.1 Hierarchy
    4.3.2 Connectivity
  4.4 Command Interpretation
    4.4.1 Constructions
    4.4.2 Destructions
  4.5 Summary

5 Handi Architecture
  5.1 Introduction
    5.1.1 Motivation and Design Goals
    5.1.2 Overview
  5.2 Sketching Subsystem
    5.2.1 Sketching Area
    5.2.2 Pen
    5.2.3 Ink
    5.2.4 Stroke
  5.3 Recognizing Subsystem
    5.3.1 Symbol
    5.3.2 SymbolTree
    5.3.3 Single-Stroke Recognizer
    5.3.4 Gesture
    5.3.5 GestureSet
  5.4 Editing Subsystem
    5.4.1 Hierarchy Component
    5.4.2 Hierarchy View
    5.4.3 Diagram Component
    5.4.4 Connector
  5.5 Summary

6 Implementation
  6.1 Overview
  6.2 Recognizing Subsystem
  6.3 Sketching and Editing Subsystem
  6.4 Summary

7 Applications
  7.1 Statecharts Editor
    7.1.1 The Statecharts Language
    7.1.2 Gestural Interface
    7.1.3 Implementation
  7.2 Petri Nets Editor
    7.2.1 The Petri Nets Language
    7.2.2 Gestural Interface
    7.2.3 Implementation
  7.3 Summary

8 Evaluation
  8.1 Building Handi Applications
    8.1.1 New Applications
    8.1.2 Adding and Changing Gestures
    8.1.3 Extensibility
  8.2 Partial Utilization of Handi
  8.3 Performance of Handi-based Editors
    8.3.1 Gesture Recognition
    8.3.2 Human Factors

9 Conclusion
  9.1 Summary of Work
  9.2 Open Problems and Future Work

Bibliography
Statecharts Editing Scenarios
Petri Nets Editing Scenarios
Index

List of Figures

1.1 Basic problems in using diagrams with computers
1.2 The incremental gesture recognition system
2.1 Iconic sentence
3.1 A handsketched rectangle
3.2 The point coordinates of a handdrawn rectangle stroke
3.3 The eight variants to a single-stroke rectangle
3.4 Start at the edge to draw a rectangle
3.5 There are five stroke combinations of a multiple-stroke rectangle
3.6 Considering stroke-order and drawing directions
3.7 Variants of drawing various geometrical figures
3.8 Some examples of irregular gestures
3.9 The hierarchy of geometrical objects
3.10 System components and organization of the low-level recognizer
3.11 Internal structure of the specific database for geometrical objects
3.12 The control structure of the single-stroke analyzer
3.13 The recognition process of a single-stroke square
3.14 The eight possible directions of the chain-code
3.15 Detected corners are marked by black dots
3.16 Some typical handdrawn lines
3.17 The algorithm of the line detector
3.18 Typical handdrawn arcs
3.19 The incremental updater merges all connected symbols in the database into a new symbol
3.20 Fuzzy Connectivity
3.21 Incremental recognition of a multiple-stroke rectangle
3.22 An example of iterative merging
4.1 Examples of general diagrams
4.2 a) A statechart, b) A Petri net, c) An entity-relationship chart, d) An OOSD diagram
4.3 Examples of picture elements used in diagrams
4.4 Examples of connections
4.5 Examples of containments
4.6 Examples of alignments
4.7 A picture of a structured Petri net
4.8 Internal representation of a structured Petri net
4.9 Gesture constraints make it possible to use the same gesture shape for different gesture commands
4.10 The high-level recognition system
4.11 Recognizing the hierarchical structure with a structured Petri net
4.12 Inserting a new node can change the existing hierarchy structure
4.13 Recognizing hierarchy of orthogonal states in a statechart
4.14 Connectors support recognizing the connection structure
4.15 Interpretation of constructive commands
4.16 Interpretation of a statechart-specific command
4.17 Interpretation of delete commands
5.1 Relevant layers of Handi-based editors
5.2 Handi consists of three subsystems
5.3 Overview of the most important Handi objects
5.4 Booch's notations
5.5 Class diagram of the sketching subsystem
5.6 A sonic digitizer can be used as an input device
5.7 Object diagram of the sketching subsystem
5.8 Class diagram of the recognizing subsystem
5.9 Object diagram of the recognizing subsystem
5.10 There are three different results by checking gesture constraints
5.11 Class diagram of the editing subsystem
5.12 Object diagram of the editing subsystem
5.13 The composition of the place component used in Petri nets
5.14 Protocols defined for node and edge objects
6.1 The look and feel of a Handi-based application
7.1 Graphical symbols used in statecharts
7.2 Graphical symbols used in Petri nets
8.1 The freehand drawing functionality is successfully integrated in the well-established drawing editor idraw; this extended idraw allows the user to create beautified pictures by using handsketches

List of Tables

4.1 Gesture shapes defined for editing Petri nets
4.2 Gesture constraints defined for editing Petri nets
4.3 Short description of gesture semantics used for editing Petri nets
6.1 Handi prototype libraries code breakdown
6.2 Classes of the recognizing subsystem
6.3 Classes of the sketching and editing subsystems
7.1 Statecharts editor classes
7.2 Statecharts editor code breakdown
7.3 Petri net editor classes
7.4 Petri net editor code breakdown
8.1 Recognition speed of the low-level recognizer
8.2 Recognition speed of the high-level recognizer

Chapter 1

Introduction

The "computer-aided designer" of today can utilize CAD tools at every stage of the design process, from behavioral and functional specification to process simulation and optimization. Software engineers can make use of various CASE tools for program development. But conceptual design is still usually done with pen and paper, even when the designer has access to a powerful computer and is knowledgeable about working with it. Many resources of computers remain unused in this design stage. The extra transfer from paper into the computer leaves many primary ideas undocumented; it costs time and produces unnecessary errors.

One of the reasons that computers are not used in the first creative and conceptual design stage is that most current graphical interfaces are unfortunately "computer-centered" rather than "user-centered". The menu and command selection interface and the input devices mouse plus keyboard are not appropriate for conceptual sketching. The strengths of conventional user interfaces lie in the later stages of the design process, i.e. entering the finished design into the computer; they provide only limited support for the early design stages. For drawing a rough sketch or taking a short note, the interface is simply not as fast nor as convenient as pen and paper.

Conceptual design is usually supported by various diagram languages.

Diagram languages are visual programming languages which use pictures formed from graphical elements as programs. Visual means using graphics instead of text, because graphics can be comprehended and communicated by humans more easily than text: one picture says more than a thousand words. Graphics helps idea organization in conceptual design and communication in team project work. Examples of such diagram languages include many types of traditional diagrams used within computer science, such as Petri nets and statecharts, as well as graphical methodologies developed for software engineering and object-oriented analysis and design. Tools for drawing such diagrams are specific diagram editors, not general drawing editors.

Recently, notepad computers have become commercially available. The essential component that makes such computers attractive is the so-called "paper-like" interface, which will emerge as a real alternative to the keyboard- and mouse-based one. An important advantage of pen-based computers is mobility: they can be used everywhere. Such interfaces have several significant advantages which make gesture-based systems appealing to both novice and experienced users. A single gesture can specify a command with all required parameters simultaneously. A simple gesture can combine several commands in a natural manner. Diagram editing is an attractive application of gestural interfaces and pen-based computers: it allows the user to communicate with the computer in diagram languages by using handsketches, so that the user can draw diagrams in the same way as with paper and pen.

While gesture-based diagram editors offer significant benefits, building such editors is difficult. Apart from the hardware improvement of flat displays with digitizers, the key problem is the gesture recognition which allows the user to sketch diagrams with relatively few restrictions. From the user's view, a gesture-based diagram editor should be modeless and intelligent, giving the user the feeling that the editor understands the diagram language. This thesis attacks the key software problems of on-line gesture recognition and an integrated handsketch-based editor architecture. A novel incremental gesture recognition concept is presented and integrated in the object-oriented software architecture called Handi. Handi provides powerful programming abstractions for building handsketch-based diagram editors with less effort.

1.1 Basic Concepts

As shown in figure 1.1, a diagram is represented in three layers: the internal representation in the form of structured objects, the intermediate diagram in the form of graphical descriptions based on graphical primitives like rectangles or lines, and the pixel-based picture on the screen inside a window for the human's eyes. Using diagrams directly with computers raises the following human-computer communication problems: one is the automatic drawing of diagrams from the internal representation; the other is the inverse problem, i.e. constructing the internal representation from a picture.

[Figure 1.1: Basic problems in using diagrams with computers.]

The problem of getting a picture from a given internal representation refers to graph layout and computer graphics. The generation of a diagram description from an internal representation is the problem of graph layout; computer graphics deals with the generation of display pictures from nonpictorial information. This work concentrates on the second problem, that is, going from a picture to the internal representation of a diagram, which is clearly a pattern recognition problem. We classify this pattern recognition problem, again, into scanning and parsing. Mapping the external picture representation into primitive syntactical elements is the realm of scanning. Within conventional graphical user interfaces, command modes and direct manipulation techniques such as rubberbanding force the scanning into a computer-centered style. Within gestural interfaces, scanning is an on-line recognition problem. Getting an internal representation from basic syntactical elements is the realm of visual parsing. Similar to a textual program parser, a visual language parser depends on the underlying language syntax. Parsing a visual program is more difficult than parsing a textual program, because a visual program uses two-dimensional information to express its syntax and semantics. Recently, research in this domain has concentrated either on handsketch recognition or on parsing visual languages. There has been a lack of integrated concepts and software architectures for building handsketch-based diagram editors.


Incremental Recognition

Within a handsketch-based diagram editor, gesture recognition is a so-called on-line recognition problem: the machine recognizes pictures while the user is drawing. The input data of a gesture-based diagram editor is a sequence of point coordinates captured by the input device. We call a recognizer which transforms such point coordinates into graphical symbols a low-level recognizer; it corresponds to the scanner. The low-level recognizer determines the class and the attributes of each graphical symbol drawn by the user. Further, a gesture-based diagram editor needs a high-level recognizer to transform these basic symbols into editing commands, which are in turn interpreted by the diagram editor to create the internal diagram structure. The most important issue is that the two recognizers must work together in a diagram editor. Figure 1.2 shows the design of our gesture recognizing system.

[Figure 1.2: The incremental gesture recognition system. X-y coordinates enter the low-level recognizer, which produces graphical symbols; the high-level recognizer looks these up and emits editing commands. Dotted control-flow arrows carry the "new stroke" and "low-level terminated" signals that trigger the two recognizers.]

The essential idea of our incremental gesture recognition is to let the high-level recognizer incrementally transform the graphical symbols recognized by the low-level recognizer into editing commands for creating and manipulating the underlying diagram. This integrated recognition concept differs from existing approaches in the following aspects:

1. Existing visual language parsers usually take a complete picture as input; our high-level recognizer treats each graphical symbol the user has just drawn incrementally.


2. In contrast to other incremental visual language parsers, which directly create the internal diagram representation, the output of our high-level recognizer is editing commands that are compatible with conventional diagram editors. This has the advantage that the gesture recognizer can be integrated into diagram editors which allow the user not only to draw new diagrams but also to modify existing diagrams with gestures.

Low-Level Recognition

For solving the low-level recognition problem, a new method for on-line recognition of handsketched geometrical figures has been developed. In contrast to other on-line pattern recognition systems, strokes are not represented as feature vectors and matched to prototypes in a dictionary. Instead, a stroke is immediately classified top-down along a symbol hierarchy after it has been drawn. The recognized symbol is displayed as a regular graphical object at once. A major advantage of this method is that the user gets an immediate response with recognition results. One significant feature of this novel method is that multiple-stroke handsketches can be recognized incrementally; this was a problem for most existing gesture-based systems. Object-oriented design has been used to build this low-level recognizer. A class hierarchy of geometrical figures which makes use of inheritance is defined for encapsulating strokes and all recognizable geometrical objects. The polymorphism concept of object-oriented programming enables automatic and hierarchical control of the recognition process.

High-Level Recognition

In contrast to the low-level recognition, the high-level recognition depends on the underlying diagram language. For this reason, we introduce the class of HiNet diagrams, which mainly represent hierarchy and connectivity, and build a formal model of handsketch-based diagram editors. The key issue of the high-level recognition is how to specify the underlying diagram syntax. Our idea is to provide a mechanism that unifies the specification of the language and of its manipulations. We consider a visual language as an initial object and a collection of gesture editing operations; any object that can be obtained by applying a sequence of allowed editing operations is then defined to be in the language. We specify the underlying diagram language by defining a set of gestures, each of which corresponds to an editing command. With our specification mechanism, each gesture defines a gesture shape, a set of gesture constraints, and the gesture semantics. The main task of the high-level recognizer is to check gesture constraints by examining the defined spatial relationships. One goal in designing the high-level recognizer is to permit an easy integration of a gesture recognizer into an object-oriented editor architecture. To achieve this goal, gesture semantics are defined as the generation of "normal" editing commands which can be interpreted by the underlying diagram editor in the same way as other commands.

Communication

Existing approaches merely identified a global data flow from a scanner to a parser, and therefore treated gesture recognition and diagram parsing as two separate and independent problems. The key issue of our concept is to consider them as two cooperative and tightly integrated components of the recognizing system of a gesture-based diagram editor. The cooperative communication between these recognition subcomponents supports our incremental gesture recognition, in which the user can sketch the desired diagram stroke by stroke and immediately sees what happens after each stroke has been drawn. In order to achieve this incremental gestural dialog without any explicit command from the user, such as "draw something, and click a button for parsing", our recognition system benefits from inherent control signals, illustrated by the dotted arrow-lines in figure 1.2. Each pen-down event at the beginning of a new stroke automatically produces a "new stroke" signal which can be used to trigger the low-level recognition. The termination event of the low-level recognizer can be used to activate the high-level recognizer to parse graphical symbols into editing commands.

Handi Architecture

The basic design idea of Handi is to encapsulate common characteristics of handsketch-based diagram editors into classes by using object-oriented methodology. The concept of Handi is gained from experiences with several prototype editors. Handi integrates techniques of on-line handsketch recognition, diagram parsing, and graphical structure editing into cooperative components of handsketch-based diagram editors. Handi consists of three subsystems: a sketching subsystem, a recognizing subsystem, and an editing subsystem. An editor for a particular diagram language relies on the sketching subsystem for handling free drawing input, on the recognizing subsystem for gesture recognition, and on the editing subsystem for its structure-representing and editing capabilities. One of the key issues of Handi is that it is built on top of a general editor framework, reusing the general graphical editing functionality. Handi does not offer functionality which is already supported by such an editor framework or by toolkits, to avoid replicating existing functionality; instead we have focused on providing new and previously unsupported capability, that is, free handsketching, gesture recognition, and the creation and manipulation of diagrams.
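To make this control coupling concrete, the following is a minimal C++ sketch of the loop of figure 1.2; all type and function names are illustrative assumptions, not the actual Handi classes. A pen-down event opens a new stroke; the completed stroke triggers the low-level recognizer, and its termination in turn triggers the high-level recognizer.

```cpp
// Minimal sketch of the incremental recognition loop of figure 1.2.
// All names are illustrative assumptions, not the actual Handi classes.
#include <string>
#include <vector>

struct Point { int x, y; };
using Stroke = std::vector<Point>;           // raw x-y coordinates of one stroke

struct Symbol { std::string cls; };          // e.g. "Rectangle", "Circle", "Line"
struct EditingCommand { std::string name; }; // e.g. "CreateNode(Rectangle)"

class LowLevelRecognizer {                   // the "scanner"
public:
    Symbol recognize(const Stroke&) {
        // classify the stroke along the symbol hierarchy (chapter 3) ...
        return Symbol{"Rectangle"};
    }
};

class HighLevelRecognizer {                  // the incremental "parser"
public:
    std::vector<EditingCommand> parse(const Symbol& sym) {
        // check gesture constraints against the diagram context (chapter 4) ...
        return {EditingCommand{"CreateNode(" + sym.cls + ")"}};
    }
};

class RecognitionSystem {
    LowLevelRecognizer low;
    HighLevelRecognizer high;
public:
    // The pen events play the role of the implicit control signals:
    // "new stroke" starts the low-level pass, and "low-level terminated"
    // activates the high-level pass; no explicit parse command is needed.
    std::vector<EditingCommand> onStrokeCompleted(const Stroke& s) {
        Symbol sym = low.recognize(s);   // low-level: points -> symbol
        return high.parse(sym);          // high-level: symbol -> commands
    }
};
```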

1.2 Results and Contributions

This thesis solves the aforementioned software problems by an integrated concept for gesture recognition and an object-oriented software architecture for building handsketch-based diagram editors. The primary contributions of this dissertation, which have partly been presented in [145, 146, 144, 148, 147], are:

• An integrated concept which combines the low-level recognition and the high-level recognition in an incremental recognition system.

• An object-oriented and hierarchical algorithm for on-line and incremental recognition of handsketched graphical symbols. The main characteristics which distinguish our low-level recognizer from all existing recognition systems are an incremental control structure and a novel object-oriented architecture for efficient classification of geometrical figures.

• A formal model of HiNet diagram languages and handsketch-based diagram editors.

• A mechanism for gesture specification which integrates the diagram syntax definition and the editing operations.

• The Handi architecture with powerful programming abstractions for developing handsketch-based diagram editors.

• Two experimental diagram editors, built and evaluated to demonstrate the viability of the basic concepts and the Handi architecture.


Thesis Organization

Chapter 2 discusses related work. Chapters 3 and 4 present the low-level recognizer and the high-level recognizer, respectively. Chapter 5 depicts the Handi architecture using Booch's graphical notation of class and object diagrams. Chapter 6 describes the main aspects of a prototype implementation of Handi. Chapter 7 presents two experimental diagram editors. Finally, chapter 8 evaluates this work, and chapter 9 summarizes the thesis and discusses directions of future work.

Chapter 2

Related Work

In this chapter, we briefly state the relationship of this thesis to similar work in the related fields, classified into four categories: gestural interfaces, visual language systems, pattern recognition systems, and graphical structure editors. The general relationships are as follows: a handsketch-based diagram editor is a specific application of gestural interfaces. Diagrams are visual languages, and diagram processing relates closely to concepts of visual programming systems. The key problem of handsketch-based diagram editing is on-line gesture recognition, which is a pattern recognition problem. A diagram editor is a graphical structure editor with a specific input technique.

2.1 Gestural Interfaces

Some general work attempts to define gestures as a technique for interacting with computers. Morrel-Samuels [81] examines the distinction between gestural and lexical commands, and then further discusses problems and advantages of gestural commands. Wolf and Rhyne [138] present a taxonomy of direct manipulation which considers the gestural interface a member of the direct manipulation interfaces. Baecker and Buxton [8] discuss human factors concerning gestural interfaces, as well as hardware and software issues. Buxton studies the lexical and pragmatic considerations of input structure [14], especially for performing selection and positioning tasks [13], and discusses the use of muscular tension and motion to phrase human-computer dialogues [15, 16].

2.1.1 Notepad Computers

The idea of using gestures and handwriting to interface with computers has attracted people for many years. The graphics tablet and stylus have been in use since at least 1964 [25]. Interactive graphics displays have been in use since at least 1963 [116]. Despite the existence of these tools, communication with stylus and display bears little resemblance to the way we communicate with pencil and paper, or chalk and blackboard. We all write letters and understand basic proofreading symbols, and software engineers discuss their designs with various block diagrams. But with very few exceptions [130, 57], today's user interfaces make little or no use of these skills. Recent advances in devices and VLSI technology have made it possible to realize a notepad computer with the size and weight of a book. A few commercial pen-based computers which utilize character and gesture recognition techniques have become available, for example NCR's NotePad, Momenta's Pentop, IBM's ThinkPad, and GRiD's GRiDPad [127]. Many pen-oriented operating systems and window systems are becoming available, for example GRiD's PenRight!, GO's PenPoint [17, 90], CIC's PenDOS, and Windows for Pen Computing [21]. Altabet [3] discussed an integration of pen-based computer and multimedia technology to provide a truly natural human interface.

2.1.2 Applications

In the early seventies, Alan Kay described the idea with the so-called Dynabook [58], which may be considered the first approach towards building pen-based computers. Unfortunately, only little work was dedicated to designing and developing pen-based computers and gesture-based systems, due to the relatively immature digitizing technology and the difficulty of handwriting and handdrawing recognition. In [113], early problems and issues were discussed which limited the acceptance of user interfaces using gesture input and handwritten character recognition. Nevertheless, some early work has been done using handwriting or gestural interfaces in different application areas.

Hosaka and Kimura [50] used handwriting input in an interactive geometrical processing system for designing and manufacturing three-dimensional objects. A graphics tablet digitizes the user's handwriting, which can then be recognized. Many function keys are used in this early work to support the input process with a tablet; for example, the user must press a function key to begin a drawing process.

Odawara et al. [92] presented a design environment for silicon compilation using an LCD digitizer in the same style as today's pen-based computers. A diagrammatic hardware description language (ADL) forms the input of this silicon compiler. The designer can draw ADL diagrams like drawing on paper, and is therefore able to concentrate his attention on the design for a long time. The system can recognize handwritten characters and ADL symbols.

Gestures are also used in architectural design. Makkuni [75] developed a system which allows a user to design Chinese temples with gestures. Makkuni described a gesture language which supports the design process, such as gesturally exploring a pavilion roof. Jackson and Roske-Hofstrand [53] used circles as selection gestures for mouse-based selection without button presses. Circling motions are detected automatically; their experiments show that many users prefer circling over button clicking for selecting objects.

At IBM, much research has been done in the paper-like interfaces project [102]. The goal of this project was to develop a body of knowledge about the applicability of gestural human-computer interfaces, and to explore software technology for the development of gestural interfaces. Within this project, Rhyne et al. discussed dialogue management for gestural interfaces [101] and described a prototype electronic notepad [100]. Wolf et al. presented several prototype applications of the paper-like interfaces, such as information processing with spreadsheets [102], educational applications [20], medication charting [4], freehand sketching, gestural creation of music scores and interpretation of handdrawn mathematical formulae [139], and support of group work [137]. They presented several analyses of how well such interfaces work [135, 136].

At MCC, the Interactive Worksurface projects have been completed for building CAD systems using handwriting recognition [49]. Successful research has been done in interactive tablet hardware [6], handwriting recognition with neural networks [77, 76, 96], and visual languages [134]. Our goal is similar to that of MCC; however, we concentrate on handsketch-based diagram editing within a graphical structure editor. In contrast, MCC mainly investigated hardware design and fundamental research in handwriting recognition with neural networks.

Kurtenbach and Buxton [68, 67] designed a prototype graphical editor (GEDIT) that permits a user to create and manipulate three simple types of objects using shorthand and proofreader-style gestures. Using handdrawn symbols, the user adds, deletes, moves, and copies these objects. The most essential difference between GEDIT and Handi-based editors is that Handi-based editors are graphical structure editors for diagram languages, whereas GEDIT is a very primitive general drawing editor just for pictures consisting of squares, circles, and triangles.

Furthermore, gestures are also used in combination with natural language processing for multimodal reference. Schmauks and Reithinger [110, 2] discussed the application of pointing gestures in natural language dialog systems. However, Schmauks used gestures mainly for pointing, which differs from our sketching-oriented gestures; gestures were classified into punctual and nonpunctual pointing gestures.

GRANDMA

The gesture-based system GRANDMA [106], developed by Dean Rubine, comes closest to our research. Rubine describes two methods of integrating gestures and direct manipulation. First, GRANDMA allows views that respond to gestures and views that respond to clicks and drags to coexist in the same interface. Second, GRANDMA supports a new two-phase interaction technique, in which a gesture collection phase is immediately followed by a manipulation phase; this is called eager recognition [108].

Similar to GRANDMA and different from several other approaches, Handi-based editors support the coexistence of the two interface techniques; that is, within Handi-based editors, the user can still click, drag, and rubberband all graphical objects. Differing from the eager recognition approach, Handi supports an incremental sketching style which is more appropriate for editing diagrams. However, the immediate feedback of our low-level recognition has the same goal as eager recognition, that is, to avoid that an entire gesture must be entered before the system responds. In GRANDMA, only single-stroke gestures can be used; in a Handi-based editor, there is no restriction on the number of strokes in a gesture.

Further characteristics which distinguish Handi from GRANDMA are: 1) GRANDMA is built from scratch directly on the X window system; Handi is built on top of a general editor framework. 2) Handi comprehensively supports the development of handsketch-based diagram editors; in contrast, GRANDMA concentrates on the input model with sophisticated event handlers. 3) Handi supports multiple-stroke gestures; GRANDMA supports multiple-path gestures of multiple-finger input [105]. 4) GRANDMA supports irregular gestures which are represented as a vector of real-valued features, and its recognizer must be trained by the user with many examples; Handi supports geometric gestures and provides an extensible set of frequently used graphical symbols which can be used without training. 5) In GRANDMA, gesture semantics are specified in the form of gesture interpreters which manipulate the internal objects directly; in Handi, gesture semantics are specified in the form of editing commands.

2.2 Pattern Recognition

Apart from the hardware problems, the main barrier to wide usage of gestural interfaces and pen-based computers is the on-line recognition of handsketched gestures and handwritten characters. The state of the art in on-line gesture and handwriting recognition may not be well known outside its particular field; it has been a topic covered more often in the pattern recognition community than in user interface design and computer graphics. Tappert et al. [122] provide a comprehensive survey of the different approaches taken to on-line recognition.

2.2.1 Character Recognition Systems

The development of electronic tablets in the 1960s led several researchers to attempt the on-line recognition of handwritten characters. Some of these early attempts were rather successful [88], but the interest gradually diminished. Recently, there has been a resurgence of on-line pattern recognition due to the appearance of pen-based computers, high-performance graphical workstations, and national language considerations (Chinese and Japanese character input). There are many systems designed to recognize different types of characters, for example digits [24, 27], English [96], Arabic [28], Chinese [140], and Japanese [62] letters, as well as mathematical symbols [10, 26].

Decision Tree

Kerrick and Bovik [59] designed a microprocessor-based character recognizer which is closely related to the hierarchical control structure of our low-level recognizer. A binary decision tree with simple features is used to rapidly reduce the set of candidate characters to a very small set. However, our hierarchical classification of graphical symbols is not the same as a decision tree, for the following reasons: All nodes in a hierarchy of geometry represent reasonable recognition results; on the contrary, in a decision tree only the leaves represent recognition results, and all inner nodes represent ambiguous states. A decision tree is a binary tree; a symbol hierarchy is not restricted to be a binary tree. Decision trees are used only for classification of characters; our hierarchical recognition combines classification and feature analysis with automatic control, in the sense that each object recognizes itself. Further, our low-level recognizer calculates features only when they are required, while within a decision tree classification all features are calculated before any classification is initiated.
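To make the contrast concrete, here is a minimal, hypothetical C++ sketch (not the dissertation's actual class design) of hierarchical classification in which each object recognizes itself: matching tests, and hence feature computations, happen lazily on the way down, and every level of the hierarchy is a legitimate recognition result.

```cpp
// Sketch of "each object recognizes itself": classification walks down a
// symbol hierarchy, computing features lazily. Names are illustrative only.
#include <vector>

struct Stroke { /* points, with lazily computed features */ };

class GeomObject {
public:
    virtual ~GeomObject() = default;
    // Returns the most specific object in the hierarchy matching the stroke;
    // unlike a decision tree, every node is a valid recognition result.
    virtual GeomObject* classify(const Stroke& s) {
        for (GeomObject* child : children_) {
            if (child->matches(s))           // computes only the needed features
                return child->classify(s);   // refine top-down
        }
        return this;                         // stay at this (still valid) level
    }
protected:
    virtual bool matches(const Stroke&) { return true; }
    std::vector<GeomObject*> children_;      // e.g. Polygon, Ellipse, Line, ...
};

class Polygon : public GeomObject {
    bool matches(const Stroke&) override { /* test corner count */ return true; }
};
class Rectangle : public Polygon {
    bool matches(const Stroke&) override { /* 4 corners, right angles */ return true; }
};
```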

2.2.2 Handsketched Figure Recognition Systems

There are only a few approaches to on-line recognition of handsketched figures. Murase [82] describes a system for recognizing handsketched flowcharts. However, his method is designed for recognizing complete flowcharts, that is, the user has to draw a complete flowchart before the recognition can begin. All subfigures that can be symbols are extracted from the input sketch. Elastic matching distances are calculated between these candidate symbols and prototype symbols. Finally, the system simultaneously recognizes and segments the entire figure by choosing the candidate sequence that minimizes the total sum of distances along the sequence. Although this system considers a similar picture class, the design philosophy is totally different: the recognition system is not designed for a gestural interface; rather, the user has to draw a complete diagram and then start the recognition system.

Similar to Murase's approach, Kojima and Toida [63] developed a system for on-line handdrawn figure recognition. An adjacent-strokes structure analysis method (ASSAM) is described. The figures are classified into fundamental figures and symbols. A fundamental figure means a line segment or a closed figure composed of only one loop. A symbol means a figure composed of several fundamental figures. The recognition algorithm is composed of two steps: fundamental figure recognition and symbol recognition. The fundamental figure recognition is done by analysis of the number of apexes and the categories of line segments between them. While the recognition algorithm for fundamental figures appears to be quite ad hoc, the approach of combining adjacent strokes makes the algorithm independent of the stroke order and stroke number. The combination of adjacent strokes is similar to our approach of the incremental updater. The essential difference is that our low-level recognizer combines connected strokes into a new stroke which is displayed immediately on the screen, whereas the combination used in ASSAM is merely a technique of the matching algorithm.

2.2.3 Gesture Recognition Systems

Gestures have properties that are different from those of handwritten characters. For example, gestures do not have regular heights and orientations. Therefore, new recognition methods for gestures are necessary.

Dean Rubine [107] presented a trainable statistical gesture recognizer for single-stroke gestures. The recognition is done in two steps. First, a vector of features is extracted from the input gesture. The feature vector is then classified as one of the possible gestures via a linear machine. The intelligence in Rubine's algorithm is in its many stroke characteristics, 13 in all. In his algorithm, the usage of weights for specified characteristics is similar to neural-net weight adjustment. The weights are determined by training from example gestures.

Kim [60] presented a gesture recognizer based on feature analysis which has been improved and redesigned by Lipscomb [73], who combines techniques of angle filtering and multi-scale recognition. An angle filter is used to reduce noise and quickly distill the many input points of a stroke; it produces output points where stroke curvature is high, put simply at the corners. Later, a recognizer uses a feature finder to decide which candidate features are significant. These features match a stored prototype stroke, triggering recognition. Recognition succeeds only when the known and the unknown stroke have the same number of points; this is achieved by training. In contrast to Rubine's algorithm, the multi-scale recognizer concentrates its intelligence in its multi-scale data structure, not in its stroke characteristics or weighting.

A common feature of these gesture recognizers is the use of training, which is good for gestures used only in a single application, but not appropriate for a class of diagram languages. In section 3.2.3, we discuss some further differences between an irregular gesture recognizer and a symbol recognizer, after our problem has been analyzed in detail.
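As a point of reference for that discussion, the classification step of Rubine's recognizer [107] reduces to evaluating, for each gesture class c, a linear function v_c = w_c0 + Σ_i w_ci f_i over the extracted feature vector f and picking the maximum. The following compact C++ sketch shows just this evaluation (training of the weights is omitted):

```cpp
// Sketch of the classification step of a Rubine-style linear recognizer [107]:
// each gesture class c has weights w[c][0..F]; the class maximizing
//   v_c = w[c][0] + sum_i w[c][i+1] * f[i]
// over the feature vector f wins. Weight training is omitted here.
#include <cstddef>
#include <vector>

int classifyLinear(const std::vector<double>& f,                 // F features
                   const std::vector<std::vector<double>>& w) {  // per-class F+1 weights
    int best = -1;
    double bestScore = 0.0;
    for (std::size_t c = 0; c < w.size(); ++c) {
        double v = w[c][0];                       // bias term
        for (std::size_t i = 0; i < f.size(); ++i)
            v += w[c][i + 1] * f[i];
        if (best < 0 || v > bestScore) { best = static_cast<int>(c); bestScore = v; }
    }
    return best;                                  // index of recognized class
}
```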

2.3 Visual Languages

The term visual language is used to describe several types of languages: languages manipulating visual information, languages for supporting visual interactions, and languages for programming with visual expressions [112]. Myers [84] emphasizes the difference between visual programming and program visualization systems. Visual programming refers to any system that allows the user to specify a program in a two- (or more) dimensional fashion. Program visualization is an entirely different concept from visual programming: in visual programming, the graphics are used to create the program itself, but in program visualization, the program is specified in a conventional, textual manner, and the graphics are used to illustrate some aspect of the program or its run-time execution. In this work, we consider only visual programming languages, which may be further classified, according to the type and extent of the visual expressions used [112], into icon-based languages, form-based languages, and diagram languages; the latter are the target languages of this thesis.

2.3.1 Specification and Parsing

Visual programming environments have been a research topic for many years, and there are different approaches to the definition and parsing of visual languages. Spatial parsing is the process of recovering the underlying syntactic structure of a visual program from its spatial arrangement. Most of the existing approaches to spatial parsing are grammar-based and batch-oriented. Examples of visual language grammars are picture layout grammar [39], positional grammar [22], relation grammar [23, 31], graphical functional grammar [64], unification-based grammar [134], and constrained set grammar [47]. A grammar-based visual language parser is designed to be generated from a grammar definition. Its user interface is similar to that of a conventional program compiler. The input is a picture in a certain format such as a picture description [40], PostScript [54], or a bitmap [134]. The output of the parser consists of a statement about the syntactical correctness of this picture and an attributed parse structure. However, a visual programming environment which uses such a parser has the same style as the currently used textual programming environments: the programmer has to draw a complete diagram and input it to the parser. One serious problem is the visualization of error messages of spatial parsing, because errors of a visual program cannot be reported by using line numbers as conventional compilers do. The batch-oriented approach of spatial parsing is obviously not appropriate for interactive editing. Similar to the spatial parsers, our high-level recognizer recognizes handsketches from the spatial arrangement as well. However, the high-level recognizer is designed as an integrated component of handsketch-based diagram editors, so that the parse structure is at the same time the internal object graph of the structure editor. The most important issue distinguishing our high-level recognition from existing spatial parser approaches is the need to process input incrementally, while imposing no constraints on the input order of the graphical symbols produced by the low-level recognizer.

A Constrained Set Grammar [47] is a collection of productions which consist of a set of non-terminals on the left-hand side, a set of symbols on the right-hand side, and a collection of constraints between the symbols on both sides. Constraints are the key feature of the constrained set grammar; they enable information about spatial layout and relationships to be encoded in the grammar. The spatial relationships used in the constrained set grammar are similar to those used in the gesture constraints of our high-level recognizer. However, the constraints are checked with totally different techniques. A constrained set grammar is transformed into a set of clauses written in the constraint logic programming language CLP [46]; therefore, logic programming tools for specifying constraints as well as general-purpose theorem provers can be used. However, the computational cost incurred by such methods is so high that such parsers are very slow.

Unification-based Grammar, designed by Wittenburg et al. [134], supports parsing handsketched input from a graphics tablet. Their goal corresponds closely to ours, that is, recognizing handsketched diagram pictures. A unification-based grammar and a parsing algorithm are presented for defining and processing visual languages. The lexical lookup process is represented as a set of productions that maps exclusively from terminals to nonterminals. Two testbed applications for the parser are implemented, a math sketchpad and a flowchart sketchpad, which recognize mathematical expressions and structured flowcharts, respectively. These systems were targeted for MCC's Interactive Worksurface discussed in section 2.1. A feature of these systems similar to Handi-based editors is the possibility to accept elements in the order they are drawn, rather than in some spatially defined ordering. However, Handi-based applications are structure editors both for incrementally creating diagrams and for manipulating existing diagrams. Another significant feature of Wittenburg's system is that it collects strokes until the user exceeds a time-out threshold between strokes; the set of strokes is then presumed to represent a single symbol of the input vocabulary. In contrast, our incremental recognizer immediately recognizes each input stroke, which does not force the user to make explicit pauses between strokes of different symbols.

2.3.2 Visual Programming Systems

SIL-ICON is a visual compiler [18] developed at the University of Pittsburgh which supports the specification, interpretation, prototyping, and generation of icon-oriented systems. An icon interpreter uses a formal specification of an icon system to understand and evaluate a visual sentence. The design of the system is based on the concept of a generalized icon. A generalized icon has a dual representation (Xm, Xi), where Xm is the logical part (meaning) and Xi is the physical part (image). An essential characteristic of the generalized icon concept is that the logical part and the physical part are mutually dependent, which is similar to Handi's concept of the dual representation of diagrams in components and views. In Handi-based systems, gestures always relate to the diagram's view, in the same way as icon operators relate to the physical part of the icons. The physical part of an icon is specified by a picture grammar. A picture grammar is a context-free grammar where the terminal symbols include both primitive picture elements and spatial image operators. The operators describe compositions of the physical parts of icons. SIL provides three operators: horizontal concatenation (noted with the character '+'), vertical concatenation ('^'), and spatial overlay ('&'). Using these operators, a string can describe a complex physical icon. For example, the iconic sentence in figure 2.1 can be represented by the string

(box + box) & cross

Figure 2.1: Iconic sentence
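As a toy illustration of this operator notation (our own sketch, not SIL-ICON's code), such iconic sentences can be modeled as expression trees over primitives and the three operators:

```cpp
// Toy model of SIL-style iconic expressions; not SIL-ICON's actual code.
#include <memory>
#include <string>

struct Icon;
using IconPtr = std::shared_ptr<Icon>;

struct Icon {
    std::string op;      // "+" (horizontal), "^" (vertical), "&" (overlay),
                         // or a primitive name such as "box" or "cross"
    IconPtr left, right; // empty for primitives
};

IconPtr prim(std::string name) { return std::make_shared<Icon>(Icon{name, {}, {}}); }
IconPtr node(std::string op, IconPtr l, IconPtr r) {
    return std::make_shared<Icon>(Icon{op, l, r});
}

// The iconic sentence of figure 2.1: (box + box) & cross
IconPtr figure21() {
    return node("&", node("+", prim("box"), prim("box")), prim("cross"));
}
```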

GREEN is a GRaphical Editing ENvironment designed by Golin [40] which allows the programmer to create and manipulate a visual program. The visual structures can be recovered by a visual parser. The language syntax is defined by the picture layout grammar [39]. An editor provides the user interface to create a set of primitive picture elements in a picture file which can be processed by the parser. The parser tries to find a valid parse structure among all possible ones by using a multiple-set structure. The set of graphical primitives is restricted to boxes, octagons, circles, text, lines, and arrows with fixed attributes. The performance of this system is too slow to be practical for use as a program development system [40]. The editor is not designed to allow the user to make handsketches; instead, a grid alignment mechanism is provided to guarantee that attached objects have appropriate coordinates.

2.4 Graphical Structure Editors

Graphical structure editors are a subdomain of visual programming environments; a visual programming environment is a software tool or collection of tools to support programming in a visual language. The role of the visual programming environment is analogous to the role filled by a traditional, text-based programming environment such as the Cornell Program Synthesizer [124]. Visual programming environments must support two basic tasks: the creation and manipulation of visual programs, and the processing of visual programs by analyzing and executing them. Diagram editing systems refer to the first task, which again consists of several different aspects. Systems for graph drawing such as Edge [87] or diagram visualization such as Compound Digraph [115] consider aspects of aesthetic layout of graphs or diagrams which are not considered in this work.

This work relates to graphical structure editors because handsketch-based diagram editors are a kind of graphical structure editor. A number of graphical structure editors have been designed with several different aspects, which can be classified into two basic principles. One is the generator approach, such as GEGS [117], PAGGED [38], and LOGGIE [7]. The other is the toolkit approach, such as Unidraw [129], ET++ [131], and Diagram Server [9]. The essential difference between the generator approach and the toolkit approach is the manner in which the development effort for structure editors is reduced. Within the generator approach, the editor designer specifies an editor with a grammar, from which an editor can be generated. Within the toolkit approach, basic building blocks with the desired functionality are provided as reusable objects and classes so that an editor can easily be developed. This work follows the toolkit approach because there are many common features among HiNet diagram editors which can be encapsulated into reusable classes.

A common characteristic of all graphical structure editors is that graphs are used as internal representations. These graphs are characterized by user-defined node types and edge types. Some systems allow constraints such as "every node of type x must be connected to at least one node of type y." Considering the user interfaces of conventional graphical structure editors, all systems include sophisticated commands with many complex modes for entering and deleting picture elements. These commands can usually be selected from a language-dependent palette of command buttons or from menus. Language-independent commands like file operations, zooming, and scrolling are standard commands.

Similar to graphical structure editors, the internal representation of the underlying diagram structure within our high-level recognition is an object graph as well. However, we use object-oriented methodology to represent different node types and edge types in appropriate object classes. Basic concepts for the representation and recognition of hierarchy and connectivity are developed. In contrast to conventional structure editors, handsketch-based editors use gestures to invoke commands for entering and deleting graphical elements. The input is in free form, modeless, in any order and place. The user can express ideas directly by handdrawing; therefore, command palettes are usually not necessary. For example, in a handsketch-based Petri net editor, creating a place object can be done by drawing a circle, and creating a transition object can be done by drawing a rectangle.

GEGS

The Graphical Editor Generator System designed by Szwillus [117, 118, 119] is an editor generator system with a powerful specification method based on attributed grammars which can specify four categories of information: 1) a class of directed graphs with node and edge types as the set of valid internal editing objects, 2) graphical presentations associated with graphical node types, 3) rules and dependencies between nodes and edges of the internal structure graph, 4) different editing modes to allow switching constraint checking on and off. The basic idea of GEGS is to generalize the concepts of textual structure editing to graphical languages. A significant feature of GEGS is the combination of the visual language specification and the user interface specification in one language. This is not at all dissimilar to our mechanism of the gesture specification, which combines gesture shapes, gesture constraints, and gesture semantics. However, our gesture specification is not considered as a component of a generator input language, and GEGS does not consider gestural interfaces at all.

Unidraw

Unidraw, designed by Vlissides [129, 128], comes closest to the basic principles used in our Handi architecture. A part of the current Handi implementation is built on top of Unidraw. Unidraw simplifies the construction of graphical editors by providing programming abstractions that are common across domains. Unidraw defines four basic abstractions: components encapsulate the appearance and behavior of objects, tools support direct manipulation of components, commands define operations on components, and external representations define the mapping between components and the file format generated by the editor. Unidraw emphasizes the generality of such an editor framework, and supports a broad range of domains such as technical and artistic drawing, music composition, circuit design, and many others. Due to this endeavor for generality, a diagram editor designer still must implement many features that are common among diagram editors. Further, Unidraw does not consider the input technique of freehand drawing.

Chapter 3

Low-Level Recognition

The input medium of a handsketch-based diagram editor is a sequence of point coordinates captured by the input device. A low-level recognizer transforms such point coordinates into graphical symbols. We begin this chapter with the problem analysis, consider several related pattern recognition problems, and state why these existing methods are not appropriate for solving our problem. Then we present our fundamental concepts and design decisions, which lead to an object-oriented system design. We describe the system components and algorithms in detail and conclude the chapter with a summary of our method.

3.1  Problem Analysis

3.1.1  Requirements

Handsketch-based diagram editors are specific gesture-based systems which place several specific requirements on the low-level recognition. The requirements we identified are as follows:

1. Recognition must be fast. Response time is widely acknowledged to be one of the chief determinants of user interface quality [111, 8]. Response time in direct manipulation systems is especially important, as noticeable delays destroy the feeling of directness. The recognition results must be seen by the user immediately after a stroke is drawn.

2. Recognition must be activated automatically. This means that the recognition system should start the recognition process after the user has drawn a stroke, without any explicit user commands. A dialog style such as "draw a gesture and then click a command button to recognize this gesture" reduces the directness of the gestural interface and is therefore not acceptable.

3. Multiple-stroke gestures must be recognized. The user usually sketches in a multiple-stroke style, as with paper and pen, which should be supported to make the handsketch-based user interface more natural. Further, some symbols cannot be drawn in a single stroke; that means multiple-stroke recognition is necessary when using such symbols as gestures.

4. Recognition should be robust and tolerant. The recognition rate in an on-line recognition system is strongly dependent on the care of the user's drawing. However, it is also important in conceptual design to allow the designer to make hasty sketches which can still be recognized by using the underlying diagram syntax.

5. The recognizer should be versatile and extensible. The recognizer is designed for a class of diagram languages, not for a single diagram language with a fixed set of symbols. Therefore the recognizer should provide, on the one hand, a large set of basic geometrical symbols, and on the other hand it should be easily extensible for new symbols which are not in this basic set.

3.1.2  Input of Handsketches

The input device used for a gestural interface can be a digitizer pen or a mouse which generates the point coordinates of each handsketch. Stroke and inking are the two most important terms frequently used within gestural interfaces. A stroke is the drawing from pen-down to pen-up, which is originally represented by a sequence of digitized point coordinates. Inking is a technique widely used to immediately show the digitized data to simulate paper and pen. This can be done in two different styles:

• draw each digitized point,

• draw a polygon whose vertices are the digitized points.


Figure 3.1 is a screen dump of the inking of a handdrawn rectangle using the second inking technique. The white breaking points are the digitized points. Figure 3.2 shows the captured x-y coordinates of this handsketched rectangle, which is drawn in a single stroke. Twenty points are digitized for this handsketched rectangle. Parameters such as a minimum distance between two adjacent points can be defined as an input filter to avoid receiving irrelevant points. For example, when using digitizers which send point coordinates regardless of the pen movement, the input filter is necessary to omit received coordinates which belong to the same point.
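Such an input filter can be sketched in a few lines of C++. The class name, the event interface, and the threshold handling below are illustrative assumptions, not the system's actual code:

    #include <cmath>
    #include <vector>

    struct Point { int x, y; };

    // Collects the points of one stroke, dropping points that are closer
    // to the previously accepted point than a minimum distance.
    class InputFilter {
    public:
        explicit InputFilter(double minDist) : minDist_(minDist) {}

        void penDown(Point p) { stroke_.clear(); stroke_.push_back(p); }

        void addPoint(Point p) {
            const Point& last = stroke_.back();
            double dx = p.x - last.x, dy = p.y - last.y;
            if (std::sqrt(dx * dx + dy * dy) >= minDist_)
                stroke_.push_back(p);              // keep only relevant points
        }

        std::vector<Point> penUp() { return stroke_; }  // the segmented stroke

    private:
        double minDist_;
        std::vector<Point> stroke_;
    };

With such a filter, a digitizer that reports coordinates even while the pen rests in place contributes only one point per position to the stroke.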

[Figure 3.1: A handsketched rectangle (inking; the white breaking points are the digitized points)]

      i    x    y                 i    x    y
      0   184  389  (pen down)   10   226  314
      1   194  390               11   196  313
      2   222  390               12   184  311
      3   254  391               13   184  317
      4   272  393               14   186  333
      5   269  371               15   189  353
      6   266  343               16   191  369
      7   267  323               17   192  375
      8   267  317               18   192  382
      9   258  314               19   190  386  (pen up)

Figure 3.2: The point coordinates of a handdrawn rectangle stroke

3.1.3  Specific Properties and Problems

Compared to off-line picture recognition, on-line pattern recognition has the advantage that segmentation is not a problem, because segmentation can be done automatically while digitizing. Handsketches are usually segmented into strokes by using the pen-down and pen-up information. Strokes are stored in arrays of x-y coordinates in the same order as they are digitized. However, the representation form of arrays of x-y coordinates brings several problems which are not present in applications of off-line picture recognition. These problems appear both in single-stroke drawings and in multiple-stroke drawings.

3.1.3.1  Single-Stroke

One single-stroke recognition problem is that there are many different possibilities for drawing a certain geometrical figure. Figure 3.3 illustrates this problem by giving examples of different single-stroke rectangles.

[Figure 3.3: The eight variants of a single-stroke rectangle]

With the restriction that the start point must be at one of the four vertices of the rectangle, there are eight variants to draw the same rectangle by choosing different drawing directions and start points.

[Figure 3.4: Start at the edge to draw a rectangle.]

If the user is allowed to start at any position, as shown in figure 3.4, the number of drawing variations is unlimited. In the representation form of on-line captured arrays of x-y coordinates, these are all different "patterns", which makes the recognition task quite complex.

3.1.3.2  Multiple-Stroke

Most geometrical figures are usually drawn by using several strokes instead of a single stroke. For example, even with the restriction that an edge of a rectangle should be drawn in a single stroke, the same rectangle can be drawn in five different combination forms, as shown in figure 3.5.

[Figure 3.5: There are five stroke combinations of a multiple-stroke rectangle, using 1, 2, 3, or 4 strokes.]

Unlike characters, geometrical figures do not have well-defined stroke orders. Stroke orders are often used to reduce the computational costs in on-line character recognition, particularly in recognizing Chinese characters. In contrast, an on-line geometry recognizer must be able to handle all possible stroke orders for a given figure. The temporal information of stroke order, which brings benefits for on-line character recognition, is rather a problem in recognizing graphical symbols. In figures 3.3 and 3.4, several variants were illustrated for drawing a single-stroke rectangle. Different start positions and different drawing directions generate different patterns. There are many more variants for drawing a multiple-stroke rectangle than for drawing a single-stroke rectangle. In the case of a four-stroke rectangle, considering the stroke order and the drawing direction of each stroke, there are 16 x 4! = 384 different drawing possibilities (2^4 = 16 direction combinations times 4! = 24 stroke orders). Together with the five different stroke combinations, there are several hundred "patterns" for rectangles. Figure 3.6 gives four reasonable examples among them. The examples above deal exclusively with rectangles. It is clear that the same problem exists in drawing other geometrical figures as well. Figure 3.7 illustrates this by giving a few other examples.


[Figure 3.6: Considering stroke-order and drawing directions]

This multiple-stroke problem is one of the most difficult problems of on-line pattern recognition. Most of the existing gesture recognition systems provide only single-stroke recognition. Other systems consider this problem in the gestural dialog manager [73]. In systems which do not provide immediate response, such as [134], multiple-stroke problems are treated as follows: at the input level, strokes are collected until the user exceeds a time-out threshold between strokes. The resulting set of strokes, which must represent a single symbol, is sent to an image recognizer, e.g. a neural net, for classification. However, such a system cannot recognize the graphical attributes of a symbol.

[Figure 3.7: Variants of drawing various geometrical figures]

3.2  Related Problems

3.2.1  Overview

Pattern recognition has a long history, and many methods have been developed for various applications. Because of its practical importance, pattern recognition has been a very active field. The most important system condition of a pattern recognizer is the form of the input patterns and how these patterns are captured. Pavlidis [95] distinguishes four classes of pictures: full gray scale and color pictures, bilevel pictures, continuous curves and lines, and points or polygons. These forms of pictorial data divide pattern recognition applications into two research fields: on-line and off-line recognition.

Off-line Recognition

Traditionally, pattern recognition considers mainly the off-line recognition problem, that is, the automatic construction of symbolic descriptions for pictures which are color pictures or bilevel pictures. The input is usually scanned with all the background information. Segmentation, contour tracing, thinning, and scene analysis are the main problems within off-line pattern recognition. One of the most difficult problems of off-line drawing recognition is the segmentation problem, that is, the scanned data must be converted to line drawings. This requires costly and imperfect preprocessing to extract contours and to thin or skeletonize them. Typical examples of off-line pattern recognition are the classification of OCR characters [12] and the recognition of handdrawn schematic diagrams [93] and mechanical drawings [52]. The term off-line signifies that the recognition is performed after the picture is created.

On-line Recognition

The technological developments of the last years have made computer graphics popular. Together with the growing performance of computers, on-line pattern recognition has become interesting. In contrast to off-line recognition, on-line recognition means that the machine recognizes handsketches while the user is drawing. The input data are vectors of x-y coordinates obtained in real time from a digitizer pen or a mouse. Therefore, on-line recognition deals with pictures of the points class. "Electronic ink" displays the trace of the drawing on the screen, and recognition algorithms instantly convert the coordinate data into appropriate symbolic descriptions. Recently, research in on-line pattern recognition has focused on handwritten characters [59, 28, 122], apart from some research into gesture recognition [75, 73, 107]. These recognition problems are all on-line recognition problems which consider a common picture class of points or polygons. The input patterns in on-line recognition are structured in strokes which vary in both their static and dynamic properties. Static variation can occur, for example, in size or shape. Dynamic variation can occur in stroke number and order.

These common characteristics can mislead one into using methods for character recognition directly to recognize handsketches. However, a detailed analysis of these problems confirms that they are quite different. As mentioned in [8], one of the problems with on-line pattern recognition is that we tend to lump all of the different approaches together. In fact, there is probably as much stylistic difference between a system that recognizes blockprinted characters and one that recognizes proofreading gestures as there is between a menu system and one that uses direct manipulation. Although there are many similarities between character recognition, irregular gesture recognition, and geometry recognition, the methods for character recognition and irregular gesture recognition cannot be used directly to recognize handsketched geometrical figures. This is because there are several significant distinctions between these recognition problems, each of which relates to different requirements. We want to point out the common features and the main differences in order to prepare our design decisions.

3.2.2  Character Recognition

Character recognition has a long history, and a number of character recognition systems have been developed. The state of the art of on-line character recognition is surveyed in [122]. The advent of electronic tablets in the late 1950's precipitated considerable activity in on-line handwriting recognition. This intense activity lasted through the 1960's, ebbed in the 1970's, was renewed in the 1980's, and has become popular now. The renewed interest in on-line character recognition stems from the advent of notepad computers.

Pattern Classification and Pattern Analysis

Niemann [89] states that pattern recognition comprises classification of simple patterns and analysis of complex patterns. A pattern is considered to be simple if a class name is sufficient, and classification means that each pattern is considered as one entity and put into one class out of a limited number of classes. No quantitative characterization is attempted in classification. A pattern is considered to be complex if a class name is not sufficient. Analysis of a pattern means that an individual description of each pattern is given. Therefore, quantitative characterization is usually necessary in pattern analysis. Handsketched strokes are complex patterns because a class name is not sufficient and the classification of a stroke as a whole is not feasible. Strokes differ from each other not only in class names but also in quantitative characterizations such as positions and dimensions, as discussed in section 3.1. It is apparent that there is an overlap between pattern classification and pattern analysis.


The most important reason why character recognition methods cannot be used directly in geometry recognition is:

Character recognition is only classification. Geometry recognition is not only classification but also analysis of graphical attributes.

In contrast to character recognition, it is not enough just to recognize the class which a handsketched geometrical figure belongs to; the graphical attributes must be recognized as well. Therefore, a geometry recognizer should recognize the shape-type of a handsketch, and at the same time it must identify all graphical attributes such as size and position of feature points. This is because all of this information is used to define a visual language syntax. The relationships between geometrical objects and their size and shape have both syntactical and semantic meanings. Considering the handsketched rectangle shown in figure 3.1, a geometry recognizer should output at least the following information:

1. it is a rectangle;

2. this rectangle is upright;

3. the coordinates of the lower-left and the upper-right vertices of this rectangle are (184, 311) and (272, 393).

Many efforts in character recognition are faced with the problem of shape discrimination between characters that look alike, such as U-V, C-L, a-d, n-h, O-0, l-1, Z-2 [122]. Another difficult problem in character recognition is cursive writing recognition [121, 33]. Recently, new research results show that neural nets have excellent performance on solving such problems [96]. But neural nets cannot output information such as the vertex coordinates of geometrical figures. They are good at classification but poor at analysis. While character recognition treats a large set of characters, geometry recognition considers a relatively small number of essentially different geometrical shapes, but with an infinite number of hierarchical parameter variations. In contrast to character recognition, the main characteristic of geometry recognition is to identify the feature points of handsketched figures.
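To make the contrast concrete, the output for the handsketched rectangle of figure 3.1 could be represented as a small data object like the following. This is a hypothetical representation; the type and field names are not taken from the original system:

    // Hypothetical output of a geometry recognizer for the stroke of
    // figure 3.1: the classification result plus all recognized attributes.
    struct Point { int x, y; };

    struct RecognizedRectangle {
        bool  upright;        // orientation attribute
        Point lowerLeft;      // recognized feature point
        Point upperRight;     // recognized feature point
    };

    RecognizedRectangle result = { true, {184, 311}, {272, 393} };

A pure classifier would deliver only the label "rectangle"; the feature points and the orientation are exactly what the analysis part must add.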


3.2.3  Irregular Gesture Recognition

Gestures for editing diagrams are regular geometrical figures like rectangles, circles, or lines. Such gestures are called regular gestures. We use the term irregular gesture to indicate hand markings which do not have regular shapes compared to geometrical figures. Such gestures are usually designed only for certain applications. Figure 3.8 illustrates four examples of such gestures: (a) the delete-gesture used in [73], (b) the ellipse-gesture in [107], (c) the merge-paragraph-gesture in [56], and (d) the temple-roof-gesture in [75].

[Figure 3.8: Some examples of irregular gestures: (a), (b), (c), (d)]

These gestures have properties that are different both from those of handwritten characters and from handdrawn geometrical symbols. While most handwritten characters have regular heights and orientations, gestures do not. Further, these gestures differ from regular geometrical figures in that they are freehand drawings and their features cannot be described with simple functions. For example, the proofreader's delete-gesture used in [73] (figure 3.8 a), which is a loop with a beginning and an ending tail, can differ in size, rotation, and mirror image.

Compared to character recognition, there are only a small number of investigations into gesture recognition. To give a feeling for the relationship between efforts in character recognition and gesture recognition: in the most comprehensive survey [122] of on-line recognition, there are only 2 papers devoted to gesture recognition and over 200 papers about character recognition. Recent research results [73, 107] indicate that trainable gesture recognizers are successful for the recognition of irregular gestures. These gesture recognizers are designed only for single-stroke gestures; they do not satisfy our requirements of multiple-stroke sketching facilities. On the other hand, the low-level recognizer for handsketch-based diagram editing must work without training, because trained recognizers are usually dedicated to one person and the training takes a long time, while the same diagram editor for conceptual design can be used by different users. Further, the drawing directions in irregular gestures usually have meanings. In contrast, drawing directions in our regular geometrical figures are invariants of the same figure. Our low-level recognizer must support all drawing styles discussed in section 3.1.3 to fulfill our requirements.

3.3  Fundamental Concepts

In the light of the above analysis of our specific recognition problem and the related recognition problems, we are now able to weigh our requirements and make our own design decisions. As discussed in the last section, character recognizers and irregular gesture recognizers treat related recognition problems with different properties. Character recognition systems and irregular gesture recognizers are pattern classification systems which cannot recognize all graphical attributes. The methods developed for on-line recognition of characters or irregular gestures cannot be used directly in geometry recognition. For this reason, we designed a hierarchical and incremental recognition method. This section describes the fundamental concepts, the major design considerations, and the design decisions of the low-level recognizer.

3.3.1  Hierarchical Classification

In the requirements discussed in the introduction of this chapter, the response time has been clearly stated as one of the most important criteria for the acceptance of a gesture recognizer for interactive applications. The matching strategy is the most crucial factor influencing the response time. The matching strategy of existing on-line pattern recognition systems normally matches an input pattern with all standard patterns (prototypes) in a dictionary (table) by using appropriate measurements or calculations. The distance between the measurement values of the input pattern and the standard symbols in the dictionary can be calculated in different manners. Examples of recognition systems which use this strategy are [82, 121, 107]. Matching is thus based on different measurements and calculations, minimizing or maximizing evaluation functions. In [82], the Euclidean distance between prototypes and normed patterns is used; in [121] cumulative distances of angle and height differences are used. Dean Rubine's method maximizes a linear evaluation function over 13 feature calculations; the classifier simply determines the class for which the evaluation function gives the maximum value. For a generalized low-level recognizer which must recognize all reasonable handsketched geometrical figures, this matching strategy results in an enormous number of distance calculations, which makes immediate response difficult.

Geometrical objects have a clear and intuitive hierarchy. For example, circle can be a subclass of ellipse, and rectangle can be a subclass of parallelogram. This inherent hierarchical nature of geometrical figures can be used to make the classification of handsketched figures more efficient. At first, some broad classes are distinguished. In the next step each of the broad classes is further subdivided. Subdivision continues until a final classification is obtained.

[Figure 3.9: The hierarchy of geometrical objects, with Symbol as the root class. Among the classes shown are Dot, Line, Arc, MultiLine, TriLine, Triangle, Trapezoid, Quadrilateral, Parallelogram, Rectangle, Circle, SharpArrow, RightAngle, and the "Z", "U", "Bottle", "Basin", and "L" forms, each illustrated by an example figure.]

Figure 3.9 shows the currently used symbol hierarchy. Each new level in this hierarchy is defined by emphasizing the existence of a special feature. For example, if two adjacent edges of a rectangle are equal, this rectangle is a square. In general, all geometrical objects have simple definitions which can be used as matching criteria to classify them. In figure 3.9, the characteristics of each class are illustrated by giving example figures below the class name.

For different diagram types, this class hierarchy can be redefined for the benefit of the recognition task. Experience shows that the system works better if the significant classifications are made at the top level of the hierarchy. For diagrams which mainly distinguish between nodes and connections between nodes, we found that a hierarchy which first distinguishes between opened strokes and closed strokes works best. For other diagrams, the hierarchy can distinguish line-oriented geometrical objects from arc-oriented geometrical objects at the first level of the hierarchy. Although there are many possibilities to define this hierarchy, the recognition method is designed to be independent of a concrete hierarchy. Further, each defined hierarchy can be adapted or extended for specific applications. The method of hierarchical classification is efficient because only the necessary calculations are made for local classifications. A complex recognition problem is separated into several layers, and in one layer, only a limited number of different classes is considered. Hierarchical classification benefits significantly from object-oriented techniques, which leads to the next design consideration.

3.3.2  Object-Oriented Design

The object-oriented programming paradigm has evolved as a methodology to make quality programming less burdensome. It has been predominantly treated as a technique for programmers. Recently, the object-oriented paradigm is viewed more and more as a way of thinking and doing rather than simply as a way of programming [29]. Object-oriented technology is used in the low-level recognizer not only for the implementation of the recognition system; more importantly, it is used to design the system. Within our object-oriented design, the classification hierarchy is used directly as the class hierarchy. A concrete design of such a hierarchy must satisfy the following two basic requirements:

1. The root class of this hierarchy is the class for original, unknown symbols.

2. Each child class in this class hierarchy is defined by specifying features which do not exist in its parent class.

The inherent hierarchical nature of geometry is often used as a standard example to illustrate features such as inheritance and polymorphism in object-oriented programming, as in [41]. The textbook example of polymorphism is the definition of a virtual graphical output function draw for each geometry class [11]. In this work, the hierarchical nature of geometrical objects is used not for drawing graphics but for recognizing handdrawn graphics, to obtain the geometrical data of the handdrawn graphics. It is shown in [146] that object-oriented programming is very powerful in applications of on-line recognition of handdrawn graphics. Using object-oriented design, this complicated pattern recognition problem of handsketched geometrical figures achieves an elegant solution. The main object-oriented aspects which strongly influence our design of the low-level recognizer are encapsulation, polymorphism, and reuse.

3.3.2.1  Encapsulation

One of the most significant features of the object-oriented methodology is encapsulation, that is, the packaging technique. As discussed in section 3.2, on-line geometry recognition consists of both pattern classification and pattern analysis. We use the encapsulation technique to bring pattern classification and pattern analysis together into objects. Encapsulating originally unknown strokes and all recognizable geometrical figures into objects of appropriate classes requires the careful design of each individual class in the hierarchy. Each class in the class hierarchy provides data structures for storing the attributes which are relevant for this class.

• The root class Symbol is the most general class, and it provides the most general data structures for stroke objects, which are represented by x-y coordinates.

• All other classes inherit the basic attributes of the root class, and they provide additional data structures for storing class-specific attributes.

In the low-level recognition, input strokes and all recognizable symbols are encapsulated into classes in a class hierarchy of symbols. The class name of each output object represents the classification result, and the graphical attributes of each object represent the analysis result. The class hierarchy corresponds to the classification hierarchy.
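A minimal C++ sketch of this packaging idea is shown below. The implementation of the thesis is built on C++ frameworks such as Unidraw, but the declarations here are illustrative, not the original ones:

    #include <utility>
    #include <vector>

    struct Point { int x, y; };

    // Root class: encapsulates an originally unknown stroke, represented
    // by its list of x-y coordinates. Every subclass inherits this list.
    class Symbol {
    public:
        explicit Symbol(std::vector<Point> pts) : points(std::move(pts)) {}
        virtual ~Symbol() = default;
        std::vector<Point> points;
    };

    // Subclasses add only the data that is specific to their class.
    class Closed : public Symbol {
    public:
        using Symbol::Symbol;
        // start and stop point coincide; no extra attributes needed here
    };

    class Polygon : public Closed {
    public:
        using Closed::Closed;
        std::vector<Point> vertices;       // recognized feature points
    };

    class Rectangle : public Polygon {
    public:
        using Polygon::Polygon;
        Point lowerLeft{}, upperRight{};   // class-specific attributes
    };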

3.3.2.2  Polymorphism

A polymorphic function is one that can be applied uniformly to a variety of objects. Class inheritance is closely related to polymorphism. The same operations that apply to instances of a parent class also apply to instances of its subclasses. This property is used to design a uniform classification function for all classes. Each classification function can be accessed through the corresponding object.

• All classes provide a uniform matching-routine which determines whether an object of this class can be an object of one of its child classes.

• The matching criteria used in the corresponding matching-routines are exactly the definitions which distinguish the child classes from the parent class.

Within this object-oriented design, the recognition process is treated as an object-refinement process. Input strokes are encapsulated into objects. Each object recognizes the next possible subclass in the class hierarchy and creates a new object of that more specific class. The key point here is that the object recognizes itself, which is supported by the polymorphism concept of object-oriented technology. All classes in the class hierarchy specify a virtual recognition function with a unique function name. In this way, the object refinement can be easily controlled by iteratively calling each object to recognize itself [146].
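For example, the local matching step that separates Square from Rectangle might be sketched as follows. The tolerance criterion and the class interface are assumptions for illustration:

    #include <algorithm>
    #include <cmath>

    // Uniform, polymorphic recognition function: each object decides
    // locally whether it can be specialized into a child class.
    class Symbol {
    public:
        virtual ~Symbol() = default;
        virtual Symbol* analyze() { return this; }  // default: no refinement
    };

    class Square;   // child class of Rectangle, defined below

    class Rectangle : public Symbol {
    public:
        double width = 0, height = 0;
        Symbol* analyze() override;     // may return a new Square object
    };

    class Square : public Rectangle {
    public:
        double side = 0;
        Symbol* analyze() override { return this; }  // leaf: fully recognized
    };

    // The matching criterion is exactly the definition that distinguishes
    // the child class: a rectangle whose adjacent edges are (almost) equal
    // is a square. The 10% tolerance is an illustrative assumption.
    Symbol* Rectangle::analyze() {
        if (std::fabs(width - height) < 0.1 * std::max(width, height)) {
            Square* s = new Square;
            s->width = width; s->height = height;
            s->side = 0.5 * (width + height);  // pattern analysis: new attribute
            return s;
        }
        return this;                           // cannot be specialized further
    }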

3.3.2.3  Reuse

Reuse appears in the low-level recognizer in two aspects. First, the recognition of multiple-stroke sketches reuses the single-stroke analyzer by merging multiple-stroke symbols into objects which can be considered as a single rough sketch. Second, subclasses in the class hierarchy inherit the graphical attributes and recognition functions of all their superclasses. That is, a subclass reuses the recognition functionality of its superclasses automatically. This has the advantage that a new symbol can easily be added to the symbol hierarchy by reusing the recognition functionality of its superclasses.

3.3.3  Incremental Recognition

With any user interface, there must be some signal by which the computer knows that the user has completed a command which should now be executed. This signal is called a closure, and it may be an explicit event such as a button press or return-key, or it may be recognized automatically. Rhyne [101] first discussed this closure problem within gestural interfaces. In principle, one would like to avoid the need for explicit closure actions, as they consume time and destroy the directness. For this reason, only the very early approaches, e.g. [101], use a closure button; most existing gesture-based systems, e.g. [134], determine the closure by using time information. This kind of interaction forces the following style: First, the user draws something. Then he stops drawing for a while to indicate to the system that he has finished drawing the objects. If no coordinates have been sent to the recognizer for a predefined time period, a recognition process is started. This interaction style is not suitable in diagram editing for conceptual sketching because pauses disturb the thinking process. If pauses are additionally used as closure signals, the user feels uncomfortable while sketching.

To give the user a better feeling of the directness of the gestural interface, the following design decision is made: On the one hand, we simulate paper and pen by using inking. The user sees what he has just drawn, in the same style as when working with paper and pen. On the other hand, we improve on the paper-and-pen style by automatically redisplaying each handdrawn stroke as the recognized graphical object; that means, after receiving the pen-up signal, the stroke is recognized immediately, without the need for pauses. Moreover, the recognition result is displayed directly after the stroke is recognized. In this way the user immediately sees the recognized stroke and gets a direct feeling for the underlying gestural interface. The current recognition results are always visible, because the handdrawn stroke is beautified into a regular geometrical object, which gives a better display than inking. Inking is only used while the user is drawing. Additional sketches can be made more precisely, because the previously drawn strokes have already been recognized as geometrical objects which are displayed regularly. This has the additional advantage that the user can correct wrong recognitions as early as possible.

The design decision for immediate recognition of each stroke makes it difficult to determine closures of multiple-stroke gestures. This is especially true when two shapes differ in form only by the addition of one or more strokes. The idea of organizing the gesture recognition in two tightly cooperating levels is essential here: it makes it possible for the high-level recognizer to help the low-level recognizer recognize closures without any user actions. The communication between the low-level and high-level recognizers is supported by a database of geometrical objects. After the low-level recognition process has terminated, the high-level recognizer is triggered to look in the database for syntactically correct geometrical objects. Objects which compose a syntactically correct editing command are removed from the database. In other words, all geometrical objects which are stored temporarily in the database are syntactically incorrect. This incorrectness mainly means incompleteness, and incomplete strokes can be successively completed by drawing additional strokes. This leads to an incremental recognition of multiple-stroke sketches with the following two key points:

1. A database buffers incomplete strokes until they become a syntactically correct sketch.

2. The system incrementally merges all connected and incomplete strokes into new objects. New merged objects are made immediately visible to improve the directness.

The idea of incremental merging provides a very powerful mechanism to drastically simplify the recognition effort for stroke-order-independent multiple-stroke sketches, because logically connected strokes are successively merged into one object, as the sketch below illustrates.
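The following is a hedged sketch of this merging loop, with stand-in types for the database and the connectivity test; the 10-pixel threshold and the std::list representation are illustrative assumptions:

    #include <cmath>
    #include <iterator>
    #include <list>
    #include <utility>
    #include <vector>

    struct Point { double x, y; };
    using Stroke = std::vector<Point>;

    // Minimal stand-in for the real system's geometrical objects.
    struct GeoObject { Stroke points; bool open = true; };

    // Two open strokes are connectable if an endpoint of one is close to
    // an endpoint of the other (threshold chosen for illustration).
    static bool connectable(const GeoObject& a, const GeoObject& b) {
        auto close = [](Point p, Point q) {
            return std::hypot(p.x - q.x, p.y - q.y) < 10.0;
        };
        return close(a.points.back(), b.points.front())
            || close(b.points.back(), a.points.front());
    }

    static GeoObject analyzeSingleStroke(Stroke s);  // analyzer of section 3.4.3

    // Merge every two logically connected open objects into one object and
    // re-analyze the merged point list as if it were a single stroke.
    void incrementalUpdate(std::list<GeoObject>& db) {
        bool merged;
        do {
            merged = false;
            for (auto a = db.begin(); a != db.end(); ++a) {
                for (auto b = std::next(a); b != db.end(); ++b) {
                    if (a->open && b->open && connectable(*a, *b)) {
                        Stroke s = a->points;                    // concatenate
                        s.insert(s.end(), b->points.begin(), b->points.end());
                        db.erase(b);
                        db.erase(a);                             // old objects deleted
                        db.push_back(analyzeSingleStroke(std::move(s)));
                        merged = true;
                        break;       // iterators now invalid: rescan the list
                    }
                }
                if (merged) break;
            }
        } while (merged);
    }

Each successful merge restarts the scan, so chains of strokes collapse step by step into one object regardless of the order in which they were drawn.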

3.4  System Design

3.4.1  Overview

A recognition system is characterized by its control structure and, for this control structure, an appropriate data structure. Figure 3.10 is a schematic of the system components and organization of the low-level recognizer. The low-level recognizer consists of three modules and a database which is accessible from all modules. The three software modules are the single-stroke analyzer, the incremental updater, and the selective matcher.

[Figure 3.10: System components and organization of the low-level recognizer. The x-y coordinates of each new stroke enter the single-stroke analyzer, whose resulting objects are stored in the database; the incremental updater looks up objects and feeds merged objects back to the single-stroke analyzer; after the low-level recognition has terminated, the selective matcher retrieves objects from the database and delivers graphical symbols. Solid lines denote data flow, dotted lines control flow.]

The intelligence of our low-level recognizer is concentrated in a class hierarchy which corresponds to the symbol hierarchy presented in section 3.3.1. One of the most significant features of our design is that this class hierarchy of geometrical objects is the control structure of the recognition process, and at the same time it builds the storage and query structure of the database. The database is designed for hierarchical access to geometrical objects. Hierarchical access here refers to queries which can be expressed hierarchically, for example: select a geometrical object which belongs to the opened class or to a child class of the opened class. This kind of hierarchical access is used intensively by the incremental updater.

3.4.1.1  System Components

The single-stroke analyzer is the interface to the dialog manager. For each new stroke, the single-stroke analyzer creates a new object which encapsulates this stroke, which is originally represented by x-y coordinates. This object is classified along the defined class hierarchy by successively analyzing its features.

In contrast to the single-stroke analyzer, the incremental updater operates on the relationships between two objects in the database. The main goal of the incremental updater is to merge two arbitrary objects which can be connected and represented by a single object. Through iterative calls, it is possible to merge all objects that logically belong together into one object.

The selective matcher is the interface to the high-level recognizer. It is mainly a hierarchical access module to the database of geometrical objects. Each syntactically correct object in the database is selected and sent to the high-level recognizer by the selective matcher.

3.4.1.2  Control Flow

The control flow of the low-level recognizer is represented in figure 3.10 with dotted lines. Each time a new stroke is completely drawn, the single-stroke analyzer is activated. The termination of the recognition process of the single-stroke analyzer triggers the incremental updater. The incremental updater stops when all objects which can be combined together have been merged and processed. The termination of the update process triggers the high-level recognizer, which will be discussed in the next chapter. The control flow between the database and the other modules is not depicted in figure 3.10, as it consists of standard database operations such as storing, deleting, and querying objects. The selective matcher can only be activated by the high-level recognizer.

3.4.1.3  Data Flow

The input data of the low-level recognizer are digitized x-y coordinates. Sequences of x-y coordinates are segmented into individual strokes by pen-down and pen-up signals. The outcome of the single-stroke analyzer is an object representing the complete recognition result, which includes the class and the attributes. The class this object belongs to is the recognized class of the input stroke, and the graphical attributes of this object are the recognized feature points. This resulting object is stored in the central database of geometrical objects. Data exchanges between the database and the other modules are based on objects which are instances of classes in a class hierarchy of geometrical figures. The incremental updater merges every two logically connected objects into one object and sends the merged object as a single stroke to the single-stroke analyzer. The old objects are deleted after the merging. Data exchanges between the low-level recognizer and the high-level recognizer are achieved by the selective matcher, which takes objects from the database and brings them to the high-level recognizer. The output of the selective matcher consists of geometrical objects which can be used by the high-level recognizer. The following three sections describe the low-level recognizer in detail.

3.4.2  Symbol Database

One of the most important system components of the low-level recognizer is the central database. This database is used for storing intermediate recognition results, which are shared by all three modules: the single-stroke analyzer, the incremental updater, and the selective matcher. As mentioned above, all intermediate recognition results are objects of classes in our class hierarchy. These objects have a hierarchical kind-of relationship among them. For example, a square-object is a kind of rectangle-object. This hierarchical relationship is important for the incremental updater as well as for the selective matcher. The incremental updater has to check connectivities between objects which belong to the class opened and all its subclasses. The selective matcher accesses objects in the same way. The key point here is the so-called hierarchical query, that is, searching for objects which belong to a specific class or to the child classes of this specific class. This can be seen more clearly in a concrete example: in a Petri net editor, a rectangle is a gesture for "create transition." If a square is drawn, a transition object must be created, because a square is a kind of rectangle.

A database which allows hierarchical retrieval of objects is designed by using the class hierarchy as the data organization structure of the database. A container class is designed to manage instances of classes in the class hierarchy. The low-level recognizer automatically creates a container-object for each class in the class hierarchy. These container-objects are then connected in the same structure as the class hierarchy, as shown in figure 3.11. Instances of the same class are stored in a list which is accessible from the corresponding container-object. Additional information, such as the number of all children objects which are instances of its subclasses, is stored in each container-object. A hierarchical query is therefore a top-down-oriented search from an entry container-object, looking for a non-empty list of object instances. In this way, a hierarchical query of objects stored in the database is achieved.

[Figure 3.11: Internal structure of the specific database for geometrical objects: one container-object per class (e.g. Symbol, Closed, Ellipse), each recording the number of children objects and the list of instances of exactly this class]
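A sketch of this container structure and of a hierarchical query is given below. The names mirror figure 3.11; the traversal details are assumptions:

    #include <list>
    #include <string>
    #include <vector>

    struct Symbol;   // instances of the geometry classes (section 3.3.2)

    // One container per class in the class hierarchy, connected in the
    // same structure as the hierarchy itself.
    struct Container {
        std::string className;             // e.g. "Symbol", "Closed", "Ellipse"
        std::vector<Container*> children;  // same shape as the class hierarchy
        std::list<Symbol*> instances;      // objects of exactly this class
        int numChildrenObjects = 0;        // instances stored in all subclasses
    };

    // Hierarchical query: top-down search from an entry container for the
    // first non-empty list of object instances.
    Symbol* firstInstanceOf(Container* entry) {
        if (!entry->instances.empty())
            return entry->instances.front();
        for (Container* child : entry->children)
            if (Symbol* s = firstInstanceOf(child))
                return s;
        return nullptr;  // no object of this class or any of its subclasses
    }

Calling firstInstanceOf on the container of the opened class, for example, yields an opened stroke of any subclass, which is exactly the access pattern the incremental updater needs.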

3.4.3  Single-Stroke Analyzer

The single-stroke analyzer transforms a single handsketched stroke into a geometrical object. Handsketched figure recognition requires pattern classification and pattern analysis. One of the novelties of our single-stroke analyzer is that classification and analysis are combined in a natural way by using an object-oriented method.

3.4.3.1  Control Structure

Single-stroke analysis may be viewed as a problem-solving activity which comprises both pattern classification and pattern analysis. The initial state of the problem is defined by the original stroke. By a sequence of actions, the initial state undergoes a sequence of state transitions which leads to a sequence of new states. The single-stroke analyzer stops if no further action is possible, and this "stop state" is the goal state. One significant characteristic of the low-level recognizer is that the control structure of the single-stroke analyzer is exactly the class hierarchy. The class hierarchy can be seen as a schema for a top-down problem-solving strategy, as well as a knowledge base which reduces the search space in the problem-solving activity. The root class represents the initial state, and all other classes represent goal states. This differs from conventional problem-solving trees such as decision trees, which consider only leaves as goal states [89].

As mentioned above, each class in the class hierarchy provides data structures for encapsulating strokes and geometrical objects, and each class has an analyze-function for local recognition. Local means that only one level of the hierarchical classification is considered. With the object-oriented design making use of polymorphism, the control structure of the single-stroke recognition is just a simple loop, as shown in figure 3.12. First, each new stroke is encapsulated in an object of the root class Symbol. We call this object the working-object. Subsequently, the recognition process carries on by calling the uniform and polymorphic analyze-function of this working-object in a loop. Each analyze-function returns a new object which represents the local recognition result of the working-object. Within the loop, the new object is assigned to the working-object in case they are different, that is, if the working-object has been specialized. In this way, the working-object is recognized step by step. The class and the attributes of this working-object change top-down along the class hierarchy. After termination, when the working-object cannot be specialized any more, the recognition result is represented by the working-object.

[Figure 3.12: The control structure of the single-stroke analyzer: capture a stroke and create a working-object of class Symbol; try to specialize the working-object into a new object; if it could be specialized, assign the new object to the working-object and repeat; otherwise return the working-object as the recognition result.]

The essential point here is that all classes in the class hierarchy have a uniform analyze-function. This polymorphic analyze-function is a local decision maker which determines whether an object of this class can also be an object of a more specific class, that is, whether this object can be further "specialized". All analyze-functions comprise both pattern classification and pattern analysis. The classification is characterized by the difference between the class name of the object itself and the class name of the returned object which represents the recognition result. The pattern analysis is characterized by the transformation of the object attributes. Because this function is polymorphic, it can be used for all objects which are instances of different classes in the class hierarchy. In the recognition loop, each working-object calls this function for itself, and the returned object of this function is assigned to this working-object again.
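In code, the loop of figure 3.12 reduces to a few lines. This is a sketch assuming the Symbol hierarchy with the virtual analyze-function from section 3.3.2; memory management is simplified:

    // Object refinement: repeatedly let the working-object recognize
    // itself until no further specialization is possible.
    Symbol* recognize(const Stroke& stroke) {
        Symbol* working = new Symbol(stroke);      // encapsulate the raw stroke
        for (;;) {
            Symbol* refined = working->analyze();  // local classification + analysis
            if (refined == working)                // no more specific class found:
                return working;                    // this is the recognition result
            delete working;                        // working-object was specialized
            working = refined;
        }
    }

The loop terminates because each specialization moves the working-object strictly downwards in the finite class hierarchy.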

An Example

The best way to illustrate the recognition process is by example. For this reason, we consider the handsketched rectangle which was discussed in section 3.1 (figure 3.1) as input for our single-stroke analyzer. Figure 3.13 illustrates the recognition process by giving the intermediate steps, which are represented by the working-objects created during the recognition process.


[Figure 3.13: The recognition process of a single-stroke square: Symbol, Closed, Polygon, Quadrilateral, Parallelogram, Rectangle, Square]

First, the original stroke is encapsulated in an object of the root class Symbol. The analyze-function of class Symbol finds out that this object belongs to class Closed, and an instance of the class Closed is then created and returned. This step includes both pattern classification and pattern analysis. The working-object is classified from the general class Symbol to the more specific class Closed. Moreover, the specification data of the working-object is analyzed: the start point and the stop point of the working-object are characterized in the new Closed-object by a single point. This new object is then assigned to the working-object for further recognition. The recognition process carries on with this new working-object. The working-object is subsequently recognized as a Polygon-object, and then a Quadrilateral-object, a Parallelogram-object, a Rectangle-object, and finally a Square-object which represents the recognition result.

3.4.3.2  Feature Selection and Feature Analysis

Given the control structure, the design of the single-stroke analyzer amounts to the design of the analyze-functions for each class. These functions are all based on feature selection and feature analysis. One of the advantages of hierarchical classification is that a complex recognition problem is separated into many simple recognition problems, so that most of the analyze-functions are easy to design. For example, the analyze-function of the class Polygon considers the number of vertices of a Polygon-object: a polygon with three vertices is a triangle, and a polygon with four vertices is a quadrilateral. Other functions consider quantitative properties of an object such as the slope angle of a line or distances between feature points.
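For instance, the local decision of the class Polygon might look like this. This is a sketch; the class interface follows the declarations of section 3.3.2, and the constructors that copy a polygon's data into the more specific object are assumed:

    // Local classification in class Polygon: the vertex count alone
    // selects the next, more specific class in the hierarchy.
    Symbol* Polygon::analyze() {
        switch (vertices.size()) {
            case 3:  return new Triangle(*this);       // three vertices
            case 4:  return new Quadrilateral(*this);  // four vertices
            default: return this;                      // remains a general polygon
        }
    }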

62

CHAPTER 3. LOW-LEVEL RECOGNITION

Various thresholds are used for making decisions. These threshold values are defined dynamically, dependent on other object-specific properties, so that a reasonable classification can be made. This is similar to approaches with fuzzy logic [133]. For example, the analyze-function of the class Symbol determines whether the start point of a stroke is close to the stop point. It is not feasible to use a fixed threshold and decide that the object is closed if the distance between the two points is less than this value and opened otherwise. The closeness here is vague and ambiguous. Consider a big circle-stroke with a radius of about 100 pixels: if the distance between the start point and the stop point of this stroke is 20 pixels, it is reasonable to classify this stroke as a closed stroke. But if the same distance occurs for a small circle-stroke with a radius of, for example, about 30 pixels, the stroke should be classified as an opened stroke. In addition to these forms of simple feature analysis improved by the use of fuzzy logic, the corner detector, the line detector, and the arc detector are more difficult and need further discussion. First, we consider the representation forms of a stroke object.
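The closed/opened decision with such a dynamically defined threshold might be sketched as follows. The proportionality factor of 0.25 is an illustrative assumption that reproduces the two examples above (threshold 50 for a stroke diameter of 200, threshold 15 for a diameter of 60):

    #include <cmath>

    struct Point { double x, y; };

    static double dist(Point a, Point b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    }

    // A stroke counts as closed if start and stop point are close
    // *relative* to the stroke's own size, not to a fixed pixel value.
    bool isClosed(Point start, Point stop, double strokeDiameter) {
        double threshold = 0.25 * strokeDiameter;  // assumed factor
        return dist(start, stop) < threshold;
    }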

Stroke Representation Forms

Each coordinate pair defines a point p_i with its x- and y-coordinates:

    p_i := (x_i, y_i)

The sequence of these coordinate pairs builds a list P of coordinate pairs, which is the original input for further recognition tasks:

    P := {p_i | 1 ≤ i ≤ n}

This list of coordinates is used as the object-attributes of the root class Symbol; all other classes in the class hierarchy inherit this list as the basic representation form of all stroke objects. Each point (except the last one) together with its next point defines an angle a_i which represents the direction of the drawing at that point:

    a_i :=
        arctan((y_{i+1} - y_i) / (x_{i+1} - x_i)),          if y_{i+1} ≥ y_i and x_{i+1} ≥ x_i
        360° + arctan((y_{i+1} - y_i) / (x_{i+1} - x_i)),   if y_{i+1} < y_i and x_{i+1} ≥ x_i
        180° + arctan((y_{i+1} - y_i) / (x_{i+1} - x_i)),   if y_{i+1} < y_i and x_{i+1} < x_i
        180° - arctan((y_{i+1} - y_i) / (x_i - x_{i+1})),   if y_{i+1} ≥ y_i and x_{i+1} < x_i
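In an implementation, the four quadrant cases collapse into a single call to atan2, normalized to the range [0°, 360°). This is a straightforward sketch, not the original code:

    #include <cmath>

    // Drawing direction at point i in degrees, normalized to [0, 360);
    // std::atan2 handles the four quadrant cases of the definition above.
    double direction(double xi, double yi, double xi1, double yi1) {
        const double kPi = 3.14159265358979323846;
        double deg = std::atan2(yi1 - yi, xi1 - xi) * 180.0 / kPi;
        return (deg < 0.0) ? deg + 360.0 : deg;
    }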