Computer Science and Operations Research: New Developments in Their Interfaces 0080408060, 9780080408064

The interface of Operations Research and Computer Science - although elusive to a precise definition - has been a fertile area of both methodological and applied research.


English Pages 548 [549] Year 1992


Table of contents:
Front Cover
Computer Science and Operations Research: New Developments in Their Interfaces
Copyright Page
Table of Contents
Preface
Referees
PART I: OPTIMIZATION TECHNIQUES
Chapter 1. A Principled Approach to Solving Complex Discrete Optimization Problems
Abstract
Keywords
INTRODUCTION AND BACKGROUND
DESCRIPTION OF OPL
A BENCHMARK OF OPL ALGORITHMS
EXTENDING AND MERGING PROBLEMS
OTHER OPL APPLICATIONS
CONCLUSIONS
BIBLIOGRAPHY
CHAPTER 2. BOOLEAN-COMBINATORIAL BOUNDING OF MAXIMUM 2-SATISFIABILITY
ABSTRACT
KEYWORDS
1. Maximum 2-satisfiability problems and their reducibility
2. Connections with roof-duality
3. Elementary boolean operations
4. Squeezing out constants by fusions and exchanges
5. Equivalence between reducibility and squeezability
6. Improving the lower bound by consensus and condensations
REFERENCES
CHAPTER 3. NAVAL PERSONNEL ASSIGNMENT: AN APPLICATION OF LINEAR-QUADRATIC PENALTY METHODS
ABSTRACT
KEYWORDS
INTRODUCTION
THE NAVAL PERSONNEL ASSIGNMENT PROBLEM
APPLICATION OF THE LINEAR-QUADRATIC PENALTY METHOD TO NAVAL PERSONNEL ASSIGNMENT
NUMERICAL RESULTS
CONCLUSIONS
REFERENCES
Chapter 4. Preprocessing Schemes and a Solution Method for the Convex Hull Problem in Multidimensional Space
ABSTRACT
KEY WORDS
INTRODUCTION
1. BACKGROUND
2. DEFINITIONS, NOTATION AND IMPORTANT TERMS
3. FIRST STAGE: PREPROCESSING
4. SECOND STAGE: RESOLUTION
5. THE FRANK-WOLFE ALGORITHM FOR FINDING PROJECTIONS
6. IMPLEMENTATION AND COMPUTATIONAL RESULTS
7. CONCLUSIONS
REFERENCES
PART II: LINEAR PROGRAMMING INTERIOR POINT ALGORITHMS
CHAPTER 5. ADAPTING THE INTERIOR POINT METHOD FOR THE SOLUTION OF LINEAR PROGRAMS ON HIGH PERFORMANCE COMPUTERS
ABSTRACT
KEY WORDS
HARDWARE PLATFORMS FOR THE SPARSE SIMPLEX AND THE INTERIOR POINT METHOD
CHOICE OF INTERIOR POINT METHOD
PARALLEL SSPD SOLVER KERNEL ON A DISTRIBUTED MEMORY COMPUTER
THE SSPD SOLVER KERNEL ON THE DAP COMPUTER
DISCUSSION AND CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
CHAPTER 6. IMPLEMENTATION OF AN INTERIOR POINT LP ALGORITHM ON A SHARED-MEMORY VECTOR MULTIPROCESSOR
ABSTRACT
KEYWORDS
INTRODUCTION
A PRIMAL-DUAL LP ALGORITHM
THE CHOLESKY DECOMPOSITION
IMPLEMENTATION
COMPUTATIONAL EXPERIENCE
CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
PART III: NETWORKS
CHAPTER 7. ALTERNATE SERVER DISCIPLINES FOR MOBILE-SERVERS ON A CONGESTED NETWORK
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. FORMULATION
3. SERVER DISCIPLINES
4. RELATED THEORY
5. CONCLUSIONS
6. REFERENCES
CHAPTER 8. COLLISION DEPENDENT PERFORMANCE MODEL FOR A STAR TOPOLOGY LOCAL AREA NETWORK
ABSTRACT
KEYWORDS
INTRODUCTION
THE HUBNET PROTOCOL
FORMULATION OF THE COLLISION DEPENDENT MODEL
CONCLUSIONS
REFERENCES
CHAPTER 9. GREEDY RECOGNITION AND COLORING ALGORITHMS FOR INDIFFERENCE GRAPHS
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. BASICS
3. GREEDY ALGORITHMS
4. CONCLUSION AND OPEN PROBLEMS
ACKNOWLEDGEMENT
REFERENCES
CHAPTER 10. MINIMUM GRAPH VERTEX COVERING WITH THE RANDOM NEURAL NETWORK
ABSTRACT
KEYWORDS
INTRODUCTION
RANDOM NETWORK SOLUTION
CONCLUSIONS
REFERENCES
CHAPTER 11. MULTIPLE CLASS G-NETWORKS
ABSTRACT
KEYWORDS
INTRODUCTION
CUSTOMER FLOW EQUATIONS AND PRODUCT-FORM
CONCLUSIONS AND OPEN PROBLEMS
REFERENCES
CHAPTER 12. ON IMPLEMENTING AN ENVIRONMENT FOR INVESTIGATING NETWORK RELIABILITY
ABSTRACT
KEYWORDS
1. NETWORKS AND RELIABILITY
2. SYSTEM REQUIREMENTS
3. SOME EXAMPLES
4. EXACT COMPUTATIONS
5. SUMMARY OF IMPROVEMENTS
ACKNOWLEDGEMENTS
REFERENCES
PART IV: COMPUTER GRAPHICS IN OPERATIONS RESEARCH
CHAPTER 13. ANIMATED SENSITIVITY ANALYSIS
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. RELATED RESEARCH
3. EXAMPLES OF ANIMATED SENSITIVITY ANALYSIS
4. A FRAMEWORK FOR ANIMATED SENSITIVITY ANALYSIS
5. CONCLUSION AND RESEARCH DIRECTIONS
REFERENCES
CHAPTER 14. EDINET - A NETWORK EDITOR FOR TRANSSHIPMENT PROBLEMS WITH FACILITY LOCATION
ABSTRACT
KEY WORDS
INTRODUCTION
THE PROBLEM AND DATA MODEL
LOCAL WINDOWS
GLOBAL WINDOWS
CONCLUDING REMARKS
REFERENCES
CHAPTER 15. FUNCTIONAL DESCRIPTION OF A GRAPH-BASED INTERFACE FOR NETWORK MODELING (GIN)
ABSTRACT
KEYWORDS
INTRODUCTION
CURRENT MODELING ENVIRONMENTS
OVERALL SYSTEM DESIGN AND ARCHITECTURE
MODEL BUILDING WITH GIN
WINDOWS ICONS
EXAMPLE PROBLEM
IMPLEMENTATION AND CURRENT STATUS
CONCLUSIONS
ACKNOWLEDGEMENTS
BIBLIOGRAPHY
CHAPTER 16. NETPAD: AN INTERACTIVE GRAPHICS SYSTEM FOR NETWORK MODELING AND OPTIMIZATION
ABSTRACT
KEYWORDS
OVERVIEW
NETPAD ENVIRONMENT
USING NETPAD
CUSTOMIZING NETPAD
NETPAD ARCHITECTURE
POTENTIAL USES OF NETPAD
SOME EXISTING SYSTEMS
NETPAD AVAILABILITY
ACKNOWLEDGMENT
PART V: PARALLEL ALGORITHMS AND IMPLEMENTATIONS
CHAPTER 17. A CONCURRENT COMPUTING ALGORITHM FOR REAL-TIME DECISION MAKING
ABSTRACT
KEYWORDS
INTRODUCTION
THE PRODUCTION SCHEDULING PROBLEM
THE GENERIC SCHEDULER
CONCLUSIONS AND FUTURE DIRECTIONS
ACKNOWLEDGEMENTS
REFERENCES
CHAPTER 18. COMPUTATIONAL EXPERIENCE WITH PARALLEL ALGORITHMS FOR SOLVING THE QUADRATIC ASSIGNMENT PROBLEM
ABSTRACT
KEYWORDS
INTRODUCTION
PARALLEL ALGORITHMS
COMPUTATIONAL RESULTS
REFERENCES
CHAPTER 19. ON REPORTING THE SPEEDUP OF PARALLEL ALGORITHMS: A SURVEY OF ISSUES AND EXPERTS
ABSTRACT
KEYWORDS
INTRODUCTION
REPORTING OF COMPUTATIONAL TESTING
ISSUES IN REPORTING THE RESULTS OF PARALLEL TESTING
A SURVEY OF EXPERTS
CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES
CHAPTER 20. OPTIMAL PARALLEL ALGORITHMS FOR COMPUTING A VERTEX OF THE LINEAR TRANSPORTATION POLYTOPE
ABSTRACT
KEYWORDS
INTRODUCTION
AN OPTIMAL SEQUENTIAL ALGORITHM
AN OPTIMAL PARALLELIZATION OF THE NORTHWEST-CORNER RULE FOR THE CREW PRAM
AN OPTIMAL PARALLEL SOLUTION TO THE VECTOR COMPOSITION PROBLEM ON THE EREW PRAM
AN OPTIMAL PARALLELIZATION OF THE NORTHWEST-CORNER RULE FOR THE EREW PRAM
ACKNOWLEDGEMENT
REFERENCES
CHAPTER 21. PARALLEL DECOMPOSITION OF MULTICOMMODITY FLOW PROBLEMS USING COERCION METHODS
ABSTRACT
KEYWORDS
INTRODUCTION
DECOMPOSITION OF MULTICOMMODITY NETWORK FLOWS USING COERCION METHODS
PARALLEL COMPUTING USING COERCION FUNCTIONS
ANALYTIC MODELS OF PARALLEL PERFORMANCE
COMPUTATIONAL RESULTS
CONCLUSIONS
REFERENCES
PART VI: PLANNING AND SCHEDULING
CHAPTER 22. A GRAPH-THEORETIC MODEL FOR THE SCHEDULING PROBLEM AND ITS APPLICATION TO SIMULTANEOUS RESOURCE SCHEDULING
ABSTRACT
KEYWORDS
INTRODUCTION
THE SCHEDULING MODEL
MULTIPROCESSOR SCHEDULING
PARALLEL I/O SCHEDULING
SCHEDULING WITH MUTUAL EXCLUSION CONSTRAINTS
SCHEDULING IN PARALLEL I/O BUS ARCHITECTURES
CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
CHAPTER 23. INTELLIGENT MODELLING, SIMULATION AND SCHEDULING OF DISCRETE PRODUCTION PROCESSES
ABSTRACT
KEYWORDS
INTRODUCTION
JOB SHOP PROBLEMS: DEFINITION AND OVERVIEW
I-MASSA ARCHITECTURE
THE MODEL BUILDER
SIMULATOR
EVALUATOR MANIPULATOR
FUTURE RESEARCH
CONCLUDING REMARKS
ACKNOWLEDGEMENTS
REFERENCES
Chapter 24. OOFP–Object Oriented Flow Planning
ABSTRACT
KEYWORDS
1 PROBLEM
2 MODEL
3 EQUATIONS
4 IMPLEMENTATION
5 EXAMPLE
6 CONCLUSIONS
REFERENCES
CHAPTER 25. ROMAN: AN INTEGRATED APPROACH TO MANPOWER PLANNING AND SCHEDULING
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. THE ROMAN APPROACH
3. THE SPECIFICATION MODULE
4. THE ALLOCATION MODULE
5. THE SCHEDULING MODULE
6. THE ROSTER DESIGN MODULE
7. CONCLUDING REMARKS
REFERENCES
PART VII: GENETIC ALGORITHMS
CHAPTER 26. apGA: AN ADAPTIVE PARALLEL GENETIC ALGORITHM
ABSTRACT
KEYWORDS
INTRODUCTION
DECEPTIVENESS
GENETIC ALGORITHMS
TEST FUNCTIONS AND RESULTS
SUMMARY AND FUTURE RESEARCH
REFERENCES
CHAPTER 27. GENETIC ALGORITHMS FOR THE TRAVELING SALESMAN PROBLEM WITH TIME WINDOWS
ABSTRACT
KEYWORDS
INTRODUCTION
TRAVELING SALESMAN PROBLEM WITH TIME WINDOWS
EXPERIMENTAL PROCEDURES
EARLIEST CLOSING TIME CROSSOVER OPERATOR
EVALUATION FUNCTION
TEST PROBLEMS
EXPERIMENTAL RESULTS
CONCLUSIONS
REFERENCES
CHAPTER 28. INCREASED FLEXIBILITY IN GENETIC ALGORITHMS: THE USE OF VARIABLE BOLTZMANN SELECTIVE PRESSURE TO CONTROL PROPAGATION
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. THEORY
3. PRACTICE
4. DISCUSSION
5. CONCLUSION
REFERENCES
CHAPTER 29. PARALLEL GENETIC ALGORITHMS IN COMBINATORIAL OPTIMIZATION
ABSTRACT
KEYWORDS
INTRODUCTION
EVOLUTIONARY AND GENETIC ALGORITHMS
THE SEARCH STRATEGY OF THE PGA
THE TRAVELING SALESMAN PROBLEM
PERFORMANCE EVALUATION FOR THE TSP
THE GRAPH PARTITIONING PROBLEM
PERFORMANCE EVALUATION FOR THE GPP
CONCLUSION
REFERENCES
PART VIII: HEURISTIC SEARCH TECHNIQUES
CHAPTER 30. CONSTRAINT-DIRECTED SEARCH FOR THE ADVANCED REQUEST DIAL-A-RIDE PROBLEM WITH SERVICE QUALITY CONSTRAINTS
ABSTRACT
KEYWORDS
INTRODUCTION
THE DIAL-A-RIDE SYSTEM
THE ALGORITHM
COMPUTATIONAL RESULTS
CONCLUDING REMARKS
REFERENCES
CHAPTER 31. HEURISTIC SOLUTION PROCEDURES FOR THE GRAPH PARTITIONING PROBLEM
ABSTRACT
KEYWORDS
INTRODUCTION
PROBLEM FORMULATION
THE EXTENDED LOCAL SEARCH PROCEDURE
A GENETIC ALGORITHM FOR THE GRAPH PARTITIONING PROBLEM
SUMMARY AND CONCLUSION
REFERENCES
CHAPTER 32. NEW EJECTION CHAIN AND ALTERNATING PATH METHODS FOR TRAVELING SALESMAN PROBLEMS
ABSTRACT
KEYWORDS
1. INTRODUCTION
2. A STEM-AND-CYCLE REFERENCE STRUCTURE
3. CHAINS FOR EJECTING TOUR SUBPATHS
4. A CONNECTION TO ALTERNATING PATHS
5. FUNDAMENTAL STEM-AND-CYCLE RULES
6. ASYMMETRIC TRAVELING SALESMAN PROBLEMS
7. PARALLEL PROCESSING AND DIVIDED-STEM-AND-CYCLE STRUCTURES
CONCLUSION
ACKNOWLEDGEMENT
REFERENCES
PART IX: DATA RETRIEVAL
CHAPTER 33. ENHANCING DATA RETRIEVAL USING ARTIFICIALLY SYNTHESIZED QUERIES
ABSTRACT
KEYWORDS
I. INTRODUCTION
II. FUNDAMENTALS AND NOTATION
III. DISTRIBUTION CHANGING FILTERS
IV. CASCADING DCT FILTERS
V. EXPERIMENTAL RESULTS
VI. CONCLUSIONS
ACKNOWLEDGEMENTS
REFERENCES
AUTHOR INDEX
SUBJECT INDEX


COMPUTER SCIENCE AND OPERATIONS RESEARCH
New Developments in Their Interfaces

Titles of related interest
BRADLEY: Operational Research '90
ROMERO: Handbook of Critical Issues in Goal Programming
SAGE: Concise Encyclopedia of Information Processing in Systems and Organizations

Related journals
Computers & Industrial Engineering
Computers & Operations Research
Mathematical & Computer Modelling
Neural Networks

Free specimen copy available on request

COMPUTER SCIENCE AND OPERATIONS RESEARCH
New Developments in Their Interfaces

Editors

OSMAN BALCI Department of Computer Science Virginia Polytechnic Institute and State University Blacksburg, Virginia, U.S.A.

RAMESH SHARDA College of Business Administration Oklahoma State University Stillwater, Oklahoma, U.S.A.

STAVROS A. ZENIOS The Wharton School University of Pennsylvania Philadelphia, Pennsylvania, U.S.A.

PERGAMON PRESS
OXFORD • NEW YORK • SEOUL • TOKYO

U.K.: Pergamon Press Ltd, Headington Hill Hall, Oxford OX3 0BW, England
U.S.A.: Pergamon Press, Inc., 660 White Plains Road, Tarrytown, New York 10591-5153, U.S.A.
KOREA: Pergamon Press Korea, KPO Box 315, Seoul 110-603, Korea
JAPAN: Pergamon Press Japan, Tsunashima Building Annex, 3-20-12 Yushima, Bunkyo-ku, Tokyo 113, Japan

Copyright © 1992 Pergamon Press Ltd. All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic tape, mechanical, photocopying, recording or otherwise, without permission in writing from the publisher.

First edition 1992

Library of Congress Cataloging in Publication Data
Computer science and operations research : new developments in their interfaces / editors, Osman Balci, Ramesh Sharda, Stavros A. Zenios. p. cm. Includes indexes. 1. Computer science. 2. Operations research. I. Balci, Osman. II. Sharda, Ramesh. III. Zenios, Stavros Andrea. QA76.C5732 1992 003'.3--dc20 92-10675

ISBN 0 08 040806 0

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.

Printed in Great Britain by B.P.C.C. Wheatons Ltd, Exeter

CONTENTS

Preface ix
Referees xi

I. OPTIMIZATION TECHNIQUES
A Principled Approach to Solving Complex Discrete Optimization Problems (Bruce MacLeod and Robert Moll) 3
Boolean-Combinatorial Bounding of Maximum 2-Satisfiability (Jean-Marie Bourjolly, Peter L. Hammer, William R. Pulleyblank, and Bruno Simeone) 23
Naval Personnel Assignment: An Application of Linear-Quadratic Penalty Methods (Mustafa Ç. Pınar and Stavros A. Zenios) 43
Preprocessing Schemes and a Solution Method for the Convex Hull Problem in Multidimensional Space (José H. Dulá, Richard V. Helgason, and Betty L. Hickman) 59

II. LINEAR PROGRAMMING INTERIOR POINT ALGORITHMS
Adapting the Interior Point Method for the Solution of Linear Programs on High Performance Computers (Johannes Andersen, Roni Levkovitz, and Gautam Mitra) 73
Implementation of an Interior Point LP Algorithm on a Shared-Memory Vector Multiprocessor (Matthew J. Saltzman) 87

III. NETWORKS
Alternate Server Disciplines for Mobile-Servers on a Congested Network (Stephen K. Park, Stephen Harvey, Rex K. Kincaid, and Keith Miller) 105
Collision Dependent Performance Model for a Star Topology Local Area Network (Gerrit K. Janssens) 117
Greedy Recognition and Coloring Algorithms for Indifference Graphs (Peter J. Looges and Stephan Olariu) 127
Minimum Graph Vertex Covering with the Random Neural Network (Erol Gelenbe and Frederic Batty) 139
Multiple Class G-Networks (J.-M. Fourneau and Erol Gelenbe) 149
On Implementing an Environment for Investigating Network Reliability (John S. Devitt and Charles J. Colbourn) 159

IV. COMPUTER GRAPHICS IN OPERATIONS RESEARCH
Animated Sensitivity Analysis (Christopher V. Jones) 177
EDINET - A Network Editor for Transshipment Problems with Facility Location (Wlodzimierz Ogryczak, Krzysztof Studzinski, and Krystian Zorychta) 197
Functional Description of a Graph-Based Interface for Network Modeling (GIN) (David M. Steiger, Ramesh Sharda, and Brian LeClaire) 213
NETPAD: An Interactive Graphics System for Network Modeling and Optimization (Nathaniel Dean, Monika Mevenkamp, and Clyde L. Monma) 231

V. PARALLEL ALGORITHMS AND IMPLEMENTATIONS
A Concurrent Computing Algorithm for Real-Time Decision Making (Wayne J. Davis) 247
Computational Experience with Parallel Algorithms for Solving the Quadratic Assignment Problem (Panos M. Pardalos, Kowtha A. Murthy, and Yong Li) 267
On Reporting the Speedup of Parallel Algorithms: A Survey of Issues and Experts (Richard S. Barr and Betty L. Hickman) 279
Optimal Parallel Algorithms for Computing a Vertex of the Linear Transportation Polytope (Bruce A. Chalmers and Selim G. Akl) 295
Parallel Decomposition of Multicommodity Flow Problems Using Coercion Methods (Ruijin Qi and Stavros A. Zenios) 307

VI. PLANNING AND SCHEDULING
A Graph-Theoretic Model for the Scheduling Problem and its Application to Simultaneous Resource Scheduling (Ravi Jain, John S. Werth, J. C. Browne, and Galen Sasaki) 321
Intelligent Modelling, Simulation and Scheduling of Discrete Production Processes (Jan Paredis and Tanja van Rij) 349
OOFP - Object Oriented Flow Planning (Wolfgang Mergenthaler, Hartmut Nord, Hans Ulf Quecke, Straubel, and Olaf Wolter) 363
ROMAN: An Integrated Approach to Manpower Planning and Scheduling (Chan Meng Khoong and Hoong Chuin Lau) 383

VII. GENETIC ALGORITHMS
apGA: An Adaptive Parallel Genetic Algorithm (Gunar E. Liepins and Shumeet Baluja) 399
Genetic Algorithms for the Traveling Salesman Problem with Time Windows (Kendall E. Nygard and Cheng-Hong Yang) 411
Increased Flexibility in Genetic Algorithms: The Use of Variable Boltzmann Selective Pressure to Control Propagation (Michael de la Maza and Bruce Tidor) 425
Parallel Genetic Algorithms in Combinatorial Optimization (Heinz Mühlenbein) 441

VIII. HEURISTIC SEARCH TECHNIQUES
Constraint-Directed Search for the Advanced Request Dial-A-Ride Problem with Service Quality Constraints (Jean-Yves Potvin and Jean-Marc Rousseau) 457
Heuristic Solution Procedures for the Graph Partitioning Problem (Erik Rolland and Hasan Pirkul) 475
New Ejection Chain and Alternating Path Methods for Traveling Salesman Problems (Fred Glover) 491

IX. DATA RETRIEVAL
Enhancing Data Retrieval Using Artificially Synthesized Queries (B. John Oommen and David T. H. Ng) 513

AUTHOR INDEX 533
SUBJECT INDEX 534


PREFACE

This book is a collection of selected refereed papers from the conference "Computer Science and Operations Research: New Developments in Their Interfaces" held on 8-10 January 1992 in Williamsburg, Virginia, U.S.A. The conference was sponsored by the Operations Research Society of America (ORSA) Computer Science Technical Section. Seventy-four papers were presented at the conference. This book contains the thirty-three papers selected through a rigorous refereeing process. We gratefully acknowledge the contributions of the sixty-four referees listed on page xi.

The interface between Operations Research and Computer Science is much broader than we could hope to capture in a single book or define in this preface. Nevertheless, several key themes appear in the papers that follow. These themes do not restrict in any way the interface of these two disciplines. Instead, they reflect some of the recent and most exciting developments.

We see several papers that deal with the impact of novel computer architectures on Operations Research. A variety of large-scale optimization problems, such as linear programs, multicommodity network flows, stochastic programming problems, and quadratic assignment problems, are being reexamined. The goal of most papers in this direction is the design of algorithms that exploit the novelty of computer architectures. The results have been very encouraging from the view of basic methodological research. But at the same time we have seen the development of techniques with impacts in diverse application areas such as military personnel planning and logistics, and operations planning and scheduling.

Large-scale optimization is also being achieved through the development of alternative computing paradigms, such as genetic algorithms and tabu search. Their application to difficult optimization problems, especially in combinatorics, has produced very promising results.

Problems from transportation have always been a fertile area of application of Operations Research tools. The introduction of advanced navigation tools, such as In-vehicle Guidance Systems, created new challenges for researchers in this area. Several papers on the interface of Operations Research-Computer Science-Transportation were presented at the conference. Some of the important developments are reported in this book. The planning of transportation services is indeed an area where developments in Computer Science and Operations Research will have a substantial impact as we approach the 21st century.

The quest for effective approaches to the implementation of interior point algorithms continues. Progress since Karmarkar introduced his algorithm has been quite dramatic. The relevant papers in this book will give readers a clear view of the current state-of-the-art in this very active research area.

Going beyond the development of mathematical tools (i.e., models, algorithms, and software), there has always been a keen interest in making these tools accessible to end-users. Developments in user interfaces, computer graphics, and high-level languages ease the use of the tools-of-the-trade by non-experts. Papers in this book illustrate the use of computer graphics for the animation of algorithms and the representation of network modeling problems. Modeling systems for planning factory operations and facility location are also discussed.

This book contains several papers which demonstrate the other side of the Operations Research and Computer Science interface: the continued application of Operations Research methodologies (e.g., graph theory, network optimization, queueing systems analysis, simulation) to a variety of problems motivated from Computer Science applications (e.g., reliability of communication networks, congestion in computer networks, resource scheduling in computer systems, data retrieval).

The interface of Operations Research and Computer Science - although elusive to a precise definition - has been a fertile area of both methodological and applied research. The papers of this book, written by experts in their respective fields, convey the current state-of-the-art in this interface across a broad spectrum of research domains.

The Editors
Osman Balci
Ramesh Sharda
Stavros A. Zenios

REFEREES

Marc Abrams
G. Anandalingam
Osman Balci
Richard S. Barr
Bruce A. Chalmers
John W. Chinneck
James W. Chrissis
Q. B. Chung
Lloyd W. Clarke
Charles J. Colbourn
Michael de la Maza
Angela J. Dixon
Jonathan Eckstein
Michael C. Ferris
Edward A. Fox
Erol Gelenbe
Bill Hardgrave
Catherine M. Harmonosky
Albert L. Harris
Richard H. F. Jackson
Christopher V. Jones
Jeremy S. Kagan
James P. Kelly
Jeffery L. Kennington
Steven Orla Kimbrough
Ramayya Krishnan
Brian LeClaire
Mark L. Lidd
Hani S. Mahmassani
T. H. Mattheiss
Richard D. McBride
Robert R. Meyer
Clyde L. Monma
Scott A. Moore
Robert L. Moose, Jr.
Heinz Mühlenbein
Kendall E. Nygard
Bob O'Keefe
Stephan Olariu
Panos M. Pardalos
Jan Paredis
Joseph F. Pekny
Mustafa Ç. Pınar
Hasan Pirkul
Yuping Qiu
Ronald L. Rardin
Calvin J. Ribbens
Erik Rolland
David S. Rubin
Randall P. Sadowski
Ramesh Sharda
Olivia R. Liu Sheng
Hanif D. Sherali
Kang G. Shin
Wendell P. Simpson III
David M. Steiger
Jeffrey D. Tew
Timothy Thomasma
Bruce Tidor
Robert J. Vanderbei
Joel Wein
Rick L. Wilson
Stavros A. Zenios
Hans-Jürgen Zimmermann


A Principled Approach to Solving Complex Discrete Optimization Problems

Bruce MacLeod† and Robert Moll‡

†Department of Computer Science, University of Southern Maine, Portland, ME 04103
‡Department of Computer and Information Science, University of Massachusetts, Amherst, MA 01003

Abstract

In this work we report on a general and extensible framework, called OPL, for quickly constructing reasonable solutions to a broad class of complex discrete optimization problems. Our approach rests on the observation that many such problems can be represented by linking together variants of well-understood primitive optimization problems. We exploit this representation by building libraries of solution methods for the primitive problems. These library methods are then suitably composed to build solutions for the original problem. The vehicle routing problem and its generalizations, which involve not only routing but also delivery scheduling, crew scheduling, etc., are a significant and extensively investigated area of operations research. In this paper we report on OPL definitions and solutions for a wide variety of such problems.

Keywords: discrete optimization; programming environments; vehicle routing; modeling


INTRODUCTION AND BACKGROUND

In this paper we report on a notation and computing environment for representing and solving complex combinatorial optimization problems. The problems we consider arise in such diverse areas as scheduling, vehicle routing, layout, and warehousing. Our notation and solution methodology is called OPL (for Optimization Programming Language). In OPL problems are represented by means of a graph structure called an HCG (Hierarchical Containment Graph). Nodes in a graph represent either primitive objects or organized collections of primitive objects. The ways in which primitive objects can be organized reflect the fundamental data structures that appear in many combinatorial optimization problems, e.g. bounded length lists, unordered sets, rings. An arc represents a function that associates objects in the source node with objects in the target node. This mapping represents a fundamental (and reasonably well-understood) optimization problem involving objects that appear in the source and target nodes associated with the arc.

OPL achieves its power by means of an abstraction mechanism, which allows a user to create a library of solution methods for the fundamental optimization problems that are associated with the arcs of an HCG representation. Such solution methods are generally approximation algorithms or local search routines. For example, a suitably constrained map from primitive objects to ring structures can model the traveling salesman problem; it can be optimized, for example, using library versions of a nearest neighbor heuristic - an approximation routine - and Lin and Kernighan's 2-opt - a local search heuristic. Many library routines have been defined as part of the OPL environment and their use involves simple calls to the appropriate library function.

The language of policy programs is the solution component of the OPL environment. Given an optimization problem and an associated HCG, a user creates solutions - that is, policy programs - which involve calls to library routines appropriate to the primitive optimization problems identified in the problem's HCG. OPL allows these library calls to be embedded in traditional programming language constructs (if, while, ...). An important feature of the language of policy programs is its extensibility. New fundamental data structures and new primitive optimization problems can be defined, and given a primitive optimization problem, new solution methods can be created and added to the solution library.

An OPL prototype has been written in Common Lisp. It has been used to solve problems in vehicle routing, warehouse layout, real time scheduling, and check clearing operations in a bank (MacLeod and Moll, 1991; MacLeod, 1989a; MacLeod and Moll, 1989b; Moll and MacLeod, 1988).

Other advanced software systems have been proposed for solving problems in Operations Research. Software environments for mathematical programming include AMPL (Fourer et al., 1987), GAMS (Bisschop and Meeraus, 1982), and Platform (Palmer, 1984). Software systems based on an intelligent search of the solution space have also been developed. These include the technique of Global Search (Smith, 1988) and the ALICE system (Lauriere, 1978). The problem of developing languages for describing Operations Research problems has also received some attention. The structured modeling notation, described in (Geoffrion, 1987a; Geoffrion, 1989), supports the descriptions of a variety of computational and data processing issues that arise in Operations Research and Management Science. NETWORKS is a modeling system which is based on graph grammars (Jones, 1990). It allows users to specify the characteristics of a problem instance and a problem class interactively. The OPL notation bears some similarity to the above two modeling systems. All three approaches use graphs to represent relationships that exist between principal entities that make up a particular OR problem. However, the focus of our work is on problem solving. For a more detailed review of advanced Operations Research software systems see (Geoffrion, 1987b).

This paper describes the OPL environment and the results of applying OPL. In Section 2 we provide an informal description of the OPL notation and solution machinery. In Section 3 we consider the style and effectiveness of OPL policy programs by comparing our performance on the multiple depot vehicle routing problem with published results. Section 4 considers various extensions and complications that arise in the vehicle routing problem. Section 5 presents an overview of other problems that have been addressed in OPL. General observations from this line of investigation are given in the concluding section.

DESCRIPTION OF OPL

OPL Representation Machinery

OPL's problem representation machinery is motivated by four goals:

• Problem representations should be natural;
• Representations should lead to problem solutions that can be constructed using software libraries;
• Natural representations should lead to high quality solutions; and
• The representation system should be extensible.

These goals are achieved using hierarchical containment graphs, or HCGs, to structure the description of optimization problems. In an HCG, nodes correspond to a set of objects of a particular fundamental type (or a tuple of such types). Such types can be primitive, indivisible objects, or container types, which are types that "hold" other objects. Containers can hold objects in different ways; the range of possibilities reflects the class of fundamental data structures that appear in a great many combinatorial optimization problems, e.g. bounded length lists, unordered sets, rings. An arc represents a function that associates objects in the source node with objects in the target node. We view such a map as representing a fundamental (and reasonably well-understood) optimization problem that determines the mapping between the data structures identified in the source and target nodes associated with the arc. As an example, Figure 1 indicates that objects of type X are to be organized in objects of type Y. An icon indicates the structural organization of X objects in the Y objects.

Figure 1: A Simple HCG

Often such a relationship can be thought of as asserting: "map each object of type X into a structured container of type Y". Indeed, we will frequently use "object-container" terminology when we refer to the elements of our generalized assignment mappings. If an object is a container, we assume that its internal positions have a unique numerical value. If p belongs to container structure Y, then position(p) gives this location. The system's current existing container types are described below. Additional types are easy to add.

• unordered set: a structure that is used for partition/clustering problems.
• ring (bounded or unbounded): an organization of objects into a ring data structure. These structures are used for TSP style routing problems.
• slotted object: used for representing discrete schedules, as well as for representing other "slotted" situations that are common in optimization problems, e.g., parking trucks in loading bays, plugging computer boards into backplanes, or assigning goods to particular bins in a row in a warehouse.
• continuous interval: used primarily to represent the placement of objects (or tasks) in continuous time.

An HCG arc from X to Y represents a function from objects of type X to objects of type Y. The most important kind of map is called an installation map. Such maps place objects from X "inside" one of the structures in node Y. Suppose, for example, that in some optimization problem we wish to assign packages to trucks. We may represent this assignment with an installation map from X to Y, where X consists of primitive objects - the packages - and Y consists of unordered sets - the contents of each truck. Mapping p ∈ X to one of the unordered sets of Y represents the placement of p in the corresponding truck. We allow the target node of an installation map to have one further attribute. The node may have either a fixed number of objects - trucks, for example - or we may allow the node to have a growing number of objects, which are constructed by a "generator" function associated with the node.

A second kind of map, from node Z to node W, called an attribute map, associates objects of type Z with objects of type W in a 1-1 fashion. Thus if Z represents faculty members at a university as primitive objects, and W is a collection of slotted objects which are to be interpreted as faculty schedules, then an attribute map from Z to W in effect associates each faculty member with a unique schedule.

Finally we allow the following kind of map, called a collapsing map. Suppose node M consists of multiple objects of type S, and suppose node N consists of a single object of type S. Then the map c: M → N is collapsing if, for m ∈ O, O ∈ M, position(m, O) = position(c(m), N). The primary role of collapsing maps, as we shall see in sections 3 and 5, is in scheduling problems.

Kernel Optimization Problems and Abstraction

An installation map from objects in X to, say, unordered set objects of type Y, is an incomplete object: no objective function has been supplied, and no predicates have been identified that constrain the assignment. This situation is remedied by supplying an objective function, a primary constraint, and, when necessary, a collection of secondary or minor constraints. In our trucking example, for instance, a weight limit may constrain the assignment of packages to trucks - and weight might therefore be the primary constraint of the problem. An attribute such as refrigeration - does a package require refrigeration and is a particular truck refrigerated? - would then function as a minor constraint. A typical objective function for this example might be to minimize the number of trucks necessary to carry a particular load.

By attributing such objective function and constraint information to an installation map, a user is frequently constructing a map that represents a familiar and well-understood optimization problem. It is thus convenient to introduce an abstraction facility. That is, given an installation map i: M → N with objective function f and major constraint c, we write

(define-map NAME (Source-type Dest-type) (Source-node Dest-node f c))

to indicate that the map named NAME is an object with the indicated typing attributes and the indicated parameters. We ignore minor constraints in the naming process. Thus, in our trucking example, we might name the induced problem generalized-bin-packing.

Finally, given an optimization problem that has been formulated as an abstraction as described above, we allow for the creation of problem solving methods that are appropriate for each type of problem created and which are bound to that named map. Thus, after creating generalized-bin-packing, such familiar bin-packing approximation algorithms as first-fit, next-fit, and best-fit can be defined and installed in a library of routines that are applicable to the generalized-bin-packing named map. We also allow local search algorithms. Thus Lin and Kernighan's 2-opt (1972) algorithm for local search is bound to the TSP named map.

Below we identify seven named maps between objects and container structures.

• Generalized Bin Pack (GBP) seeks an assignment of objects to container "bins" that minimizes some objective function, such as the number of containers needed. Containers have a capacity constraint and objects have a size with respect to the containers.
• Capacitated Partition (CP) seeks a feasible partition of objects into disjoint collections of unordered subsets. A typical constraint might be a scalar "weight" function. The objective function for this named map is frequently just a constant.
• Generalized Graph Partition (GGP) seeks a minimum cost partition of objects. Costs are incurred if two objects are not located in the same container. The cost of separating object i from object j is given by the ij-th entry in a cost matrix.
• Traveling Salesman (TSP) maps objects to one of several unbounded rings. Tour cost is calculated by adding the "distances" between adjacent objects in the tour. A distance matrix provides the distance between each pair of objects. A tour of minimum cost is sought.
• Discrete Scheduling (DS) seeks a non-overlapping placement of objects in slots in a slotted container structure. Objects (tasks) to be scheduled occupy a fixed number of consecutive slots, and containers have a fixed number of slots in which to place objects.
• Interval Scheduling (IS) seeks to minimize the maximum time on any schedule. Each object (task) requires a fixed amount of processing time and schedules may have a maximum processing time.
• Fixed Assignment (FA) implements an assignment that is predetermined. Each object carries the name of the container and the position in that container in which it must be placed. There is no optimization or constraint data associated with this relation. Objects can be organized according to any of the primitive structures of OPL.

The goal, then, of OPL is to formulate an optimization problem using named installation maps. Once an HCG of this kind has been built, an overall solution can be constructed by suitably composing the library methods that are bound to the named maps appearing in the HCG.
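
To make the abstraction concrete, here is a small sketch in Python of how a named map and its bound method library might look. This is our illustration under assumed names (NamedMap, next_fit, and the weight data are all hypothetical), not OPL's actual interface; the OPL prototype itself is written in Common Lisp.

    # Hypothetical Python sketch of an OPL-style named map. A named map
    # bundles an objective, a primary constraint, and a library of solution
    # methods bound to the map's name.

    class NamedMap:
        def __init__(self, name, objective, constraint):
            self.name = name
            self.objective = objective    # scores a complete assignment
            self.constraint = constraint  # feasibility test for one placement
            self.methods = {}             # approximation/local-search library

        def register(self, method_name, fn):
            self.methods[method_name] = fn

        def solve(self, method_name, objects, containers):
            return self.methods[method_name](self, objects, containers)

    def next_fit(named_map, objects, containers):
        # Classical next-fit: keep filling the current container; when an
        # object does not fit, move to the next container and never look back.
        assignment, i = {}, 0
        for obj in objects:
            while i < len(containers) and not named_map.constraint(obj, containers[i], assignment):
                i += 1
            if i == len(containers):
                raise ValueError("no feasible container for " + str(obj))
            assignment[obj] = containers[i]
        return assignment

    # "generalized-bin-packing" from the trucking example: weight is the
    # primary constraint; the objective counts the trucks actually used.
    weights = {"p1": 4, "p2": 5, "p3": 3, "p4": 6}
    CAPACITY = 8

    def weight_ok(obj, truck, assignment):
        used = sum(weights[o] for o, t in assignment.items() if t == truck)
        return used + weights[obj] <= CAPACITY

    gbp = NamedMap("generalized-bin-packing",
                   objective=lambda a: len(set(a.values())),
                   constraint=weight_ok)
    gbp.register("next-fit", next_fit)
    print(gbp.solve("next-fit", ["p1", "p2", "p3", "p4"], ["t1", "t2", "t3"]))

Registering first-fit or best-fit against the same named map would follow the same pattern, which is the point of the abstraction: solution methods accumulate in a library keyed by the kind of map, not by the application.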

Policy Programs

The OPL programming primitive that provides access to library functions of named installation maps is called an improvement policy. An improvement policy consists of a sequence of approximation and/or local search library routines. Each element of the sequence is associated with an arc that is present in the HCG. An improvement policy seeks to place or rearrange a collection of objects in such a way that the new solution extends or improves the old solution. More concretely, an improvement policy consists of:

• a collection of objects, called primary objects, which the improvement policy acts upon.
• a dispatch function which, each time it is called, chooses the next primary object to be considered by the improvement sequence.
• a sequence (t_1, ..., t_k) of transformations called an improvement sequence. Each t_j is an approximation or local search library routine that is associated with one of the arcs (and thus, one of the named maps) in the HCG.
• a collection of HCG arcs called bound arcs, which identify those assignments in an existing partial solution that may not be altered by the improvement sequence.

An improvement policy uses the dispatch function to choose one primary object from the collection of primary objects. Transformations in the improvement sequence are applied to the primary object if the primary object is from the same class as the tail node of the transformation arc; otherwise, the transformation is applied to the appropriate relatives of the primary object. Relatives of object A are all objects which have been assigned to A (either directly or indirectly) or to which A has been assigned (again, either directly or indirectly). The transformation is applied to all relatives of the primary object which have the same class as the tail node of the arc. The transformation function can assign or modify the assignments of an object as long as assignments associated with bound arcs are not altered. A transformation terminates when all allowable placements and rearrangements have been considered. Backtracking occurs when a transformation cannot locate a successful candidate. In this case, the OPL machinery backs up to the previous transformation and attempts an alternative assignment or rearrangement.

A policy program is an optimizing algorithm for a problem instance that has improvement policies as primitives, and includes, in addition, elementary programming constructs (if, do-while, for, ...). Data structuring functions (sorting, extracting, merging, ...) are also available.
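
The control structure just described is easy to picture. The following Python fragment is a hypothetical rendering of an improvement policy's main loop - primary objects, a dispatch function, and an improvement sequence - with bound-arc enforcement and backtracking omitted for brevity; none of these identifiers come from OPL itself.

    # Hypothetical sketch of an improvement policy's control loop. OPL's real
    # machinery also handles relatives, bound arcs, and backtracking.

    class ImprovementPolicy:
        def __init__(self, primary_objects, improvement_sequence, dispatch=None):
            self.primary_objects = list(primary_objects)
            self.sequence = improvement_sequence  # library routines, in order
            # Default dispatch: consume primary objects in list order.
            self.dispatch = dispatch or (lambda pending: pending.pop(0))

        def run(self, solution):
            pending = list(self.primary_objects)
            while pending:
                obj = self.dispatch(pending)      # choose (and remove) the next object
                for transform in self.sequence:   # apply each transformation in turn
                    solution = transform(obj, solution)
            return solution

Each transformation takes one object together with the partial solution and returns the extended solution; composing several approximation and local search routines in one sequence is what gives a single policy its power.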

An Introductory Example

We demonstrate the workings of OPL by illustrating its application to instances of vehicle routing, a classical problem in Operations Research. In the simplest version of the Vehicle Routing Problem (VRP), vehicles deliver packages to a collection of geographically dispersed customers. Each vehicle has a maximum weight carrying capacity. We assume that vehicles are identical and that vehicle weight capacity is limited in the sense that no vehicle can carry a significant fraction of the packages. The goal of a VRP instance is to find a solution that respects all constraints and at the same time minimizes the total travel time of all the vehicles.

Figure 2: HCG for the VRP Problem

The HCG in Figure 2 represents the relationships that exist between the packages, the routes, and the vehicles that make up the VRP problem. The named installation map CP_p-v is a capacitated partition map; it partitions packages among vehicles. The named installation map TSP_p-r represents the delivery of packages and the relative position of packages in the ring structures that determine the tour paths of the vehicle routes. Finally, CP_r-v is a capacitated partition map. It assigns routes to vehicles. CP_r-v is, in a sense, a trivial map once CP_p-v and TSP_p-r have been established, since a route can only be assigned to the vehicle that already holds those packages. This principle, which we call the transitivity of containment, plays a significant conceptual and computational role in our work. It says that composed maps that follow alternative paths between two nodes in an HCG must lead to consistent assignments. (In section 4 we will see an example of a situation in which this transitivity assumption is violated.)

Informally we solve VRP by 1) crudely clustering packages (i.e. delivery sites) and assigning each cluster to a unique vehicle; 2) routing each vehicle; and 3) optimizing each route. We realize these steps as follows. First associate an angle with each package delivery site, using the dispatch depot of the vehicles as origin. Then sort the packages by angle and assign them to vehicles - a partition relationship - on a "next-fit" basis. That is, attempt to place a package in the current vehicle. If it won't fit, place it instead in the next unoccupied vehicle. This procedure crudely clusters packages with nearby destinations into the same vehicle. Next route each vehicle: examine each package in a vehicle, in turn, and insert it in that vehicle's route in a position closest to a package that has already been placed in the route. Finally, optimize routes by applying 2-opt local search to each route. Our algorithm does a reasonable job of solving this simple version of VRP. Below we describe how to build a policy program that realizes this algorithm; a runnable sketch of the same heuristic follows the program. We construct this policy program in four steps.

• Step One: Step one establishes the packages to vehicles assignment. First sort by angle, as described above. Next apply our first improvement policy, which we call IP_1. Its improvement sequence consists of a single library approximation routine, which solves the assignment problem embodied in CP_p-v using the classical "next-fit" algorithm. In abbreviated form we write this policy as:

IP_1(packages, next-fit(CP_p-v))

Thus, IP_1 is a bona-fide policy consisting of a transformational sequence with one entry (the call to the next-fit library routine). The collection of primary objects to which IP_1 is applied consists of all packages. The set of bound arcs associated with IP_1 is empty. The dispatch function of IP_1 is empty, so by default, the list order of primary objects applies.

• Step Two: Improvement policy IP_2 routes the packages associated with each vehicle. The algorithm considers each package in a vehicle and places that package in the best possible position in the route - the position which incurs the least additional travel distance. The improvement sequence is again one library approximation procedure, which we abbreviate as follows:

nearest-neighbor(TSP_p-r)

The underlying OPL machinery will maintain consistency automatically in the following sense: packages in different vehicles cannot be assigned to the same route. IP_2 will be called as many times as there are vehicles. Each time it is called, the packages belonging to the vehicle under consideration are designated as the primary objects. This process is captured in the FOR loop of the policy program presented below.



Step Three: IPs's singleton improvement sequence improves upon t h e existing partial configuration by performing local search on each of t h e routes using 2-opt t h e library routine local search. T h e packages t o vehicles and packages t o routes arcs are designated as bound. We write

7P 3(packages,2-opt(rSP p))r

to describe IP3. A 2-opt interchange is accepted if t h e total travel distance in a vehicles decreases. •

• Step Four: IP_4 completes the remaining assignment, CP_r-v. Since the packages in a route have already been assigned to a vehicle, each route is bound to the vehicle that its packages are assigned to. In addition, it has been assumed that no travel time constraints are associated with this relation. Thus, the required assignment has been made implicitly, and we make the assignment explicit using the following policy:

IP_4(routes, first-fit(CP_r-v))

For each route, the underlying OPL machinery allows only the single feasible vehicle to be considered, and so each route is trivially assigned to the proper vehicle.

The preceding steps yield the following program:

packages = sort(packages, polar-coordinates)
IP_1(packages, next-fit(CP_p-v))
FOR each vehicle in vehicles
begin
    packages = packages in vehicle
    IP_2(packages, nearest-neighbor(TSP_p-r))
    IP_3(packages, 2-opt(TSP_p-r))
end
IP_4(routes, first-fit(CP_r-v))

The program can now be applied to the problem instance. The policy program could be augmented further to obtain additional improvement. For example, after the FOR loop has finished, packages could be moved or interchanged between routes (and vehicles) by the local search library routines move-1 and swap-2, and then 2-opt local search could be applied again to individual routes. This strategy involves two local search procedures operating together in a single improvement policy. The ability to compose multiple local search and approximation routines in a single improvement policy adds considerable power to OPL, as we shall demonstrate in Section 4, where we present an extended OPL analysis of a generalized VRP problem.

This simple example illustrates the style in which OPL problems are represented and solved. In the next sections two rather different problems are considered. Our purpose is to show OPL's flexibility as a representation system as well as its effectiveness as a problem-solving idiom.
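
For readers who want to trace Steps One through Three end to end, here is a compact Python sketch of the same sweep-cluster, insert, and 2-opt heuristic. It is our reconstruction under stated assumptions (Euclidean distances, the depot at the origin, unit package demands in the demo), not OPL library code.

    # Sketch of the VRP heuristic: sort sites by polar angle about the depot,
    # pack them into vehicles next-fit style, build each route by cheapest
    # insertion, then improve it with 2-opt. Illustrative only.
    import math

    def sweep_next_fit(sites, demand, capacity):
        order = sorted(sites, key=lambda p: math.atan2(p[1], p[0]))
        clusters, load = [[]], 0.0
        for p in order:
            if load + demand[p] > capacity and clusters[-1]:
                clusters.append([])   # current vehicle is full: open the next
                load = 0.0
            clusters[-1].append(p)
            load += demand[p]
        return clusters

    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    def cheapest_insertion(cluster):
        route = cluster[:1]
        for p in cluster[1:]:
            best_i, best_cost = 0, float("inf")
            for i in range(len(route) + 1):   # ring positions, wrapping at the end
                a, b = route[i - 1], route[i % len(route)]
                cost = dist(a, p) + dist(p, b) - dist(a, b)
                if cost < best_cost:
                    best_i, best_cost = i, cost
            route.insert(best_i, p)
        return route

    def two_opt(route):
        improved = True
        while improved:
            improved = False
            for i in range(len(route) - 1):
                for j in range(i + 2, len(route)):
                    a, b = route[i], route[i + 1]
                    c, d = route[j], route[(j + 1) % len(route)]
                    if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d) - 1e-9:
                        route[i + 1:j + 1] = reversed(route[i + 1:j + 1])
                        improved = True
        return route

    sites = [(1, 2), (2, 1), (-1, 2), (-2, -1), (1, -2), (2, 2)]
    demand = {p: 1.0 for p in sites}
    routes = [two_opt(cheapest_insertion(c))
              for c in sweep_next_fit(sites, demand, 3.0)]
    print(routes)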

A BENCHMARK OF OPL ALGORITHMS

Our initial OPL benchmarks have involved the Depot Vehicle Routing Problem, or DVRP, which is a generalization of the simple VRP problem outlined in the introductory section. DVRP is like VRP, except that vehicles and packages are distributed among several depots, and deliveries are done from these sites. In this section we demonstrate that OPL is capable of producing high quality solutions when compared with previously studied approaches to DVRP. Not surprisingly, many algorithms developed using OPL are similar to previously developed algorithms. We consider this a strength of the notation, since OPL algorithms can be developed quickly and correctly through the extensive use of library routines.

A number of researchers have proposed algorithms for solving the DVRP that conform to the following outline. First, an assignment of customers (that is, package delivery sites) to depots is developed. Then the independent VRP problems at each depot are solved. Finally, in some cases, local search heuristics are applied to improve the solution. The algorithm presented in (Gillet and Johnson, 1976) assigns customers to depots according to a heuristic which considers the distance to depots as well as the distance to the nearest customer. The independent VRP problems are solved using a sweep algorithm (Clarke and Wright, 1964), and vehicle tours are developed using the local search algorithms presented in (Lin, 1965; Lin and Kernighan). Tillman and Cain (1972) develop an algorithm based on the savings method. Single customer tours are constructed initially with each tour bound to its closest depot. The algorithm then merges tours if there is a subsequent decrease (a "savings") in route length. The assignment of customers to depots can change in the route merging procedure. Golden et al. (1977) combine some of the characteristics of both of the above algorithms to produce a method which is applicable to large DVRP problems. Wren and Holliday (1972) and Salhi and Rand (1987) develop algorithms which repeatedly apply a collection of improvement procedures to a number of different starting solutions. More recent work on the DVRP has involved the simultaneous solution of both the multiple depot location problem and the multiple depot routing problem (Perl, 1987). The algorithm for solving the DVRP portion of the problem works as follows: the savings method is applied to the problem to develop a collection of routes, after which two local search routines are applied.


Figure 3: HCG for the DVRP Problem

An HCG for DVRP is given in Figure 3. The intuitive embedding of VRP in DVRP is reflected in the HCG in a natural way. Indeed, if CP_p-d is satisfied first, then what remains is a collection of separate VRP instances, and this is certainly a plausible initial solution strategy. Of course after these independent VRP instances have been solved, it may make sense to introduce further crosstalk between depots (using a "2-swap" local search routine) and then optimize any altered routes a second time with, say, 2-opt.

As reported in (Moll and MacLeod, 1988) and (MacLeod and Moll, 1989), we created about a dozen policy programs for DVRP and applied them to the DVRP data sets cited in (Perl and Daskin, 1985; Perl, 1987). The best of these policy programs outperformed the results reported in those papers on all three reported data sets. Let us examine the workings of this "best" policy program informally. The policy program is built in two phases, and proceeds as follows: first a feasible solution is built: packages (i.e. delivery sites for packages) are first assigned to their nearest depots, subject to depot capacity constraints. Next VRP routing and optimization is applied to each separate depot, along the lines described above. This concludes phase I of the algorithm.

Phase II is a pure local search stage. Starting with the final solution of phase I, packages are moved one at a time to more promising routes; packages in different routes exchange positions; packages in different routes exchange routes and are placed in the "best" position in the other package's original route; and 2-opt is applied. In general these exchanges take place across routes associated with different depots. In summary, then, our DVRP solution for the datasets described in (Perl and Daskin, 1985; Perl, 1987) works by 1) factoring the DVRP instance into separate VRP problems by partitioning packages among depots; 2) solving each VRP instance separately; and 3) incrementally mixing these solutions repeatedly, each time consolidating results by reoptimizing each subproblem. We stress that this DVRP solution is built easily by appropriately composing OPL library functions.

As a further test of OPL problem solving capabilities, we considered two of the problems (problems 6 and 7) described and solved in (Gillet and Johnson, 1974). Our problem solving machinery is compared to this particular work because it was one of the few sophisticated, "well tuned" methods for DVRP that also included datasets. Our first attempt at solving these two problems involved the application of the "best" policy program developed for the (Perl and Daskin, 1985; Perl, 1987) datasets. The results were approximately 4% above the reported results in (Gillet and Johnson, 1974). At this point, we had a choice. While the sophisticated heuristics described in (Gillet and Johnson, 1974) could be developed and incorporated as a library routine in the OPL environment, we instead tried a different approach: we used randomization to construct initial solutions to the (Gillet and Johnson, 1974) datasets. In the context of OPL, randomized assignments can be constructed for any collection of arcs in an HCG. In the DVRP packages could be randomly distributed to depots, vehicles, or positions in routes. Vehicles could also be randomly distributed to depots. Our initial results indicate that for these two problems, packages should be assigned to the closest depot, and then randomization of package to route and vehicle assignments is most effective. Therefore our second solution involves distribution of packages to the closest depot and then random assignments of packages to vehicles and positions in routes. Once a random solution is constructed, the sequence of improvement routines described in step 3 of the "best" policy program is applied to the starting solution. The best among all random starting solutions was retained. Using this approach we developed a solution for dataset 6 in (Gillet and Johnson, 1974) that is 1% worse than the reported results. For dataset 7 in (Gillet and Johnson, 1974) we constructed a solution that is 0.5% better than the reported results.
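
The flavor of phase II is easy to convey in code. The sketch below shows a hypothetical move-1 pass in Python - relocating one package to whichever route (possibly at another depot) can absorb it most cheaply - with capacity checks omitted; swap-2 would exchange a pair of packages in the same spirit. The helper names are assumptions, not OPL library calls.

    # Hypothetical phase-II "move-1" pass: accept the first relocation that
    # strictly reduces total travel distance. Routes are lists of stops;
    # dist(a, b) is the travel distance. Capacity checks omitted for brevity.

    def removal_gain(route, i, dist):
        a, p, b = route[i - 1], route[i], route[(i + 1) % len(route)]
        return dist(a, p) + dist(p, b) - dist(a, b)

    def best_insertion(route, p, dist):
        def cost(k):
            a, b = route[k - 1], route[k % len(route)]
            return dist(a, p) + dist(p, b) - dist(a, b)
        k = min(range(len(route) + 1), key=cost)
        return k, cost(k)

    def move_one(routes, dist):
        for src in routes:
            if len(src) <= 1:
                continue                      # do not empty a route entirely
            for i, p in enumerate(src):
                gain = removal_gain(src, i, dist)
                for dst in routes:
                    if dst is src or not dst:
                        continue
                    k, cost = best_insertion(dst, p, dist)
                    if cost < gain - 1e-9:    # net decrease in total travel
                        del src[i]
                        dst.insert(k, p)
                        return True           # rescan after each change
        return False

Repeatedly calling move_one until it returns False, interleaved with swap moves and 2-opt on any altered routes, reproduces the "incrementally mix, then reoptimize" rhythm described above.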

EXTENDING AND MERGING PROBLEMS

To demonstrate the full power of the OPL environment and illustrate its potential for use in real-world situations, we solve a problem of considerable complexity. This problem attempts to determine an optimal (least cost) mix of leased vehicles to deliver packages in a setting in which there are multiple depots, packages are picked up and dropped off, and each pickup/dropoff transaction must take place within a predefined time window. Thus, the objective is to choose a fleet which is feasible with respect to the routing, capacity, and scheduling constraints and for which the routing and fleet costs are minimized.

One of the strengths of the OPL environment is the ease with which solution machinery can be reused when a smaller problem is embedded in a larger problem. We exploited this feature in our DVRP solution when we reused the optimization methods developed for VRP after we converted DVRP to a collection of VRP instances by partitioning packages among depots. In this section we present a more complicated illustration of this style of embedded problem solving. We make use of solutions to the Combined Routing and Scheduling Problem (CRAS), the Fleet Mix and Size problem (FSMVRP), and DVRP to construct a solution to the complex problem cited above. We first describe the HCG notation and policy program solutions for the three problems. Then we develop the HCG and a sample policy program for the complete problem.

Fleet Mix and Size VRP

The fleet mix and size variant of VRP can be formulated as follows: given a set of packages which are to be delivered to sites, lease a fleet of delivery vehicles at minimum cost to make the prescribed deliveries. The HCG representation of this problem has the same form as for VRP. However the objective function is different: we now wish to minimize the combined cost of routing and fleet leasing.

There are a number of approximation algorithms for FSMVRP (Gheysens et al., 1984). One approach adapts the so-called savings algorithm (Clarke and Wright, 1964). This algorithm starts by assigning each package to a separate vehicle. Then, at each step, the two routes with the highest savings value are merged if there is a vehicle with sufficient capacity to deliver the packages. In Golden et al. (1982), FSMVRP savings includes vehicle leasing costs, routing costs, and opportunity costs. At every step, opportunity savings costs are meant to reflect the potential value of unused vehicle capacity after combining two routes. A number of different estimates of opportunity savings were evaluated in Golden et al. (1982), and the Realistic Opportunity Savings (ROS) estimate of opportunity costs was among the most effective.

An OPL algorithm similar in spirit to the savings algorithm is summarized below. First, each package is assigned to a separate vehicle. The vehicle chosen is the one with the minimal sufficient capacity available to hold the package. A route is also created for each package and each vehicle. Then routes are merged using a merge-move routine applied to routes. This process combines routes, merging deliveries into the same vehicle, based on the ROS metric. It continues until mergers can generate no further savings. The resulting vehicle routes give a reasonable first solution to FSMVRP. Thus this algorithm demonstrates that OPL is able to generate and select a mix of containers in order to optimize a set of criteria.

A Combined Routing and Scheduling Problem Combined Routing and Scheduling Problems (CRAS) address situations in which deliveries may have precedence constraints as well as time-window constraints. T h e particular problem we consider involves both: packages must be picked up and then delivered, and this must happen within designated time windows. Pickup and delivery problems show up in m a n y application areas such as routing and scheduling of messenger services, and tractor trailer routing with partial loads. T h e HCG we use to characterize this class of problems is given Figure 4.

Figure 4: HCG for t h e CRAS Problem We now clarify t h e parts of this HCG which do not appear in previous examples. Package pickups and deliveries are designated by the TASKS node. T h e d a t a associated with tasks include t h e size of a package, whether it is a pickup or a delivery, and any time-window constraints. T h e node OD-SETS collects together pairs of pickups and deliveries. T h e notion of OD-SETS comes from t h e work of (Bodin et al, 1983) and its role in t h e H C G is to facilitate t h e necessary grouping of a package pickup and delivery in t h e same route and vehicle. T h e SCHEDULES node designates t h e sequence and time frame in which tasks are to be performed. ISts is a scheduling relation which models the assignment of a task (either package pickup or package delivery) to a position and time in t h e schedule. CP t v is designated as a capacitated partition relation and the TSP models t h e route taken by a vehicle tr to perform pickups and and deliveries. A constraint is also associated with t h e TSP tr relation to ensure t h a t t h e pickup task is always performed before t h e delivery. T h e FAt-od, CPod-v, and CPod-r relations are included to facilitate t h e grouping of pairs

17 of pickups and deliveries in t h e same route and t h e same vehicle. Finally, each of t h e CP -. , and J 5 * _ , has a constraint which restricts task placement t o relations TSP -n t tv t h e same relative position in as schedule, route, and vehicle. One of t h e few algorithms for solving this problem is described in (Bodin et al, 1983). This algorithm can b e adapted to our problem variant which includes time windows for pickups and dropoffs as follows: first pickups and dropoffs which are t o be serviced in t h e same vehicle run are grouped together. Routes are then generated which minimize t h e total distance. Finally, a scheduling procedure is applied to determine t h e t i m e for each delivery and pickup. This style of problem solving is consistent with the O P L machinery, and we have implemented a simple variant of t h e above algorithm. First pickups and dropoffs are grouped together to form OD-Sets. In addition each vehicle is assigned an e m p t y route. T h e n a single improvement policy assigns each OD-set and its component tasks to t h a t route which adds t h e least travel time to t h e overall solution constructed so far; furthermore t h e policy also checks t h a t a feasible schedule can be constructed and t h a t vehicle capacity constraints have not been violated. If vehicle capacity constraints are violated or if a feasible schedule cannot be constructed, O P L must backtrack until a satisfactory route can be found.

Combining Problem Instances We now consider how O P L can combine t h e just described subproblem solutions to yield a solution to t h e larger problem. Specifically, we consider a fleet mix and size problem in which there are multiple depots, and for which packages have precedence and t i m e window constraints. T h e objective is to choose a fleet which is feasible with respect to t h e routing, capacity, and scheduling constraints, and such t h a t combined routing and fleet leasing costs are minimized. T h e H C G for this problem is given in Figure 5. Initial experiments on this problem are based on the policy programs for t h e previously described problems. Our approach is as follows: First, tasks are assigned to depots. As in D V R P , this first policy partitions t h e full problem into a collection of more tractable subproblems. T h e n OD-sets are formed and t h e modified savings algorithm introduced for t h e F S M V R P problem is applied to reduce route length while checking schedule feasibility and vehicle capacity constraints. (Our modified savings algorithm allows tasks to be inserted in any position in t h e new route after routes are merged.) This creates near-minimal cost routes and fleet profiles for each subproblem. In addition, some cross-talk optimization is introduced in an a t t e m p t to achieve an final improvement. Notice however t h a t because of t h e presence of both precedence and scheduling constraints, cross-talk optimization becomes a much chancier business, and it therefore pays to work harder to achieve a promising initial partition among depots.

18

('Schedules^

IS,-.

^Routes

WD At-od

.(Tasks

) TSP

TSPt-r

dx

{jepot-Visij)

Figure 5: H C G for t h e Combined Problem Preliminary experiments with t h e above policy program have given reasonable results and t h e algorithm seems well behaved. Ongoing work is focused on t h e problem of refining exchanges between depots in t h e presence of t h e above-mentioned constraints.

OTHER OPL APPLICATIONS T h e primary focus of this paper has been on t h e vehicle routing applications area. This focus has helped demonstrate t h e ways in which t h e O P L environment is extensible, b u t has not done justice t o t h e range of applications which can b e addressed in t h e O P L environment. This section provides a overview of a few other O P L applications. We have looked at t h e problem of representing t h e course scheduling problem in (MacLeod and Moll, 1991). T h e course scheduling problem arises when an academic institution offers courses t o students who have some flexibility in course selection and sequencing. T h e academic institution a t t e m p t s t o find a schedule of courses t h a t minimizes t h e number of student conflicts and is feasible with respect t o classroom usage and faculty schedules (deWerra, 1985). We assume t h a t t h e assignment of faculty t o courses has already been completed and we show how t o handle such issues as preassignments, infeasible course assignments and multiple sections of a course. Another problem t h a t we considered involves t h e allocation and scheduling of tasks a n d resources in a distributed computing network. T h e problem is formulated as follows. We assume t h a t a computer system is equipped with a set of identical processors, as well

19 as a collection of non-identical resources. Tasks are to be executed by t h e processors. Each task requires a certain amount of processor time from one of t h e processors, and, in addition, each may require different resources to be available during task execution. T h e types of resources include files, d a t a structures, and physical components of a distributed network. As with tasks, each resource is assigned to a unique processor. A task may access a resource in shared or exclusive mode. If task A requires resource R in exclusive access, then no other task can use this resource while A is executing. Shared mode access to a resource means t h a t different tasks can use t h a t resource at t h e same time. Each task has an arrival time, a deadline, and a required computation time. Tasks must be scheduled for processing between their arrival and deadline times. This problem has one additional feature that can affect t h e quality of solutions dramatically. If task A requires resource R, and A and R are assigned to different processors, then an execution time tax is charged to A to reflect the cost of network activity incurred when A accesses R. We assume a "higher" execution tax r a t e in t h e case where A accesses R in exclusive mode. T h e results in (MacLeod and Moll, 1991) show how t h e relative performance of three algorithms changes under different network tax structures.

CONCLUSIONS T h e O P L problem solving notation and associated programming environment has been designed to allow for the natural representation and solution to a broad class of discrete optimization problems. Problems are represented as graphs, in which nodes stand for t h e objects of t h e problem, and arcs represent recognizable primitive optimization problems. O P L allows solutions to be built quickly with the aid of extensible software libraries, which provide off t h e shelf solution machinery for the atomic "pieces" of a problem's O P L representation. O P L is extensible. Primitive problems can be added to t h e existing list of named maps and solution methods for new and existing named maps can be added to t h e O P L libraries. T h e constructive m a n n e r in which HCG's and policy programs are built makes problem integration a fundamental part of the O P L environment. Indeed, problem integration has been applied at some level to each of the problems described in this paper. For example, t h e multiple depot routing problem was an easy extension - both notationally and computationally - of t h e single depot routing problem. T h e ability to piece together t h e description and solution of related subproblems is an invaluable technique for addressing complex optimization problems. We expect O P L to evolve as experience with one class of problems adds new insights for problem representation and new entries in the environment's software libraries. Indeed, this evolutionary process allowed much of t h e work described in (MacLeod and Moll, 1991) to be built from t h e "pieces" of the multiple depot vehicle routing problem, and we expect further leaps of this kind in t h e future.

20

BIBLIOGRAPHY Bisschop, J. and A. Meeraus (1982), "On t h e Development of a General Algebraic Modeling System in a Strategic Planning Environment", Math. Programming Studies Vol 20, North-Holland, Amsterdam. Bodin, L.,Goldin, B . , Assad, A., Ball, M. (1983), "Routing and Scheduling of Vehicles and Crews", Computers and Operations Research, Vol. 10, No. 2. Clarke, G., Wright, J. (1964), "Scheduling of Vehicles from a Central Depot t o a Number of Delivery Points", OR, Vol. 12. de Werra, D. (1985), "An Introduction to Timetabling", European Journal of Research, Vol 19, No 2, 1985.

Operations

Fourer, R, D.M. Gay, and B.W. Kernighan (1987), " A M P L : A Mathematical Programming Language," Computing Science Technical Report No. 133, A T & T Bell Laboratories, Murray Hill, N J 07974 Geoffrion, A. (1987a), "An Introduction to Structured Modeling", Management 33:5.

Science

Geoffrion, A. (1987b), "Modeling Approaches and Systems Related t o Structured Modeling", Working Paper 339, Western Management Science Institute, University of California, Los Angeles. Geoffrion, A. (1988), "SML : A Model Definition Language for Structured Modeling" Working Paper 360, Western Management Science Institute, University of California, Los Angeles. Geoffrion, A. (1989), "The Formal Aspects of Structed Modeling", Operations Vol 37, No. 1.

Research,

Gheysens, F . , Golden, B., Assad, A. (1984), "A Comparison of Techniques for Solving t h e Fleet Size and Mix Vehicle Routing Problem", OR Spektrum, Vol. 6. Gillet, B . and J. Johnson (1974), "Sweep Algorithm for t h e Multiple Terminal Vehicle Dispatch Algorithm", 46th ORSA meeting San J u a n , P u e r t o Rico. Gillet, B. and J. Johnson (1976), "Multi-terminal Vehicle Dispatch Algorithm" Omega Vol. 4. Golden, B., T . Magnati, and H. Nguyen (1977), "Implementing Vehicle Routing Algorithms", Networks 7. Golden, B., Assad, A., Levy, L., Gheysens, F . (1982), "The Fleet Size and Mix Vehicle Routing Problem", College of Business and Management Technical Report 82-020, University of Maryland. Jones, Christopher (1990), "An Introduction to Graph-Based Modeling Systems", ORSA

21 Journal on Computing, Vol. 2.

1

Kernighan and Lin (1970), "An Effective Heuristic Procedure for Partitioning Graphs' BSTJ No. 2 Lauriere, J.L. (1978), "Alice: A Language for Intelligent Combinatorial Exploration" A.I. Journal. Lin, S (1965), "Computer Solutions of the T S P " , BSTJ, No. 10. Lin S., and Kernighan, B.W (1973), "An Effective Hueristic Procedure for t h e Traveling Salesman Problem" Operations Research 21. MacLeod, B . and Moll, R (1991). "OPL : An Environment for Solving Discrete Optimization Problems", COINS Technical Report 91-87, University of Massachusetts, Amherst, MA 01003 MacLeod, B (1989), " O P L : A Notation and Solution Methodology for Hierarchically Structured Optimization Problems", P h . D . Thesis, University of Massachusetts, Amherst, MA 01003. MacLeod, B . and Moll, R (1990), "A Toolkit for Vehicle Routing", Proceedings of t h e I E E E Conference on Systems Integration, Morristown, N . J . Moll, R.N. and MacLeod, B (1988), "Optimization Problems in a Hierarchical Setting", COINS Technical Report 88-87, University of Massachusetts, Amherst, MA 01003 Palmer, K. (1984), "A Model Management Framework for Mathematical P r o g r a m m i n g " , Wiley, New York. Perl, J. and M. Daskin (1985), "A Warehouse Location Routing Problem", Transportation Research-B Vol 19B, No. 5. Perl, J. (1987), "The Multi-Depot Routing Allocation Problem" American Mathematical and Management Sciences Vol 7.

Journal

Salhi, S. and G. Rand (1987), "Improvements to Vehicle Routing Heuristics", of the Operational Research Society Vol 38 No. 3. Smith, D. (1988), "Structure and Design of Global Search Algorithms", Kestrel Technical Report, Palo Alto, California 94304.

of

Journal

Institute

Tillman, F . and T . Cain (1972), "An Upper Bounding Algorithm for t h e Single and Multiple Terminal Delivery Problem", Management Science Vol 18, No 11. Wren, A. and A. Holliday (1972), "Computer Scheduling of Vehicles from One or More Depots to a Number of Delivery Points", Operational Research Quarterly Vol 23, No. 3.

This page intentionally left blank

BOOLEAN-COMBINATORIAL B O U N D I N G OF M A X I M U M 2-SATISFIABILITY Jean-Marie Bourjolly Dept. of Decision Sciences and Management Information Concordia University, Montreal, Canada

Systems

1

RUTCOR,

P e t e r L. H a m m e r Rutgers University, New Brunswick, New Jersey, USA

William R. PuUeyblank IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA Dept. of Statistics,

Bruno Simeone "La Sapienza" University, Rome, Italy

ABSTRACT We consider t h e weighted m a x i m u m 2-satisfiability problem in its equivalent form of finding t h e minimum z* of a quadratic posiform. We describe a polynomial-time algor i t h m for computing a lower bound on z*. T h e algorithm consists of a finite sequence of elementary boolean operations of two types: fusions (x + x = 1) and exchanges (x + xy = y + yx). Our main result is t h a t the bound obtained by this m e t h o d is equivalent to the roof duality bound, which is known to be computable by linear programming. Furthermore, one can check in polynomial time whether such bound coincides with z*. If not, one can obtain strictly sharper lower bounds making use of two further elementary boolean operations, namely condensations and consensus. KEYWORDS Maximum 2-satisfiability; quadratic unconstrained optimization; pseudo-boolean functions; matching theory; roof-duality.

1. Maximum 2-satisfiability problems and their reducibility T h e weighted maximum 2-satisfiability problem (MAX 2-SAT for short) is usually stated as follows: given a quadratic formula in conjunctive normal form ( C N F ) , let a positive weight be associated with each clause. One wants to find a t r u t h assignment maximizing the total weight of t h e clauses t h a t are satisfied. ^ h e author gratefully acknowledges the partial support of NSF (Grant DMS 89-06870) and AFOSR (Grants 89-0512 and 90-0008). 23

24 While t h e 2-satisfiability problem is known t o be solvable in polynomial time, M A X 2SAT is NP-complete even in t h e unweighted case (Garey, Johnson and Stockmeyer, 1976) and even under t h e further restriction t h a t t h e quadratic formula is Horn ( J a u m a r d a n d Simeone, 1987). As a m a t t e r of fact, many hard combinatorial optimization problems, such as unconstrained quadratic 0-1 optimization, max-cut, vertex packing a n d others, can b e immediately reduced t o M A X 2-SAT (see H a m m e r a n d Simeone, 1989). T h u s it makes sense t o develop fast heuristics for this problem. Computational results for several such heuristics are reported in Hansen and J a u m a r d , 1987. In this paper, we shall describe a class of dual heuristics which yield an upper b o u n d of the o p t i m u m of a M A X 2-SAT problem via a sequence of simple boolean operations. T h e M A X 2-SAT problem has an equivalent formulation in disjunctive normal form ( D N F ) . Here t h e quadratic boolean formula is given in D N F , a n d one wants t o minimize the total weight of t h e terms that are not satisfied. More formally, let Xi, x , • • • , xbe 2 n binary variables (taking t h e values 0 and 1), and let x"i, x , • • • , xbe their complements 2 n (i.e. X{ = 1 — X{). A literal is either a variable or its complement. A term is a product of literals corresponding to different variables. A posiform is a polynomial (x,x) in t h e 2n literals x , x , • • • ,;xx"i, x , • • • , xwith positive coefficients, which for all practical a 2 n 2 n purposes can be assumed to be rational numbers; that is, where c is a (rational) constant a n d V> is a Q N P , such t h a t

n

(x,x),

for all x € B

(1.2)

Notice t h a t a Q N P admits many different translations. For example, (xx, x , z )

2 3 = 2x\x2 +

— — 1 -I- 1x{x -I=

2

-I- xi 4- xz X 1 X 3 + xi + 2 x + x

X1X2 + X1X2 + X1X3 +

= 1 + 2XiX + X!X3

2

X

2+

X3

2

3

25 D e f i n i t i o n : A Q N P is reducible if it admits a translation ( c , ^ ) where t h e constant c is strictly positive; else is irreducible. Clearly, if is reducible t h e n minimizing i/> is equivalent t o minimizing . Furthermore, c is a lower bound on z*. This observation naturally leads t o t h e following question: What is the largest constant c* such that = c* + *, where * is a QNFl Let us denote such constant by h(). P r o p o s i t i o n 1.1 (i) * ( * ) < * * (ii) h() = z*, iff t h e quadratic boolean equation = 0 has a solution. Proof: Obvious. In this paper, we shall describe a simple polynomial algorithm which, for any given Q N P o u t p u t s a translation (c*, *) of ^ such t h a t c* = h(). From (i) it follows t h a t c* is a lower bound on the optimal value z* of (1.1). Furthermore, it follows from (ii) t h a t one can check whether c* indeed coincides with z* in polynomial time, by solving a 2-SAT problem. We notice also t h a t t h e Q N P * output by t h e algorithm is irreducible. T h u s the algorithm can b e viewed as a preprocessing procedure which replaces problem (1.1) by a problem of t h e same form in which t h e posiform t o b e minimized is irreducible.

2. Connections with roof-duality T h e problem we are interested in, i.e. finding, for any given Q N P t h e largest constant h() = c* such t h a t = c* + * for a Q N P *, turns o u t t o b e closely related t o t h e roof-dual problem introduced by Hammer, Hansen a n d Simeone (1984). According t o these authors, a roof of a quadratic pseudo-Boolean function

T f(x)

n n

= x Qx

=

£ £ * i a ? i X j

(where, without loss of generality, t h e m a t r i x Q is assumed t o b e upper triangular) is any linear function of t h e form

n

gx(x)=

£

V * + foy - Afc>i +

(tJ)€P

£

X^l

-

(i,i)€N

- ), iX Xj (x G B )

where 0 c'

+ V>'

with d > c and ^ ' a Q N P . Before examining systematic procedures for carrying out these operations, we shall illustrate t h e idea on a small example. Let $ ( x i , X2,x ,

3

x ) = 10xix

4

2 + Sxix2 -f 6 X 1 X 3

+

Ax{xz

+

8X1X4 +

4x2X3

-j-12x x + 8 x x + 20x3X4 + 2xi + 1 4 x + 2 0 x

24

24

3

4

(3.1)

We shall show on this example how a sequence of algebraic manipulations can "squeeze out" of t h e original expression a positive constant, which represents a lower bound of t h e function. These manipulations will be seen later to be the basic steps in t h e algorithms described in this paper.

27 Let us notice first of all t h e identity t + fr = i| +

(3.2)

tf(=*Vi7)

which holds for arbitrary £,n € { 0 , 1 } . By applying this transformation we see t h a t 1 4 x + 2 0 x x 4- 2 0 x = 1 4 x 4- 1 4 x x 4- 6 x x + 2 0 x ;

3

34

4

4

34

34

4

using now the identity (£€{0,1}),

(3.3)

we obtain 1 4 x + 2OX3X4 4- 2 0 x = 14-1- 1 4 x x 4- 6 x x 4- 6 x

3

4

34

34

4

Substituting into (3.1), we get $ = 14 -f 1 0 x i x 4- 8x1X2 4- 6 x x + AxiXz + %x{x -f 4 x x

2

x3

23

4

+ 1 2 x x + 8 x x + 1 4 x x + 6 x z 4- 2 x + 6 x

24

24

34

34

x

(3.4)

4

We have "traded" 14 units of x for 14 units of x . Since 20 units of x were already 3 4 4 available, we could "squeeze out" the positive constant 14. A repeated application of the above argument yields the identity 2xi + 4afix + 6x af + 6 x = 2 + 2 x i x + 2xix

3

3 4

4

3

3

+ 2af x + 4 x x -f 4 x

34

34

4

Here we use 2 units of X\ t o get 2 units of x , which are then used t o produce 2 units of 3 x . Since 6 units of x are at hand, we can squeeze out the constant 2.

4

4

In view of the last identity, we obtain the following expression: $ = 16 + 10x!X + 8x1 x + 8xiX + 2x"ix + 8x1 x + 4 x x + 1 2 x x

2

2

3

3

4-8x af 4- 1 6 x x 4- 4x x~4 -f 4 x

24

34

3

4

23

24 (3.5)

4

Next, we m a k e use of t h e identity 4 x -|- 4x af -f 2x"iX + 1 0 x i £ 4- 8 x x

4

3 4

3

24=

2

2 4- 2 x x 4- 2 x x 4- 2 x i x 4- 8x1 x -f 2x"ix -f- 6 x x -+• 2 x x

34

34

3

2

2

24

24

Here the idea is slightly different: we save 2 units of x for later use, and we invest t h e 4 remaining 2 units of x to obtain 2 units of x , which are used to produce 2 units of 4 3 a?i, which are used to get 2 units of x , which are used to obtain 2 units of x . T h e 2 4 latter ones, together with the 2 units of x t h a t were saved at the beginning, allow us 4 to squeeze out a further constant 2. Thus if we replace into (3.5) the l.h.s. of the last identity by the r.h.s. we get $ = 18 + 8x1 x + 10x^2

2

+ 1 0 x i x + 8x1 x + 4 x x 4- 1 4 x x 4 -

3

4 - 6 x x 4- 1 8 x z 4-

24

34

4

22324

23

24

(3.6)

28 Notice t h a t (3.4), (3.5), and (3.6) yield successive translations of $ in which the constant term becomes larger and larger. On the whole, we were able to squeeze out the constant 18 and thus we can conclude t h a t 18 is a lower bound of the minimum of However, at each step the constant was squeezed out at t h e expense of linear terms. In fact, the latter ones were completely used up in the last expression (3.6). It will follow from the results to be developed in the sequel t h a t the application of the transformations (3.2) and (3.3) alone cannot lead to any further increase of the binding constant 18. In order to improve this bound, we shall use two new relations, tV + &l = t

(*,i»€ {0,1})

tV + v(>U

(3.7)

€{0,1}).

(3.8)

This follows from the identity

fr + # = tt + 0l< +

flFC

(3.9)

(see Simeone, 1979). Notice t h a t (3.2), (3.3), and (3.7) are b u t special cases of (3.9) Then we successively get 4 x i x 3 + Ax{x2 > 4x3x~2 4 x 3x 2 + 4 x 2z 4 > 4x3X4 4 z 3x 4 + 4 x 3z 4 = 4 x 4 and using again the previous techniques, 2x4 + 2 x 2« 4 -f 2xix2 -f 2x{x4 = 2x4 + 2 x 2x 4 + 2xxx2 +

2xxx4

Adding u p all l.h.s.'s and r.h.s.'s and simplifying, we obtain the inequality 2xix2 + 2xix2 + 4xix3 -f 2 x i x 4 + 2x 2a; 4+ 2x2x4 4- 4 x 3z 4 > 2 +

2xxx4

In view of this inequality we finally obtain from (3.6) $ > 20 + 6xix2 + Sx{x2 + 6xxx3 +

2xxx4

+6x*i x4 4- Ax2xz + I2x2x4 + 4x2x4 + 2x3x4 +

lAxsx4

4

We conclude therefore t h a t 20 is a lower bound for the minimum of $ in B . To summarize, starting from an initial Q N P , we perform a (finite) sequence of the following four elementary boolean operations: (1) Fusion 1 2. Let us assume t h a t there is a squeezing sequence (o,i, • • • , 0) of length s for 0. Then Case 1.2: £o belongs to the stem S.

is a distance is denned as t h e number is a of edges Case 1.2.1: £ 0w a£ odd distance from d (this in 5 ( 6 , C o ) )• Then /?' = Ro»6]S(&>?o) blossom, and [fo,£o][£o,£o]/?' monocycle of £o][£o>£o]/^' * monocycle of fa.

a

Case 2: The linear term d~and also £t if the latter is present in fa-belongs to fa, but exactly one quadratic term of fa is missing in fa. Let e b e t h e edge corresponding t o t h e missing term. Case 2.1: e belongs to S and e = Then in fa there must b e two terms So fa has t h e degenerate bicycle [£i,£i]5[£2,£t]/^-

^2,€2^1-

Case 2.2: e belongs to S and e ^ [ £ 1 , ^ 2 ] - Let e = i > 2. T h e n t h e quadratic nenhave r been s , term r6 -o1 & n must created via an exchange. Hence fa contains and either £t-i &• l * * case, Gfo has t h e scooter [fi,fi]S (£i,£i_i)[£»_i,6_i]; in t h e second case Gfo has t h e degenerate bicycle £i]S(£i,£t)p. Case 2.3: e belongs to the (nondegenerate) blossom P and it is not incident to the tip £t of p. Let e = [ f j - x , ^ ] . Then fa contains and either or £j. I n t hi enfirste t case G4* contains t h e scooter [(J.I^J^IMJ-U^-IMJ-I^J-TI '' -Jft+ift] 5 second case G^ contains t h e scooter ' * • [£tM£t]5Case 2.4: e belongs to P and is incident to (t. Let e be, say, ( t h e case when m £et or s n eh e clinear o nt e r m e = [ £ v, f t] is similar). Then fa has either t h e linear term t t In t h e first case, G^ contains t h e scooter Kudl^Kt^tl* d one G^ has t h e scooter [6+i»6+i]Kt+i»?t+i][?t+i»&+2] * * * Km&]SKi»6]Case 3: Both a linear term and a quadratic term of fa are missing in fa while all remaining terms of fa are present in fa. Let t h e linear term b e £1. T h e n t h e missing quadratic term must necessarily b e £ ^ 2 - I t follows t h a t £i£2a n d £ 2are present in fa. Therefore, we can reason as in Case 2.1. Since all cases have been covered, t h e theorem is proved. • Theorem 4.1 leads t o t h e following algorithm for squeezing out constants through a sequence of fusions and exchanges. ALGORITHM SQUEEZE 1 I n p u t : Q N P fa

l l

33 S t e p 0. Set &i : = 0, = ^o; build t h e matched graph G+. S t e p 1. If there exists no degenerate bicycle in G+, then stop; o u t p u t (ki,). Else go t o Step 2. S t e p 2. Choose a degenerate bicycle B of G+. Let A be t h e capacity of B. Case 1: P contains

two loops (scooter).

Subtract A from t h e weight of all t h e edges of B, including t h e two loops. If [a, p] is a passive edge of B, add A to t h e weight of edge [a,/?]. (If edge [a,/?] does not exist, create it and give it weight A ) . Delete all the edges with weight 0. Let ki := ki + A . Go to Step 2. Case 2: P contains

one Joop(monocycle)

Subtract 2A from t h e weight of all edges of t h e stem of B and from t h e weight of its unique loop. Subtract A from t h e weight of all remaining edges 7of B. If [a,/?], a ^ ; is a passive edge of B whose weight has been decreased by A ' ( A = 2A or A ) , a d d A to t h e weight of edge [a,/?]. If [a,/?] does not exist, create it and give it weight A ' . Delete all t h e edges with weight 0. Let ki := ki + A . Go to Step 2. END R e m a r k 4 . 2 : In Step 2, it is assumed t h a t one can identify a degenerate bicycle of G+ when such a structure exists. This task can be carried out by using t h e node labelling algorithm of Bourjolly, H a m m e r and Simeone (1984) as a subroutine.

5. Equivalence between reducibility and squeezability For a given i n p u t Q N P ) b e any translation of t h e Q N P . Let L and Q be t h e linear and t h e quadratic part of respectively, so t h a t = c + L + Q. Let (c', V>') be any other translation of and let U and Q' be the linear and t h e quadratic part of respectively. L e m m a 5 . 1 : One must have (i) w(Q) = w(Q') (ii) 2c + w(L) = 2c' -I-

w(V)

Proof: Simeone (1979), has shown t h a t all translations of a given Q N P can be obtained

34 by carrying out t h e following two-stage procedure: Stage 1: For each quadratic t e r m qfy of T h e n , making use of t h e identity

choose an arbitrary A in t h e interval 0 < A < q.

QZR, = { Q - \ ) i r , + \tr}

+ \i

+ A,

-

A

(5.1)

replace q£r) by t h e r.h.s. of such identity. Stage 2: As long as there are linear terms of t h e form a( and 6£, merge t h e m into (a+&)£. As long as there are linear terms of t h e form a( and 6£, where, say, a < 6, perform t h e fusion a + (b - a)( a( + b( (5.2) Remark t h a t in Stage 1, every time one replaces t h e l.h.s by t h e r.h.s. of (5.1), t h e weight of t h e linear part increases by 2A and t h e constant decreases by A, while t h e weight of t h e quadratic part remains constant. Also, throughout Stage 2, t h e quadratic part is unchanged and, every time a fusion (5.2) occurs, t h e constant increases by a, while t h e weight of t h e linear part decreases by (a + 6) — (6 — a) = 2a. Combining these two remarks, we obtain t h e s t a t e m e n t . • L e m m a 5.1 implies t h a t any increase in t h e constant can only be achieved at t h e expenses of t h e weight of t h e linear p a r t , a fact t h a t was already mentioned in t h e discussion of t h e example in Section 3. L e m m a 5.2 If is a purely quadratic normal posiform, then is irreducible. P r o o f : Follows immediately from L e m m a 5.1 T h e o r e m 5.3 A Q N P is reducible if and only if it is squeezable. Proof: T h e " i f p a r t is obvious. Let us prove t h e "only i f part by induction on t h e number n of variables, (n > 2): If n = 2, a Q N P in two variables can always be written in t h e general form q(rj + r£r) + a( + bli+cn where a, b,c,d>

+ dfj,

0, ab = cd = 0, q, r > 0.

Assume t h a t is reducible. T h e n has a translation (k,rp) where k > 0. Such translation can be obtained via t h e two-stage procedure described in t h e proof of l e m m a 5.1. After Stage 1, one has for some A, fi (0 < A < q, 0 < p < r ) : k+il>

= (q-X+n)tTl+(r-/i+A)^H-(a-j-A)77

+ (6H-^)?+(c-fA)77+(ci-f^)?--A--/i

(5.3)

and after Stage 2 k = min{a + A, 6 + /*} + min{c + A,d + / i } - A - / i

(5.4)

Since is reducible , 4> must have at least one linear term by L e m m a 5.2. So, without loss of generality, assume t h a t a > 0. T h e n 6 = 0.

35 Claim 1: c > 0. Assume t h a t c = 0 „ then (5.4) would become 0 < k = min{a + A, /*} + min{ A, d - f / x } - A - / x < / x - f A - A - / x = 0, a contradiction. Hence c > 0 a n d thus 0. Assume t h a t r = 0. Then one would have ft = 0, a n d (5.4) would become 0 < A: = min{a + A, 0} + min{c + A, 0} - A = - A < 0 a contradiction. Hence r > 0. In conclusion, contains t h e terms £, rj and £77 a n d hence is squeezable. Now let us prove t h e induction step from n — 1 t o n variables. So assume t h a t is a reducible Q N P in n variables. B y L e m m a 5.2, must contain a linear term, say x,- ( t h e case x~i is similar). Write $ = (a + A)x{ + Bxi + C, where A, J9, C are posiforms in t h e n — 1 variables Xj_i, • • • n;, xA a n d 5 are linear, C is quadratic, a n d a is a positive constant. If C is squeezable, then is also squeezable. Hence from now on we may assume t h a t C is not squeezable. Since is reducible, there are A; > 0 a n d a Q N P ^ such t h a t (a + A)xi + Bxi + C = k + ^

(5.5)

Setting x< = 0 in (5.5) we obtain (5.6)

B + C = k + ip'

where V>' is a Q N P in t h e n — 1 variables xi, • • • , Xj_i, x»+i, • • n.• Identity ,x (5.6) implies t h a t t h e Q N P B + C is reducible and hence, by t h e inductive hypothesis, it is squeezable. After Theorem 4 . 1 , there are two possible cases. Case 1: B + C contains

a sequence of terms of the form 6,?i6.--->?t-i&,?t

Case 2: B + C contains 6>?i6>

(scooter)

(5.7)

a sequence of terms of the form ** *

i?t6+i» *

*' >£,-i&»£t,£t

(monocycle)

(5.8)

In b o t h cases, all quadratic terms of t h e sequence belong to C , while a t least one linear term is contained in B , else C would b e squeezable. Let us examine Case 1. Case 1.1: £1 is contained

in B and £t is contained in C.

Then (f> = (a + A)x{ + Bx~i + C contains t h e sequence

36 Hence G£»], and having a blossom with tip 2f». Thus is squeezable by Theorem 4.1. Finally, let us consider Case 2. Now £i must belong to B. Thus contains the sequence

such sequence corresponds again to a monocycle of G and thus is squeezable by Theorem 4.1 • C o r o l l a r y 5.4: If (c, ip) is a translation of the Q N P with V> irreducible, then c = h((j>). P r o o f : For some Q N P

one has = c + = h() + il>\

Clearly c < h((j>). Suppose t h a t c < h(). Then from (5.9) one would obtain ^ = k +

with

k = h((t>) - c> 0

Hence ^ would be reducible, a contradiction. • Now it is easy to obtain the main result of this section. T h e o r e m 5.5: If t h e output of S Q U E E Z E 1 is (c*,^*), then one has c* = h(). Hence c* does not depend on the particular sequence of fusions and exchanges performed by S Q U E E Z E 1. P r o o f : T h e Q N P r/>* is obviously unsqueezable and hence irreducible by Theorem 5.3. Then the statement follows from Corollary 5.4 • R e m a r k 5 . 1 : In a preliminary version of this paper (Bourjolly, Hammer, Pulleyblank and Simeone, 1989), we gave a completely different proof of Theorem 5.5 based on linear programming. Actually, we have shown t h a t finding h() can be formulated as a fractional 6-matching problem, whose solution can be obtained via a specialized version of the simplex algorithm. Then we have shown t h a t each pivoting can be simulated by

(5.9)

37 a sequence of fusions and exchanges, thereby obtaining a constructive proof of Theorem 5.5. Algorithm S Q U E E Z E 1 has a polynomial time implementation. Assume t h a t (after possible scaling) all t h e coefficients of t h e input quadratic posiform are integers, and let D be t h e m a x i m u m such coefficient. According to Lemma 5.1, one has h() < \nD. At each iteration, finding a degenerate bicycle or verifying t h a t none exists takes 0(m) time, where m is t h e number of terms. Taking into account these observations, it is possible to show t h a t , if one makes use of a scaling technique similar to t h e one described in Edmonds and K a r p (1972), and if at each iteration one selects a degenerate bicycle in a suitable way, t h e n S Q U E E Z E 1 runs in 0(mnlog(l + D)) time. (For details, see BourjoUy, H a m m e r , Pulleyblank and Simeone, 1989). Algorithm S Q U E E Z E 1 might not be as fast as other existing algorithms for solving t h e roof-dual. Actually, t h e general version of S Q U E E Z E 1 (without scaling) described above is equivalent to Ford-Fulkerson's algorithm for the max-flow formulation of t h e roof-dual given in Boros, Hammer and Sun (1991). T h e interest of S Q U E E Z E 1 is mainly due to t h e fact t h a t this algorithm solves t h e roof-dual via a sequence of two simple boolean operations. As a consequence, one can improve t h e roof-duality b o u n d by using two further elementary boolean operations, as shown in t h e next section.

6. Improving the lower bound by consensus and condensations In this section we shall give a combinatorial characterization of those Q N P from which one can squeeze out a positive constant by a sequence of t h e four elementary boolean operations introduced in Section 3. We shall also describe an algorithm, S Q U E E Z E 2, relying on this characterization, and prove t h a t , whenever the lower bound obtained by S Q U E E Z E 1 is smaller t h a n t h e minimum of , one can strictly improve this b o u n d by using S Q U E E Z E 2. D e f i n i t i o n : A Q N P is said to be weakly squeezable if there exists a sequence of quadratic posiforms fa, fa, • • • , fa, with s > 1 such t h a t (i)

fa = fa

(ii) fa (i — 1,2, -",s) is obtained from fa_i by a single exchange, consensus or condensation (possibly after some split); (iii) fa is ripe. In this case, fa, fa,..., fa is called a weakly squeezing sequence for (of length s). D e f i n i t i o n : Let be a purely quadratic posiform. is called L-squeezable a sequence fa,fa,-",fa, — -,fa (s > 2) of posiforms such t h a t : (i)

fa = 4',

if there exists

38 (ii) For all i = 1, • • • , s — 1, after some split);

is obtained from

through a single consensus (possibly

(iii) 9is obtained from through a condensation (possibly after some split). (Therefore, contains a linear term, while ^o, & , • • • , a

-i do n o t ) . L e m m a 6 . 1 : Let ^ be a purely quadratic normal posiform such t h a t G has a blossom of length at least 5. T h e n is L-squeezable. Proof: If G has a blossom of length at least 5, then contains a sequence of terms of the form 66> £26*"' •»ft-i6 » 66It is easy to see t h a t is L-squeezable: just perform the consensus

followed by the condensation +

(6.2)

L e m m a 6.2: Let $ be a quadratic posiform such t h a t G$ contains a bicycle. If ij> is obtained from t h e quadratic posiform through a single consensus, condensation, or exchange (possibly after some split), then contains a bicycle. P r o o f : Let a be the elementary boolean operation which (possibly following some split) transforms the posiform into ty. Case 1: a is a

consensus.

Then there is a t e r m £77 of ip which is obtained by consensus from two terms and 7 £ different from 6 £, 77, 77. Thus Gj, contains the alternating 3-path £77 of , with [fjCHCjCHC?/] (while t h e edge [£,77] may be present or not in G+). Therefore for every alternating p a t h P of G^ which contains the edge [£,tj], there is an alternating p a t h P' in G having the same endpoints as P , not containing the edge [i^], and such t h a t the lengths of P and P' have the same parity. It follows that G has a bicycle. Case 2: a is a

condensation.

In this case, there exists a linear term £ of ^ which is obtained by condensation from two terms £77 and £fj of . If the bicycle B of G^j, does not contain the loop at vertex £, then clearly B is a bicycle of G4. If, on the other hand, B is a degenerate bicycle having a loop at £, then one obtains a bicycle of G4 by replacing such loop by the blossom of length 3 Case 3: cr is an

exchange.

Let Gi/, contain the bicycle B with stem S and blossoms B\ and B2. Let £1 and £2 be the tips of B\ and B2, respectively. We may assume that B has the property t h a t no

39 edge of S belongs also t o b o t h B\ and B , else one can always find another bicycle of 2 Gj, having such property. Let t h e exchange a be given by rj + £77 b e t h e purely quadratic normal posiform whose terms correspond t o t h e passive edges of B\. Let £ b e t h e t i p of B\, and let t b e t h e number of passive edges of B\. T h e n V> is L-squeezable a n d one can produce t h e linear t e r m f by performing a sequence of t — 2 consensus followed by one condensation, Notice t h a t , if ^o, ^i, • • • , t-i, t is t h e sequence of quadratic posiforms obtained by these t — 1 elementary boolean operations, t h e posiforms ^o>-">*) be t h e o u t p u t of S Q U E E Z E 1;

c** : = c** + c*; : = V>*; { C o m m e n t : at this point G+ has been u p d a t e d by S Q U E E Z E 1} S t e p 2. If (?^ contains no bicycle, then S T O P : o u t p u t (c**,^); Else let B b e a bicycle of G^; { C o m m e n t : B must b e a nondegenerate bicycle}. Go to Step 3 S t e p 3 . If necessary, replace some of t h e edges of B by multiple edges as previously explained; let k b e t h e capacity of B. Let B be one of the two blossoms of B] let £ be t h e tip of B. Subtract k from t h e weight of every passive edge along B\ if t h e weight of any such edge becomes 0, t h e n delete t h e edge; Add a loop of weight k at ( . U p d a t e G accordingly; go to Step 1. END R e m a r k 6 . 1 : It is possible to show t h a t also S Q U E E Z E 2 can be implemented so as to run in polynomial time. One can find a nondegenerate bicycle by applying t h e labelling algorithm of Deming (1979) or Bourjolly, Hammer and Simeone (1984) as a subroutine. T h e o r e m 6.4: Let c* and c** be t h e constants o u t p u t by S Q U E E Z E 1 a n d S Q U E E Z E 2, respectively, and let z* = m i n B » fa If c* < z*, then c* < c** < z*.

x€

Proof: Let (c*, V>) be t h e o u t p u t of S Q U E E Z E 1, and let $ be t h e boolean frame of As noticed in Section 1, if c* < z* t h e quadratic boolean equation ^ = 0 has no solution. Prom t h e results of Deming (1979), Sterboul (1979), Simeone (1985), it follows t h a t G+ has a bicycle. Such bicycle cannot be degenerate, otherwise rj> would be squeezable. Algorithm S Q U E E Z E 2, using this nondegenerate bicycle, will squeeze out from ^ a positive constant. Hence c* < c**.D

REFERENCES Boros, E., P.L. H a m m e r and X. Sun (1991): Network flows and minimization of quadratic pseudo-boolean functions, R R R # 17-91, R U T C O R , Rutgers University, New Brunswick. Bourjolly, J-M., P.L. H a m m e r , W . R . Pulleyblank and B . Simeone (1989): Combinatorial m e t h o d s for bounding quadratic pseudo-boolean functions, R R R # 2 7 - 8 9 , R U T C O R , Rutgers University, New Brunswick.

42 Bourjolly, J.-M., P.L. H a m m e r and B . Simeone (1984): Node-weighted graphs having t h e Konig-Egervary property, Mathematical Programming Study 2 2 , 44-63. Deming, R . W . (1979): Independence numbers of graphs - an extension of t h e KonigEgervary property, Discrete Mathematics 27, 23-33. Garey, M.R., D.S. Johnson and L. Stockmeyer (1976): Some simplified NP-complete graph problems, Theoretical Computer Science 1, 237-267. Hammer, P.L., P. Hansen and B . Simeone (1984): Roof-duality, complementation and persistency in quadratic 0-1 optimization, Math. Programming 2 8 121-155. Hammer, P.L. and B . Kalantari (1989): A bound on the roof-duality gap, in: B . Simeone (ed.) Combinatorial Optimization, Lecture Notes in Mathematics, vol 1403, 254-257. Hammer, P.L., and B . Simeone (1989): Quadratic functions of binary variables, in: B . Simeone (ed.) Combinatorial Optimization, Lecture Notes in M a t h e m a t i c s , vol. 1403 (Springer, Berlin), 1-56. Hansen, P., S.H. Lu and B . Simeone (1990): On t h e equivalence of paved duality and standard linearization in nonlinear 0-1 optimization, Discrete Applied Mathematics 29, 187-193. Hansen, P. and B . J a u m a r d (1987): Algorithms for the m a x i m u m satisfiability problem, R R R # 4 3 - 8 7 , R U T C O R , Rutgers University, New Brunswick. J a u m a r d , B . , and B . Simeone (1987): On the complexity of the m a x i m u m satisfiability problem for Horn formulas, Information Processing Letters 26, 1-4. Lu, S.H. and A.C. Williams (1987): Roof-duality for polynomial 0-1 optimization, Math. Programming 3 7 , 357-360. Simeone, B . (1979): Quadratic 0-1 programming, boolean functions and graphs, P h . D . Thesis, University of Waterloo. Simeone, B . (1985): Consistency of quadratic boolean equations and t h e Konig-Egervary property for graphs, Annals of Discrete Mathematics 2 5 , 281-290. Sterboul, F . (1979): A characterization of t h e graphs in which t h e transversal n u m b e r equals t h e matching number, Journal of Combinatorial Theory, Series B , 27, 228229.

NAVAL P E R S O N N E L A S S I G N M E N T : AN A P P L I C A T I O N O F L I N E A R - Q U A D R A T I C P E N A L T Y M E T H O D S

Mustafa Q. Pinar and Stavros A. Zenios Decision Sciences D e p a r t m e n t T h e W h a r t o n School of t h e University of Pennsylvania Philadelphia, PA 19104

ABSTRACT T h e problem of optimally (re)allocating Navy personnel to permanent stations is compounded by several considerations: budgetary requirements, staffing of positions by occupation groups or ranks, and maintaining an acceptable level of readiness. T h e problem can b e formulated as a transportation problem with side constraints. An additional, non-network, variable measures t h e readiness level. However t h e resulting m a t h e m a t ical programs are very large - up to 66,000 variables and 36,000 constraints including 5,400 non-network inequalities. In this paper we report on an application of t h e LinearQuadratic Penalty (LQP) method to solve this large scale problem. It is therefore possible t o exploit t h e structure of t h e embedded transportation problem. T h e algorithm solves efficiently, and t o a high degree of accuracy, models t h a t would not be solved with a general purpose solver. Hence, t h e model can be used for strategic planning decisions. Further work on t h e CRAY Y - M P supercomputer illustrates t h e use of vector computers for solving t h e Naval personnel scheduling problem in a way t h a t makes it useful for operational planning purposes. KEYWORDS Naval Personnel Assignment. Network Optimization. Smooth Penalty Methods. Vector Computing. INTRODUCTION T h e use of specialized algorithms can have a significant impact on t h e solution process of large scale optimization models. This is particularly t r u e for models which are used for operational planning purposes where solutions need to be produced in a timely and cost-effective way. T h e use of special purpose algorithms could make t h e difference between a conceptual model and one t h a t is routinely used to aid t h e decision making process. Even for strategic planning applications, however, mathematical programming has evolved into a powerful modeling tool in various application areas such as transportation, logistics and military planning among many others. These models can be solved using general purpose mathematical programming packages. However, these models tend to be very large in general and require excessive amount of t i m e and storage during the solution process. In such cases, it may be beneficial to exploit any structure present in 43

44 t h e model to produce a solution within reasonable time and memory usage. This strategy has been followed in t h e use of network structures for planning purposes since the early days of linear programming. See for instance t h e survey papers by Glover et al. (1990) and by Dembo et al. (1989). In this paper we describe a military planning model which was formulated as a very large linear program with an embedded network structure. T h e problem is solved by a specialization of a smooth penalty algorithm due to t h e authors. T h e model described here is a Navy personnel assignment model and belongs to a class of optimization problems known as networks with side constraints. T h e presence of network structure in mathematical programming models can greatly speed up t h e solution process through t h e use of specialized network optimization techniques and d a t a structures which have been developed over t h e past two decades. However, t h e task is more complicated when there are additional, i.e. non-network, constraints in t h e model. These drastically diminish t h e advantages offered by the network structure. In such cases, the operations researcher is compelled to design special solution techniques t h a t would expose t h e network structure and thus gain access to network optimization tools. One such specialized algorithm is the Linear-Quadratic Penalty (LQP) algorithm of Zenios et al. (1990). T h e algorithm was used to solve very large planning models for the Military Airlift C o m m a n d with multicommodity network flow structure which can be seen as a special case of t h e network with side constraints model. This paper is organized as follows. We proceed with a description of t h e Naval personnel assignment problem and the associated network formulation. A more detailed account can be found in Krass and Thompson (1990). Then we specialize t h e ( L Q P ) algorithm to t h e Naval personnel assignment model, followed by an account of solution strategies and numerical results. Particular emphasis is placed on t h e efficient use of vector computing on a CRAY Y - M P . We also provide comparisons with t h e general purpose linear program solver MINOS of Murtagh and Saunders (1987). We conclude t h e paper with general comments and discussion. T H E NAVAL P E R S O N N E L A S S I G N M E N T P R O B L E M Each year thousands of decisions are made to (re)allocate t h e Navy Enlisted Personnel to a fleet of combat units and to mission areas within these units. Allocations are m a d e in such a manner as to provide the best defense at the lowest cost. A combat unit is characterized by a number of mission areas according to the function fulfilled by each mission area such as mobility, anti-air warfare, submarine warfare etc. A mission area within a unit requires personnel with different skills to support operational capabilities. T h e personnel skills can be determined by ranks, pay grades and the Navy Enlisted Classifications. A unit's capability to perform its functions in all its mission areas is referred to as "readiness". Manning is defined as the percentage of personnel to manpower requirements of a mission area. Readiness is measured based on t h e manning level of a mission area. A shortage of skilled personnel would decrease t h e level of readiness of a mission area and thus degrade the capabilities of t h e unit. Clearly maximizing t h e level of readiness is a complex decision making problem given the large number of mission areas and personnel to be matched. 
Several researchers have studied this problem and suggested models and solution techniques. Network models for general military personnel planning have been proposed by Klingman et al. (1984), and for t h e Navy problem by Ali et al. (1988). In this paper we are particularly interested in the Navy personnel planning problem, under considerations of readiness. Quantifying t h e level of readiness of a unit as a function of the personnel assigned to the unit becomes a crucial first task in t h e decision making process. A continuous measure of readiness was recently developed in Krass et al. (1988).

45 They also present a heuristic t h a t computes the number of personnel moves required to achieve a given level of readiness for each unit. Classifying t h e personnel in two categories they measure t h e readiness level R for a mission area as: £ = 10-

min[(a; + 5),y]

(1)

10

where x is t h e percentage of personnel from category A and y is t h e percentage of personnel from b o t h category B . Smaller values of R indicate a high level of readiness for a mission area. T h e readiness level for a unit is defined as t h e m i n i m u m of the readiness levels of all mission areas contained in t h a t unit. I.e., t h e m a x i m u m value of R among all mission areas of a unit determine t h e level of readiness of t h e unit. Therefore given t h e number of Navy personnel from each category assigned to a unit u and to each mission area m within unit u, t h e readiness level of t h e unit can be computed as Ru = m a x Run

(2)

where Rum is obtained from (1). T h e problem of assigning personnel t o units and t o mission areas within units ignoring readiness considerations can naturally be modeled as a network optimization problem, see Krass and Thompson (1990). An example is depicted in Fig. 1. Personnel Categories

Combat Units

Mission Areas

Fig. 1. T h e structure of t h e network for Naval Personnel Assignment We sketch here t h e network underlying t h e optimization model and t h e additional requirements t h a t form t h e non-network constraints. T h e network consists of three layers: L a y e r 1: Navy personnel divided into categories according to their skills, rank and pay grades. Each such category constitutes a supply node, with supply equal to t h e number of personnel available for assignment in each category. These nodes are connected t o second layer nodes.

46 L a y e r 2: T h e Combat units. T h e r e is a node for each unit to which personnel are to be assigned. These are transshipment nodes, i.e., they have no exogenous supply or demand. These nodes are connected to third layer nodes. L a y e r 3 : T h e Mission areas. There is a node for every mission area of each unit. These nodes have supply equal t o t h e number of personnel already on board a given unit mission area. Finally these nodes are connected to a sink node with demand equal to total personnel supply. T h e network can be seen as a tripartite graph. T h e model takes into account t h e following additional considerations: •

A person assigned t o a unit may or may not b e counted toward t h e readiness level calculation of some mission areas within t h e unit. Clearly if a person cannot perform some function critical to t h e mission area then his or her contribution t o t h e readiness level for t h a t area is nil. This requirement depends on the problem d a t a and can be modeled using side constraints.



T h e readiness level computation is non-linear in t h e flow variables since it involves the "min" operator, see equation (2). This computation can be reformulated using linear inequalities and posed as non-network constraints. This reformulation is given below.

T h e above requirements are cast into side constraints as follows. Let x

rpu be t h e number of personnel from pay grade p with skill r assigned to unit u, the number of personnel from pay grade p assigned to unit u t h a t can be employed pum

Y

in mission area m .

$urm a parameter which is equal t o 1 if a person with skill r assigned t o unit u can be employed in mission area m and zero otherwise. This parameter is specified as part of t h e problem data. PB^

m t h e number of personnel from pay grade p already on board unit u and t h a t can be employed in mission area m.

B?

m t h e total number of personnel from pay grade p t h a t can be on board unit u and mission area m .

T h e parameters PB^ and B* are also specified as problem data. T h e decision varim m ables x and t h e variables Y are related by t h e following relation:

rpu

pum x y



pum —

(3)

Assuming there are only two pay grades, i.e., t h e value of p is either 1 or 2, the readiness level R for unit u is computed as:

u

Y\um + PBI um + .05), Y2um + -P-Bui um•)) ) Ru = 10 • (1 — min(min((

(4)

47 T h e n t h e problem of maximizing fleet readiness is t o find t h e fleet readiness level R such t h a t R is a solution t o Minimize

R = m a x Ru

u

Subject to:

Flow conservation constraints on

x

rpu

This problem can b e reformulated by observing t h a t if L is a feasible readiness level then V u

L>Ru This condition is equivalent to

y

P i ?

L > 10 • (1 - m m ( m i n ( ( ' ™ +

Y

" " ' + .05),

uP mB }

p ~)))

(5)

um

Defining z = 1 — ~ , then (5) can b e rewritten as:

y i * < imn(mm(( ~ +

fL *

+ .05),

( )

6

or, equivalently, t h e readiness level I, is attainable if

m> ( Z- . 0 5 ) B L - ^ L

F i „

(7)

and (8)

Y >zBl -PBl

ium m m

Using (3) t h e conditions for measuring readiness level are equivalent to £

*«.x

r £

r

r . l>(z-

*»»*r2.

.05) • Bl - PB\

m

>*-BL~

(9)

m

(10)

PBlm

Rearranging t e r m s , t h e side constraints have the following form:

S

Yl ^rmXrpu-

V

CtZ>T

U,p,m

(11)

r

where t h e scalars a and r are computed from (9) and (10). Therefore, t h e Naval personnel assignment model can be formulated as: Maximize Subject to:

s

z

52 urmXrpu

V

— ctz>r

r

+

u,p,m

Flow conservation constraints on Expressed in m a t r i x form: [NETSBDE] minimize x,z subject to

—z Ax Sx + Pz 0 < x 0 < z

= > <
c where t is a scalar real variable and € is a positive real number. T h e function is depicted in Fig. 2. T h e linear-quadratic penalty function is used to eliminate t h e side constraints by placing those in t h e objective function. T h e nonlinear network problem obtained by penalizing t h e side constraints Sx + Pz > r is formulated as:

49

where y = r — Sx — Pz, t h e linear-quadratic penalty function is given by (12) and fi is a positive scalar which determines t h e severity of t h e penalty. T h e resulting nonlinear network problem is solved repeatedly with adaptively changing parameters fi and e until suitable stopping criteria are satisfied. T h e algorithm can be concisely stated as follows: T h e Linear—Quadratic Penalty Algorithm

S t e p 0 (Initialization.) Find an initial feasible solution for t h e network component of Naval Personnel Assignment problem ignoring t h e side constraints, i.e., solve t h e problem minimize _

z

subject to

Ax = 6 0 < x < u 0 < z < v

If t h e solution t o this problem satisfies all side constraints, stop. Otherwise choose initial values for penalty parameters fi and e and go t o Step 1. S t e p 1 Solve t h e nonlinear network problem N L P . Go to Step 2. S t e p 2 If t h e solution satisfies optimality criteria, stop. Otherwise, adjust t h e penalty parameters fi and e and go t o Step 1. T h e solution of t h e nonlinear network problem in Step 1 demands t h e most computational effort. This problem is solved using t h e network specialized version of simplicial decomposition algorithm (Mulvey et a/., 1990). Simplicial decomposition iterates by solving a sequence of linearized subproblems to generate e x t r e m e points of t h e feasible region of t h e network component, a n d master problems which minimize t h e nonlinear objective function over t h e simplex spanned by t h e e x t r e m e points. A complete description is given in t h e Appendix. Here we discuss t h e specializaton of t h e algorithm for t h e Naval personnel assignment model. T h e Subproblem. A new vertex (x, z) is generated as t h e solution to t h e following subproblem: T v T Minimize

    minimize      x^T ∇_x Φ(x^ν, z^ν) + z^T ∇_z Φ(x^ν, z^ν)
    subject to    Ax = b
                  0 <= x <= u
                  0 <= z <= v

where (x^ν, z^ν) is the iterate at the ν-th iteration of simplicial decomposition and ∇_x Φ and ∇_z Φ denote the gradient of Φ with respect to x and z respectively. This problem naturally decomposes into two independent linear programs as follows:

    minimize      x^T ∇_x Φ(x^ν, z^ν)
    subject to    Ax = b
                  0 <= x <= u

and

    minimize      z^T ∇_z Φ(x^ν, z^ν)
    subject to    0 <= z <= v

The first problem is a linear network problem and is solved using the network simplex method. The second problem is solved trivially by assigning z to its lower or upper bound depending on the sign of the gradient ∇_z Φ(x^ν, z^ν).
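As a minimal illustration (our sketch, not part of the original Fortran code), the second linear program is separable in the components of z, so its solution reduces to a sign test on the gradient. Here grad_z and v are assumed to be NumPy arrays holding ∇_z Φ(x^ν, z^ν) and the upper bounds:

    import numpy as np

    def solve_z_subproblem(grad_z, v):
        """Linearized subproblem in z: minimize grad_z . z  s.t. 0 <= z <= v.

        Each component z_i goes to its upper bound when its gradient
        coefficient is negative, and to zero otherwise.
        """
        return np.where(grad_z < 0.0, v, 0.0)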

The Master Problem. A nonlinear master problem optimizes the objective function on the simplex specified by the extreme points generated by the subproblems. The master problem is formulated in the form:

    minimize      Φ(Bw)
    subject to    sum_{i=1}^{ν} w_i = 1
                  w_i >= 0,   i = 1, ..., ν

where ν is the number of extreme points generated by solving the subproblems, B is the matrix whose columns are the generated extreme points, and w = [w_1, w_2, ..., w_ν] are the corresponding weights. The master problem, though nonlinear, is of significantly smaller size than the original problem since it is posed as a problem over the weights w. There are several standard methods that can be used for its solution, for example, Bertsekas's projected Newton method (Bertsekas, 1982). If the simplicial decomposition algorithm drops vertices that carry zero weight at the optimal solution of the master problem, then subsequent master programs are locally unconstrained. Hence, methods of unconstrained optimization can be used to compute a descent direction. A simple ratio test determines the maximum feasible step length that will not violate the bounds. The master program can be rewritten in the form:

    min_w Φ(Dw)    (13)

where D = [y_1 - y_ν | y_2 - y_ν | ... | y_{ν-1} - y_ν] is the derived linear basis for the simplex generated by the vertices y_1, y_2, ..., y_ν. We denote by w the vector [w_1, w_2, ..., w_{ν-1}], and the weight of the last vertex is computed as

    w_ν = 1 - sum_{i=1}^{ν-1} w_i.    (14)

At the current iteration we have ν - 1 active vertices (i.e., w_i > 0 for i = 1, ..., ν - 1) and the last vertex y_ν lies along a direction of descent. Hence, given an iterate (x^ν, z^ν), a descent direction p for (13) can be obtained as the solution to

    (D^T M D) p = -D^T ∇Φ(x^ν, z^ν).    (15)

The choice of the matrix M and alternative solution methods for system (15) are discussed in Zenios et al. (1990).
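As a dense-algebra sketch of system (15), under the assumption that D, M and the gradient are available as NumPy arrays (the choice of M is left to the caller, as in the text):

    import numpy as np

    def descent_direction(D, M, grad):
        """Solve (D^T M D) p = -D^T grad for the reduced descent direction p.

        D    : n x (nu-1) matrix of vertex differences y_i - y_nu
        M    : n x n approximation to the second derivatives of Phi
        grad : gradient of Phi at the current iterate, length n
        """
        A = D.T @ (M @ D)          # (nu-1) x (nu-1) reduced system
        b = -D.T @ grad
        # lstsq guards against rank deficiency among the vertices
        p, *_ = np.linalg.lstsq(A, b, rcond=None)
        return p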

Adjusting the Penalty Parameters

The procedure used to update the penalty parameters μ and ε consists of dynamically decreasing the value of ε to a small final tolerance and increasing the value of μ when certain criteria are met. Suppose (x^k, z^k), μ^k, ε^k are given at iteration k of the LQP algorithm. Also let y = r - Sx^k - Pz^k and define the set V(x, z) = {j | y_j > ε} to be the set of violated constraints. The iterate (x^k, z^k) is termed ε-feasible if the index set of violated constraints V(x^k, z^k) is empty. We distinguish between the following two cases when updating the penalty parameters:

Case 1: If V(x^k, z^k) = ∅, this is an indication that the magnitude of the penalty parameter μ was adequate in the previous iteration since ε-feasibility is achieved. In this case the infeasibility tolerance ε may be reduced.

Case 2: If V(x^k, z^k) ≠ ∅, the current point is not ε-feasible, an indication that the penalty parameter μ^k should be increased. One possible choice is to do so proportionately to the degree of infeasibility. Let γ = η ε^k be a target degree of infeasibility, where η ∈ (0, 1]. We consider the following update equation:

    μ^{k+1} γ = μ^k max_{j ∈ V(x^k, z^k)} y_j    (16)

or, equivalently,

    μ^{k+1} = (μ^k / γ) max_{j ∈ V(x^k, z^k)} y_j.    (17)

And considering |V(x^k, z^k)| >= 1, we get

    μ^{k+1} = (μ^k / (η ε^k)) max_{j ∈ V(x^k, z^k)} y_j.    (18)

In summary we have the following update procedure:

    Pick η_1, η_2 ∈ (0, 1].
    If V(x^k, z^k) = ∅:
        ε^{k+1} = max{ ε_min, η_1 ε^k }
    Else:
        μ^{k+1} = (μ^k / (η_2 ε^k)) max_{j ∈ V(x^k, z^k)} y_j

where ε_min is a suitable final feasibility tolerance. A suitable initial value for μ can be found through some preliminary experimentation. The solution (x^0, z^0) obtained by ignoring the side constraints can be used to obtain an initial value for ε. A reasonable choice is to pick a value equal to a fraction of the maximum of the side constraint violations, i.e., in the interval (0, max_{j ∈ V(x^0, z^0)} y_j). The value of the parameters η_1 and η_2 was taken to be 0.5 for all computational tests reported in this study.
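A sketch of this update rule in Python follows; the residual vector y and the default tolerance eps_min are assumptions of the illustration, not values prescribed by the original code:

    import numpy as np

    def update_parameters(y, mu, eps, eta1=0.5, eta2=0.5, eps_min=1e-5):
        """One LQP parameter update as described above.

        y holds the side-constraint residuals r - Sx - Pz at the current
        iterate; eta1 = eta2 = 0.5 matches the values used in the study,
        while eps_min is an illustrative default.
        """
        violated = y > eps                      # index set V(x, z)
        if not np.any(violated):
            eps = max(eps_min, eta1 * eps)      # Case 1: tighten tolerance
        else:
            mu = (mu / (eta2 * eps)) * y[violated].max()   # Case 2: raise penalty
        return mu, eps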

NUMERICAL RESULTS

The LQP algorithm was implemented for the case of the problem [NETSIDE]. The code was written in Fortran. We refer to the code specialized for the Navy assignment problems as the GENOS/LP system. In this section we report numerical results obtained using GENOS/LP on two Naval Personnel Assignment problems. The first model, NAVY, is a simplified version of the complete model, which we call HUGENAVY. The size and characteristics of both problems are given in Table 1. Both problems have one side (non-network) variable which represents the readiness measure.

Table 1. Characteristics of Test Problems.

    Problem     LP Rows   LP Columns   Network Nodes   Network Arcs   Side constraints
    NAVY           4144         6842            3457           6841                687
    HUGENAVY      36013        64542           30639          64541               5374

Solution Strategies

Before we give the solution statistics for the HUGENAVY problem we mention two particularly important components of the GENOS/LP system.

Initial Feasible Solutions and Stopping Criteria. It is possible to compute lower bounds on the optimal objective function value during the course of the LQP algorithm. This computation is performed after the subproblem phase of the simplicial decomposition and is based on a first order Taylor series expansion of the function Φ around the current iterate. Let v* be the optimal value of [NETSIDE] and x_{μ,ε} an optimal solution of [NLP] for given penalty parameters μ and ε, where x denotes the pair (x, z) for notational simplicity. Also, let X = {x | Ax = b, 0 <= x <= u, 0 <= z <= v}. Then

    Φ(x_{μ,ε}) <= v*,    (19)

since [NLP] is a relaxation of [NETSIDE]. Therefore the optimal value of [NLP] is a lower bound on the optimal objective value. But in the presence of inexact minimizations of the penalized objective function Φ, this is not always guaranteed to be a lower bound. Hence, we consider the first order Taylor series expansion of Φ around a point y,

    Φ(x) = Φ(y) + (x - y)^T ∇Φ(y) + o(||x - y||²),    (20)

and, ignoring the second order term, define the function

    h(x) = Φ(y) + (x - y)^T ∇Φ(y).    (21)

By convexity of Φ, min_{x ∈ X} h(x) <= Φ(x) for every x ∈ X, and hence min_{x ∈ X} h(x) <= v*. This bound is readily computed by the simplicial decomposition algorithm, which generates extreme points of X by minimizing a linearized approximation to the objective function over X.

We now proceed to describe the procedure for generating upper bounds in the linear-quadratic penalty algorithm. For a more general discussion on computing bounds in exterior penalty function algorithms see Fiacco and McCormick (1968). Define the set B_P = {x ∈ X | Sx + Pz >= r} and assume that B_P is non-empty. Let x' = (x', z') ∈ B_P

and let (x_{μ,ε}, z_{μ,ε}) be a (perhaps approximate) optimal solution of [NLP]. Then a new interior point (x'', z'') is generated as follows: let y = r - Sx_{μ,ε} - Pz_{μ,ε} and I = {i | y_i > 0}. Compute

    θ = max_{i ∈ I}  y_i / (y_i + y'_i),   where y' = Sx' + Pz' - r >= 0,    (22)

and define

    x'' = (1 - θ) x_{μ,ε} + θ x'    (23)

    z'' = (1 - θ) z_{μ,ε} + θ z'.    (24)
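A sketch of this construction follows. The denominator in the step-length rule is our reading of (22): theta is taken just large enough that each violated constraint is cancelled by the slack at the feasible point:

    import numpy as np

    def interior_combination(x_pen, z_pen, x_feas, z_feas, S, P, r):
        """Blend the (possibly infeasible) penalty solution with a known
        feasible point so the result satisfies Sx + Pz >= r, as in (22)-(24).
        """
        y = r - S @ x_pen - P @ z_pen            # side-constraint violations
        slack = S @ x_feas + P @ z_feas - r      # nonnegative at feasible point
        I = y > 0.0
        theta = np.max(y[I] / (y[I] + slack[I])) if np.any(I) else 0.0
        x_new = (1.0 - theta) * x_pen + theta * x_feas
        z_new = (1.0 - theta) * z_pen + theta * z_feas
        return x_new, z_new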

It can be shown that (x'', z'') is feasible for [NETSIDE] and thus provides an upper bound (Fiacco and McCormick, 1968, Theorem 29, p. 107). The same result also states that the upper bound converges monotonically to the optimal objective value. Obviously this procedure requires an interior point to be generated at the beginning of the algorithm. For example, in Zenios et al. (1990), a solution satisfying the mutual capacity constraints of the multicommodity flow problem is computed based on the solution to the network relaxation. For the Naval assignment problems used in this study, an initial feasible solution was readily available since the solution to the network relaxation satisfied the side constraints when the side variable was ignored, i.e., let x^0 be a solution of the network relaxation

    minimize      0
    subject to    Ax = b
                  0 <= x <= u.

If Sx^0 > r, then z^0 is computed as

    z^0 = min_j (Sx^0 - r)_j / (-P_j).

The algorithm is terminated when the gap between the lower bound and the upper bound, evaluated at the point (x'', z'') obtained from (23)-(24), falls below a tolerance ε_gap. The values of ε_min and ε_gap used in this study are 10^{-5} and 3 x 10^{-2} respectively. The lower bounds generated as a result of the procedure described above are not very tight. This explains the larger value of the bound gap on termination of the algorithm. The improving lower and upper bounds are illustrated in Fig. 3 and Fig. 4 for the problems NAVY and HUGENAVY respectively. The horizontal axis in both figures is the number of executions of Step 2 of the LQP algorithm. The ability to compute improving upper bounds is an important feature of our approach since computation can be stopped as soon as a reasonable improvement in the upper bound is achieved.

Fig. 3. Convergence of lower and upper bounds for NAVY (bounds plotted against major iterations).

Fig. 4. Convergence of lower and upper bounds for HUGENAVY (bounds plotted against major iterations).

Vector Computing. The simplicial decomposition algorithm is particularly rich in dense linear algebra computations which can be efficiently vectorized. We mention here the main components of the linear algebra involved in the LQP method.

Computing Descent Directions. The following system of linear equations is solved to compute a descent direction during the course of the simplicial decomposition algorithm:

    (D^T M D) p = -D^T ∇Φ(x, z),    (26)

where D is a projection matrix, ∇Φ denotes the gradient of the objective function, the pair (x, z) is an arbitrary iterate, and M is a matrix which usually approximates the second derivatives of the function Φ. The matrix D tends to be very large depending on problem size. Typically for HUGENAVY, D can be 64542 x N, where N is the number of extreme points used in the master problem solution; N varies from 1 to 100. The computation of the product D^T M D can be very efficiently vectorized.

Function and Gradient Evaluations. Having computed a descent direction, a one dimensional search is performed to compute the next iterate. The time spent in the search procedure is dominated by the computation of function values and the gradient vector. The function and gradient evaluation of the original linear objective function can be vectorized trivially as it involves a simple DO-loop over all variables in the problem. However, the function and gradient values contributed by the nonseparable penalty function require the evaluation of the side constraints y_j = (r - Sx - Pz)_j for all side constraints. If side constraint j is satisfied by the current iterate (x, z), the penalty function and derivative values vanish since y_j is less than or equal to zero. Otherwise the penalty term is either the quadratic or the linear term as given by

    φ(ε, y_j) = y_j²/(2ε)    if 0 < y_j <= ε
    φ(ε, y_j) = y_j - ε/2    if y_j > ε.
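A vectorized sketch of this evaluation (our illustration, mirroring the piecewise formula above; mu scales the penalty as in [NLP]):

    import numpy as np

    def penalty_value_and_grad(y, eps, mu):
        """Linear-quadratic penalty and its derivative in y, vectorized:
        zero where y <= 0, quadratic on (0, eps], linear beyond eps."""
        quad = (y > 0.0) & (y <= eps)
        lin = y > eps
        phi = np.where(quad, y**2 / (2.0 * eps), 0.0)
        phi = np.where(lin, y - eps / 2.0, phi)
        dphi = np.where(quad, y / eps, 0.0)
        dphi = np.where(lin, 1.0, dphi)
        return mu * phi.sum(), mu * dphi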

    ...  sum_{k=1}^{T} λ_k f(x^k) >= sum_{k=1}^{T} λ_k f(x') = f(x').

Preprocessor 1. The concept behind this preprocessor is based on the discussion in Preparata and Shamos (1985, Section 4.1.2). The basic idea is to search for points which are "outliers" along each of the m axes. Simply stated, begin with the first component of each of the n points and identify the one with the maximum value and the one with the minimum value. When the maximum (minimum) is unique then the corresponding point is extreme. In the case of a tie, proceed to another coordinate of the points participating in the tie and examine it; an extremum of the other coordinate, when unique, identifies the extreme point. Further ties are broken by examining different coordinates until one point emerges as extreme. This procedure is applied iteratively to all points in X^u, incrementing the index of the components at each iteration. The principle behind this scheme is a direct application of Result 1 above. Searching along an axis is equivalent to maximizing in the direction of a unit vector or its negative. In the best case, as many as 2m extreme points may be identified using this preprocessor. Because of the assumption that the n points in X are distinct, in the worst case two points will be recognized as extreme points of P. The computational complexity of this preprocessor is O(mn) if ties do not occur and O(m²n) in the worst case.

Preprocessor 2. This procedure is based on Result 2. Select a point x^k in X^u. Determine the point in X^u farthest away from x^k: find x^j such that x^j = argmax d(x^k, x^j). Do this for every point in X^u.

Note that an important feature of this preprocessor is that, in the case of a tie, all points participating in the tie will be extreme points of P. This procedure requires n(n-1)/2 calculations of the Euclidean distance. The complexity of this algorithm is O(n²m).
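The following NumPy sketch (ours, not from the original implementation) combines the two preprocessors; for brevity it ignores the tie-resolution logic, so it may flag fewer points than the full procedures:

    import numpy as np

    def preprocess_1_and_2(X):
        """Flag points of X (n x m array, one point per row) that are
        certainly extreme: unique axis extrema (Preprocessor 1) and
        farthest points (Preprocessor 2)."""
        n, m = X.shape
        extreme = np.zeros(n, dtype=bool)

        # Preprocessor 1: unique max/min along each coordinate axis.
        for j in range(m):
            col = X[:, j]
            for idx in (np.argmax(col), np.argmin(col)):
                if np.sum(col == col[idx]) == 1:     # unique extremum
                    extreme[idx] = True

        # Preprocessor 2: for every point, its farthest point is extreme.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        extreme[np.argmax(d2, axis=1)] = True
        return extreme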

Preprocessor 3. This procedure is a generalization of the concept behind Preprocessor 1 in the sense that "directions of optimization" other than the unit vectors are used. The idea here is to define the linear function which corresponds to an arbitrary hyperplane and proceed to evaluate this function at all points in X^u. Points that yield the highest (lowest) functional values define a face of P and therefore always include extreme points of P. This is because the linear function defined by any hyperplane provides a direction of optimization. The process of finding points which yield the extreme functional values for the linear function corresponds to maximization and minimization of the linear function on P. From Result 1, we know that each such solution includes at least one extreme point.

There are several ways of implementing the procedure systematically. In any implementation the number of different hyperplanes used to define directions of optimization should be sufficiently large, although this number is arbitrary. The defining criteria for these hyperplanes are also arbitrary. In general, it is desirable to generate directions of optimization that are, in some sense, "uniform" and "dense" in the space where the points are located. The approach we propose here is to use each point in X to define a direction of optimization. This way, hyperplane coefficients are given by the components of the point. The advantage of this approach is that it is relatively easy to implement. Note that the directions of optimization are defined in reference to the origin. This may be modified by using other reference points. Later, when ideas from this preprocessing scheme are used in another procedure, we generate directions of optimization using the barycenter of a set of points as a reference. Below is an algorithmic expression of this procedure:

Step 0. Set k = 1.

Step 1. Calculate b^j = <x^k, x^j>, j = 1, ..., n.

Step 2. Identify max_j b^j and min_j b^j. If max b^j (min b^j) is unique then the corresponding point is extreme. If two points tie, then both are extreme. If three or more points tie then the tie-breaking procedures can be applied to all points which tie.

Step 3. Set k = k + 1. Proceed to Step 1.

Unlike Preprocessor 2, the occurrence of ties in this preprocessor presents a complication. If there is a tie among several points for the farthest distance from the reference hyperplane, it is not immediately possible to identify which of the points participating in the tie are extreme points of P. The following result will help in the resolution of this complication.

THEOREM 1. Suppose that exactly T points in X, x^1, ..., x^T, participate in a tie for the farthest distance (on the same side) from a reference hyperplane H (i.e., a tie for max b^j or min b^j occurs). Then x^j is an extreme point of P if and only if x^j is an extreme point of U = con{x^1, ..., x^T}.

Proof. If x^j is an extreme point of P then it is also an extreme point of U. Suppose now that x^j is an extreme point of U and it is not an extreme point of P. Note that all points involved in the tie belong to the same hyperplane, which is parallel to H and which supports P. Since x^j is an extreme point of U but not of P, it belongs to the interior of the convex hull of two other points in P but not in U. A contradiction is obtained from the fact that one of these two points must be separated from P by the supporting hyperplane. ∎

This result indicates that the resolution of ties is a reduced version of our original convex hull problem. The resolution of ties is an implementation problem. This will be discussed in Section 6 below.
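A sketch of Preprocessor 3 in NumPy follows; the numerical tolerance and the handing off of larger ties are assumptions of the illustration:

    import numpy as np

    def preprocessor_3(X):
        """Use each point of X as a direction of optimization; unique
        extrema and two-way ties are extreme, larger ties are returned
        for separate resolution as described in the text."""
        n, _ = X.shape
        extreme = np.zeros(n, dtype=bool)
        unresolved = []
        for k in range(n):
            b = X @ X[k]                         # b_j = <x^k, x^j>
            for level in (b.max(), b.min()):
                tied = np.flatnonzero(np.isclose(b, level))
                if len(tied) <= 2:               # unique or two-way tie
                    extreme[tied] = True
                else:
                    unresolved.append(tied)      # hand off to tie resolution
        return extreme, unresolved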

4. SECOND STAGE: RESOLUTION

Preprocessors 1, 2, and 3 are designed to identify points in X^u that are extreme points of P. Upon termination of the first stage, several points in X^u (as few as two and possibly more) will migrate to X^E but the set X^N remains empty. It is at this stage that the status of the remaining points in X^u needs to be conclusively resolved. At this stage we rely on a procedure to "build up" a polytope defined by the known extreme points of P by adding newly discovered extreme points. The scheme can be described as a procedure with major iterations, each composed of several minor iterations. At the i-th major iteration, at least one point migrates from X^u to X^E. In the embedded minor iterations we apply two auxiliary algorithms. One is a version of the Frank-Wolfe algorithm for quadratic programming (see Frank and Wolfe, 1956) to identify one point in X^u that is outside con X^E. We then apply an idea borrowed from Preprocessor 3 above to identify a new extreme point of P. At each major iteration, at least one point will be added to the set X^E, so that the shape of con X^E changes at every major iteration. What follows is a general description of the procedure.

Initialization Step. Identify the starting polytope as the convex hull of the extreme points discovered using the preprocessing schemes presented in Stage 1. These points are moved from X^u to the initial X^E, while X^N begins empty.

Main Step.

Substep 1. Select a point x^k ∈ X^u.

Substep 2. If x^k ∈ con X^E, transfer the point to X^N and go to Substep 1; otherwise calculate the projection p^k of x^k on con X^E and proceed to Substep 3.

Substep 3. Generate the reference hyperplane normal

    a^k = x^k - p^k.    (1)

Note that H(a^k, <a^k, p^k>) is a supporting hyperplane of con X^E at the point p^k, and x^k ∈ H^{++}(a^k, <a^k, p^k>).

Substep 4. Calculate level values with respect to the reference hyperplane at each point in X^u. That is, calculate the inner products:

    b^j = <a^k, x^j>;   x^j ∈ X^u.

Substep 5. Identify max_j b^j. The corresponding point, if unique, is an extreme point of P. Transfer the point from X^u to X^E. If ties occur between two points then both points are extreme. If ties occur among three or more points then resort to a tie resolution procedure (see Section 6).

Substep 6. If X^u = ∅, then stop; otherwise return to Substep 1.
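A compact sketch of this build-up loop follows. The routine project(x, V) is assumed to return the projection of x onto the convex hull of the rows of V (e.g., the Frank-Wolfe routine of Section 5); the membership tolerance is an assumption, and tie handling is omitted:

    import numpy as np

    def second_stage(X_u, X_e, project):
        """Main Step of the second stage: classify every unresolved point."""
        X_u, X_e, X_n = list(X_u), list(X_e), []
        while X_u:                                        # Substep 6
            x = X_u[0]                                    # Substep 1
            p = project(x, np.array(X_e))                 # Substep 2
            if np.linalg.norm(x - p) < 1e-9:              # x inside the hull
                X_n.append(X_u.pop(0))
                continue
            a = x - p                                     # Substep 3: normal
            levels = [a @ xj for xj in X_u]               # Substep 4
            j = int(np.argmax(levels))                    # Substep 5
            X_e.append(X_u.pop(j))                        # new extreme point
        return X_e, X_n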

Two issues must be resolved regarding this procedure. The first is the theoretical justification for the conclusion regarding the identification of new extreme points of P in Substep 5 above. This result is similar to the one applied in Preprocessor 3, but it is not as clear-cut since the set X^u is reduced and extreme points of this set are no longer necessarily extreme points of P. The result is justified in Theorem 2 below. The complication generated by the occurrence of ties in Substep 5 is also addressed by this theorem. The second issue relates to the procedures for determining if x^k ∈ con X^E and, if not, finding the projection of x^k onto the polytope as required in Substep 2. We have treated the second issue as essentially a "least-distance" problem which is solved using the Frank-Wolfe algorithm. In this algorithm we solve for the distance to, and projection of, a point onto the polytope; if the point belongs to the polytope the algorithm returns a value of zero for the distance. We will present the details of the implementation of the Frank-Wolfe algorithm for finding the projection of a point onto the convex hull of a collection of points in the following section.

THEOREM 2. Consider the standard partition of X into the subsets X^E, X^N, X^u, where neither X^E nor X^u is empty. Let l_1, ..., l_L be the indices of all points in X^u that maximize the inner product <a^k, x^j>, where the point x^k is exterior to con X^E, p^k is the point of projection of x^k onto con X^E, and a^k is defined in expression (1). Then con{x^{l_1}, ..., x^{l_L}} is a face of P.

Proof. Set b = <a^k, p^k> and b* = <a^k, x^{l_1}>. The hyperplane H(a^k, b) supports con(X^E ∪ X^N), thus <a^k, x^j> <= b for all x^j ∈ X^E ∪ X^N. Certainly <a^k, x^j> <= b* for all x^j ∈ X^u, and since b <= b* we may conclude that the hyperplane H(a^k, b*) supports con(X^E ∪ X^N ∪ X^u) = P at the points x^{l_1}, ..., x^{l_L}. ∎

Theorem 2 shows that the situation where a tie has occurred in Substep 5 above is analogous to the case of ties in Preprocessor 3. Ties present a serious theoretical complication in the implementation of Preprocessor 3 and in Substep 5 of this procedure. As we have seen from Theorems 1 and 2, all points involved in a tie (on a specified side of a hyperplane) lie on the same supporting hyperplane defining a unique face of P, and this face contains no other points from X. Therefore, the extreme points of such a face are also extreme points of P. Recall that if two distinct points participate in a tie then both points are extreme; but if this occurs with three or more points, then at least two may be extreme. That is why the resolution of ties reduces to a smaller version of the convex hull problem. The issue of resolving the complications resulting from ties in these procedures is addressed directly in Section 6.

5. THE FRANK-WOLFE ALGORITHM FOR FINDING PROJECTIONS

Here we present the details of the implementation of the Frank-Wolfe procedure to solve the projection problem. Let

    X^E = {x^1, ..., x^M}
    X^u = {x^1, ..., x^Q}

Consider projecting a given point x^k ∈ X^u on the convex hull of X^E. The projection, p^k, is a point which can be expressed as a convex combination of the points in X^E:

    p^k = sum_{m=1}^{M} λ_m x^m

where the values of λ_1, ..., λ_M are determined by the solution to the following optimization problem:

    minimize      || sum_{m=1}^{M} λ_m x^m - x^k ||²
    subject to    sum_{m=1}^{M} λ_m = 1
                  0 <= λ_m <= 1,   m = 1, ..., M.

By manipulating the objective function we obtain a more convenient quadratic form without changing the problem, and after translating the space so that x^k is the origin, the final version of the problem is:

    min f(λ) = (1/2) λ^T Q λ
    subject to    sum_{m=1}^{M} λ_m = 1    (2)
                  λ_m >= 0,   m = 1, ..., M,

where

    Q = [q_{ij}] = (x^i - x^k)^T (x^j - x^k),   i, j = 1, ..., M.

Note that Q may be written as Q = Y^T Y, where the i-th column of Y is the vector (x^i - x^k). Therefore, Q is positive semidefinite. Problem (2) is a quadratic knapsack problem.

Finally, recall that given a direction of decrease d and a starting point λ', the line search problem min_{α >= 0} f(λ' + αd) may be solved analytically when the function is quadratic. The optimal step size is

    α = - (d^T Q λ') / (d^T Q d).    (3)

An adaptation of the Frank-Wolfe algorithm exploiting the special structure and properties of Problem (2) is presented next.

Step 0. Find a feasible interior starting solution. The structure of the problem makes this easy since we need only to find values for λ_1, ..., λ_M such that

    sum_{m=1}^{M} λ_m = 1;   0 <= λ_m <= 1,   m = 1, ..., M.

For instance, use the point λ_1 = ... = λ_M = 1/M.

Step 1. Find a direction of improvement. Given a current feasible point λ^k, solve the following linear program:

    min       ∇f(λ^k) · λ = (Qλ^k)^T λ
    s.t.      sum_{m=1}^{M} λ_m = 1    (4)
              λ >= 0.

Let this optimal solution be σ^k. The direction of improvement is d^k = σ^k - λ^k.

Step 2. Calculate the step size in the direction d^k. The optimal step size is used unless this takes us beyond the feasible region. This defines our choice of step size (apply result (3)):

    α^k = min{ 1, - ((d^k)^T Q λ^k) / ((d^k)^T Q d^k) }.

Step 3. Update and iterate. The new iterate becomes λ^{k+1} = λ^k + α^k d^k. If the difference between the two points is small enough, stop. Otherwise increment the counter k and go to Step 1.

Observe that solving linear program (4) is immediate since it has a solitary row. Simply look at the coefficients of the gradient, ∇f(λ^k) = Qλ^k, and find the most negative. Ties can be broken arbitrarily. Setting the corresponding variable to 1 and the others to 0 is an optimal solution. Since the i-th component of ∇f(λ^k) is the reduced cost q_i, it may be evaluated as follows:

    q_i = sum_{m=1}^{M} q_{im} λ_m^k.
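A self-contained NumPy sketch of Steps 0-3 follows; the stopping tolerance and iteration cap are illustrative choices, not values from the paper:

    import numpy as np

    def frank_wolfe_projection(x, V, tol=1e-8, max_iter=1000):
        """Project x onto the convex hull of the rows of V.

        Returns the projection p and the weight vector lam.
        """
        M = V.shape[0]
        Y = V - x                      # translate so x is the origin
        Q = Y @ Y.T                    # Gram matrix, positive semidefinite
        lam = np.full(M, 1.0 / M)      # Step 0: interior starting point
        for _ in range(max_iter):
            g = Q @ lam                # gradient of f(lam) = 0.5 lam' Q lam
            vertex = np.argmin(g)      # Step 1: LP over the simplex
            d = -lam
            d[vertex] += 1.0           # d = e_vertex - lam
            dQd = d @ Q @ d
            if dQd <= 0.0:
                break
            alpha = min(1.0, -(d @ Q @ lam) / dQd)   # Step 2: step size
            if alpha * np.linalg.norm(d) < tol:      # Step 3: stopping test
                break
            lam = lam + alpha * d
        return x + Y.T @ lam, lam      # projection p = sum_m lam_m x^m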

The Frank-Wolfe algorithm is used here to find the location of a point x^k with respect to a polytope defined as the convex hull of its extreme points. If the point x^k belongs to the polytope, then the algorithm finds a convex combination of extreme points that defines the point, and the resulting distance between the point and the polytope is zero. If the point is exterior, then the algorithm provides the point of the polytope closest to x^k, which is the projection p^k. Note that the size of problems (2) and (4) grows after every major iteration. The number of variables in these problems is the number of points in the current set X^E.

6. IMPLEMENTATION AND COMPUTATIONAL RESULTS

The ideas presented above were tested in a computer code. There are several implementation issues that merit special attention.

Recall that problematic ties can occur initially and subsequently in the implementation of Preprocessor 3 and in Substep 5 of the Second Stage. The occurrence of ties may generate a sequence of progressively smaller nested convex hull problems. In the implementation of the ideas presented here, the issue comes up of how to resolve the problem of ties. The proposed approach is based on the premise that it is sufficient to identify some of the extreme points of P when these are part of a list of points involved in a tie. This can be accomplished by applying only preprocessing procedures to the first level in the nesting. The principal justification for this limited-effort strategy is to reduce the total computational requirements of the scheme.

When a tie occurs during the application of Preprocessor 3, our implementation is designed to apply Preprocessors 1, 2, and a modified version of Preprocessor 3, in this order and just once. The modification of Preprocessor 3 consists of changing the reference point from the origin to the barycenter. Thus, each point x^k in the original description of the procedure is changed to

    x^k - (1/T) sum_{j=1}^{T} x^j

where x^1, ..., x^T are the points involved in the tie. This modification is made to provide a fresh set of directions of optimization in the identification of extreme points.

Although the objective of the second stage is to identify conclusively the status of all (remaining) points, when a tie occurs in Substep 5 of this stage an exhaustive search is not warranted. In our implementation we were satisfied to introduce one extreme point from one major iteration to the next. The introduction of a new element to the list X^E generates a new polytope con X^E. A subsequent change in the geometry of intermediate polytopes may help resolve more naturally the status of some points involved in previous ties. For this reason, when resolving ties in the Second Stage, our search was limited to the application of Preprocessor 1 to the points involved in the tie.

The computer implementation was applied to data extracted from the coefficient matrices of selected linear programs. The coefficient matrix of any linear program can be used as a source of data for the set X to test the schemes. These data are convenient since the m by n coefficient matrix of a linear program is a natural way of obtaining a collection of n points in R^m. The procedures developed here, applied to the coefficient matrix of a linear program, will identify nonextraneous variables (Dula, 1990). Other sources of data for the purposes of testing our scheme include actual examples in computational geometry and statistics, or data may be generated randomly.

The linear programs originate from "NETLIB," a standard, publicly available collection of test problems (see Gay, 1985). The problems selected were: "BORE3D" (233 rows by 315 columns), "STAIR" (356 by 284), "BEACONFD" (173 by 262), "BRANDY" (220 by 249), "ISRAEL" (174 by 142), "CAPRI" (271 by 313), "E226" (223 by 282), "SHARE1" (117 by 225), and "BANDM" (305 by 454).

Table 1 summarizes the test results for the nine problems in the above order. Note that in the best case all extreme points were identified in the first stage using preprocessors alone (problem "STAIR" with 356 rows and 284 columns). The worst performance in the preprocessing stage occurs with problem "SHARE1" (117 by 225), with a proportion of 56% of the extreme points identified, leaving a substantial load for the second stage, which is the most expensive computationally.

Table 1. Performance of preprocessors and the specialized Frank-Wolfe scheme on nine problems.

    Problem    Rows m  Cols n  Proc. 1  Proc. 2  Preprocessor 3                  Stage 2      Total Extr.
               (dim)   (pts)                     no ties  tie-1  tie-2  tie-3    Frank-Wolfe  Points
    BORE3D      233     315      246      (2)       14      17      5      0         11          293
    STAIR       356     284      268      (6)        2       5      0      8          0          283
    BEACONFD    173     262      179      (2)        0      23      0     14         44          260
    BRANDY      220     249      173      (3)       14      12      0      5         36          240
    ISRAEL      174     142      114      (2)        0       0      0     14         14          142
    CAPRI       271     313      201      (3)        0      21      0      0         41          263
    E226        223     282      177      (3)        4      57      0     13         28          279
    SHARE1      117     225       93     (12)        2      23      0      8         98          224
    BANDM       305     454      303      (6)       25      50      0     11         63          452

The entries in the columns "Proc. 1," "Proc. 2," all columns under the general heading "Preprocessor 3," and "Frank-Wolfe" represent the number of extreme points found using the respective scheme. The headings "tie-1," "tie-2," and "tie-3" indicate the preprocessors used when ties among three or more points had to be resolved; hence the entries under "tie-1" are the number of extreme points discovered using Preprocessor 1 when a tie occurred in the application of Preprocessor 3. Note that entries in the column labeled "no ties" refer to the number of times it was not necessary to resort to the tie resolution contingency when Preprocessor 3 was executed, which includes the possibility that a tie occurred between only two points, implying that both must be extreme. Finally, the entries in the column labeled "Proc. 2" are in parentheses because these numbers represent extreme points identified but previously discovered by Preprocessor 1. In no case were any new extreme points discovered using Preprocessor 2 after Preprocessor 1.

Our experience has shown that performance is adversely affected by sparsity. Sparseness creates the opportunity for ties, especially in Preprocessor 3 and in the analogous procedure in Substep 5 of Stage 2. A multitude of ties can occur due to inner products that result in zeroes.

7. CONCLUSIONS

We have treated the version of the convex hull problem where extreme points are enumerated but no facial decomposition is required. The solution approach is original in that it is not based on any of the complicated procedures from computational geometry, which yield additional information on the facial lattice of the convex hull, nor does it rely entirely on the solution of linear programs as do techniques from Operations Research. The resulting deterministic two-stage method appears to be effective, especially in preprocessing the points to identify most of the extreme points with a minimum computational requirement.

In closing we point out that there remain several issues which merit further study. The preprocessing phase is clearly effective, although Preprocessor 2 had a disappointing performance in the test problems used. We conjecture that for certain more uniform shapes of the polytope this approach may be more useful. It may be that it should only be used in tie resolutions. The use of the Frank-Wolfe algorithm should also be studied. We have used this algorithm to perform two tasks: (1) determining if a point belongs to an intermediate convex hull, and (2) finding the projection of an exterior point on that convex hull. The first task may be performed more efficiently using other algorithms, especially since it is well known that the Frank-Wolfe algorithm suffers in performance (slow convergence) and in accuracy when the point is interior to or on the boundary of the convex hull. Thus, to discover if a point belongs to a convex hull it may be more efficient to solve a feasibility problem in the form of a linear program (see for instance Demyanov and Malozemov, 1974, Appendix IV).

REFERENCES

Demyanov, V.F. and V.N. Malozemov, (1974), Introduction to Minimax, John Wiley & Sons, New York.

Dula, J.H., (1990), Geometry of polyhedral cones and optimal value functions with applications to redundancy in linear programming, Technical Report 90-CSE-33, Department of Computer Science and Engineering, Southern Methodist University, Dallas, TX, 75275.

Frank, M. and P. Wolfe, (1956), An Algorithm for Quadratic Programming, Naval Research Logistics Quarterly, 3, 95-110.

Gastwirth, J., (1966), On robust procedures, J. Amer. Stat. Assn., 65, 929-948.

Gay, D. (1985), Electronic mail distribution of linear test problems, COAL Newsletter, 13, 10-12.

Graham, R. and F. Yao, (1990), A whirlwind tour of computational geometry, The American Mathematical Monthly, 97, 687-701.

Karwan, M.H., V. Lofti, J. Telgen and S. Zionts, (1983), Redundancy in mathematical programming: a state-of-the-art survey, in Lecture Notes in Economics and Mathematical Systems, 206, M. Beckmann and W. Krelle, eds., Springer-Verlag, Berlin.

Mattheiss, T.H. and D.S. Rubin, (1980), A survey and comparison of methods for finding all vertices of convex polyhedral sets, Mathematics of Operations Research, 5, 167-185.

Preparata, F.P. and M.I. Shamos, (1985), Computational Geometry: An Introduction, Springer-Verlag, New York.

Seidel, R., (1986), Constructing higher-dimensional convex hulls at logarithmic cost per face, in Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing, Berkeley, CA, May 28-30, 1986, 404-413.

Wallace, S.W. and R.J.B. Wets, (1990), Preprocessing in stochastic programming: the case of linear programs, Working Paper 90/3, Haugesund Maritime College, Haugesund, Norway.

Wets, R.J.B. and C. Witzgall, (1967), Algorithms for frames and linearity spaces of cones, Journal of Research of the National Bureau of Standards B - Mathematics and Mathematical Physics, 71B, pp. 1-7.


ADAPTING THE INTERIOR POINT METHOD FOR THE SOLUTION OF LINEAR PROGRAMS ON HIGH PERFORMANCE COMPUTERS

J. ANDERSEN, R. LEVKOVITZ and G. MITRA
Department of Mathematics & Statistics, BRUNEL - The University of West London, Uxbridge, Middlesex UB8 3PH, U.K.

ABSTRACT

In this paper we describe a unified algorithmic framework for the interior point method (IPM) of solving Linear Programs (LPs) which allows us to adapt it over a range of high performance computer architectures. We set out the reasons why IPM makes better use of high performance computer architecture than the sparse simplex method. In the inner iteration of the IPM a search direction is computed using Newton or higher order methods. Computationally this involves solving a sparse symmetric positive definite (SSPD) system of equations. The choice of direct and indirect methods for the solution of this system, and the design of data structures to take advantage of coarse grain parallel and massively parallel computer architectures, are considered in detail. Finally, we present experimental results of solving NETLIB test problems on examples of these architectures and put forward arguments as to why integration of the system within sparse simplex is beneficial.

KEY WORDS

Linear Programming, Parallel Algorithms, Interior Point Method, Transputers, Massively Parallel Computers, Choleski Factorization, Symmetric Positive Definite Systems.

HARDWARE PLATFORMS FOR THE SPARSE SIMPLEX AND THE INTERIOR POINT METHOD

Linear programming on parallel computers

Progress in the solution of large LPs has been achieved in three ways, namely hardware, software and algorithmic developments. Most of the developments during the 70's and early 80's in the sparse simplex method were based on serial computer architecture. The main thrust of these developments was towards exploiting sparsity and finding methods which either reduced simplex iterations or reduced the computational work in each iteration (Bixby 1991; Mitra et al., 1991). In general these algorithmic and software developments of the sparse simplex method cannot be readily extended to parallel computers. In contrast, the interior point methods, which have proven to be robust and competitive, appear to be better positioned to make use of newly emerging high performance computer architecture. The relative advantages of using IPM over sparse simplex in exploiting these architectures are summarised below.

A few researchers (Forrest et al., 1990; Pardalos et al., 1990) have identified difficulties involved in adapting sparse simplex algorithms for parallel computers. Although a number of implementations have been reported (Stunkel et al., 1988; Chen et al., 1990), the only credible and robust implementation is due to Forrest and Tomlin (Forrest et al., 1990). Our profiling information in Figs. 1.1 and 1.2, for some well known test problems from the Netlib collection, shows that the main computational work is spread over a number of algorithmic sub-components such as PRICE, BTRAN, FTRAN, etc. The relative computational efforts in these procedures vary from model to model. Through some ingenuity and data reorganisation the PRICE procedure has been redesigned for parallelism (Forrest et al., 1990) and shows good speed up. The speed up in the other algorithmic procedures is not of the same order. If we take into account Amdahl's law (Amdahl 1967) then we can appreciate how the significant computational effort of the serial part of the logic imposes a fairly modest limit on the scope of speed up. Essentially we cannot easily adapt the simplex method for parallel computation because it is difficult to represent indirect list structures for sparse matrices on SIMD machines, and because of problems of load-balancing on MIMD machines. Whereas in serial machines this representation reduces the total number of operations, in parallel machines it markedly slows down processing. Even hardware scatter and gather instructions do not fully cope with the problem of representing sparse data on parallel machines. Parallel machine architectures in general are well suited for dense matrix and vector processing.

The Interior Point Method on parallel computers

All variants of IPM share the same computational characteristics: the number of iterations is usually very low, typically less than 100, and the algorithmic steps require a repeated construction and factorization of a Sparse Symmetric Positive Definite (SSPD) system of equations with a fixed non-zero structure. Our profiling information in Fig. 1.3 clearly illustrates that most of the computational work takes place in the construction of an SSPD matrix and the solution of the resulting system by a direct method such as Choleski factorization or an indirect method such as conjugate gradient. This concentration of computational effort makes IPM well suited for exploiting parallel algorithmic paradigms. The specialists in sparse matrix computation have sharpened the computational methods for solving SSPD systems on parallel computers (Duff et al., 1986; George et al., 1981; Ashcroft et al., 1987) and this has also added to the advantage of adapting IPM on parallel machines. For instance, the use of elimination trees, identification of supernodes and loop unrolling for vector (parallel) machines are well established and well understood (Liu 1989; Lustig et al., 1991). It is therefore no coincidence that high performance IPM optimization systems incorporate software designs which exploit their respective hardware platforms. We note that the KORBX system is designed especially for the Alliant 8 processor parallel computer, and that IBM's OSL is specially designed for the RS6000 and 3090 computers; even OB1, otherwise a portable system, is specially tuned for the Cray YMP.

Our research interests, on the other hand, lie in adapting IPM for a range of parallel computing architectures and finding efficient ways of integrating these algorithms with our simplex solver. For our hardware platforms, we have chosen the transputer based Distributed Memory Computer (DMC) and an array processor (AMT-DAP). In this report, we focus on the adaptation of the SSPD solver to these hardware platforms. The rest of the paper is set out as follows: in the next section we describe the IPM algorithm; in the following two sections we discuss the DMC and the DAP implementations with the corresponding experimental results; in the penultimate section, we analyze the computational results and consider the cross-over to simplex strategy.
75 well u n d e r s t o o d (Liu 1989; Lustig et ai, 1991). It is therefore n o coincidence t h a t high performance I P M optimization systems incorporate software design which exploit their respective h a r d w a r e platforms. W e n o t e that t h e K O R B X system is designed especially for t h e Alliant 8 processor parallel computer, and that I B M ' s O S L is specially designed for t h e RS6000, a n d 3090 computers; even O B I , otherwise a portable system, is specially tuned for t h e Cray Y M P . O u r research interests on t h e other hand lie in adapting I P M for a r a n g e of parallel computing architectures a n d finding efficient ways of integrating these algorithms with o u r simplex solver. F o r o u r h a r d w a r e platforms, we have chosen t h e t r a n s p u t e r based Distributed M e m o r y C o m p u t e r ( D M C ) a n d an array processor ( A M T - D A P ) . In this report, we focus on t h e adaptation of t h e S S P D solver to these h a r d w a r e platforms. T h e rest of the p a p e r is set o u t as follows: in t h e next section we describe t h e I P M algorithm, in the following two sections w e discuss t h e D M C and t h e D A P implementations with t h e corresponding experimental results. In t h e penultimate, we analyze t h e computational results and consider t h e cross-over to simplex strategy. VAX Performance and Coverage Analyzer Program Counter Sampling Data (37846 data points total) Bucket Name PROGRAM_ADDRESS\ CHOLSK CRADAT FCSYM PRDUSB NFCS BKSUB EFFAC FFSUB RDIV RCHSET MTXVC TMTXVC CALCAP IDS MRGIND MNDG CALCRO MLTVEC CALCD CALCDZ VECRP CHKSOL CALCDX CALCDW CALCDY CALCS ADSVEC BLDG BASREC INTVAL NOTICE SUMVEC MAXVEC DUMPSYM . . . . IPM IPMERR MINMAIN . . . . MLTSYM PRTSOL RPER SHARE$DBGSSISHR SHARE$FORRTL . . SHARE$LBRSHR . . SHARE$LIBRTL . . SHARESMTHRTL . . SHARE$PCA$COLLECTOR SHARE$SCRSHR . . SHARE$SMGSHR . . SYSTEM$SERVICE . SYSTEM$SPACE . . VC1NRM

PROBLEM

GANGES

CHOLSK

54.0 %

CRADAT

17.1 %

FCSYM

Fig. 1.1

7.0 %

54.0% 17.1% 7.0% 5.0% 4.9% 1.5% 1.4% 1.4% 1.3% 1.1% 1.0% 0.8% 0.6% 0.4% 0.3% 0.3% 0.2% 0.2% 0.2% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0* 0.01 0.0%

VAX P e r f o r m a n c e and C o v e r a g e A n a l y z e r Program C o u n t e r S a m p l i n g D a t a B u c k e t Name PROGRAM_ADDRES S \ PRICE

(87800 d a t a p o i n t s t o t a l )

***** ***** ************************ ***** *********************** ***** ******************* ***** ******** ***** ******* ***** ****** ***** ***** ***** ****

P R O B L E M : 2 5 fv 4 7 PRICE...

22.1 %

NBTRAN

13.3 %

MRKWTZ

12.9 %

FTRAN

11.2%

BTRAN

6.0 %

Fig 1.2

-

77 VAX P e r f o r m a n c e and C o v e r a g e A n a l y z e r Program C o u n t e r S a m p l i n g D a t a

(106329 data p o i n t s t o t a l )

-

"*"

B u c k e t Name + + + + + + + + + +—+ PROGRAM_ADDRESS\ CHOLSK ************************************************ CRADAT ************** FCSYM ** NFCS ** PRDUSB ** BKSUB * FFSUB * RDIV * RCHSET * MTXVC * TMTXVC * EFFAC IDS CALCAP MRGIND CALCRO CALCDZ MLTVEC CALCD MNDG CHKSOL PROBLEM : 2 5 fv 4 7 CALCDX VECRP ADSVEC CALCDY CALCS CHOLSK 65.2 % CALCDW BLDG BASREC CRADAT 18.8 % NOTICE INTVAL SUMVEC MAXVEC FCSYM 2.9 % DUMPSYM IPM IPMERR MINMAIN MLTSYM PRTSOL RPER . . . . . . . SHARE$DBGSSISHR . SHARE$FORRTL . . . SHARE$LBRSHR . . . SHARESLIBRTL . . . SHARE$MTHRTL . . . SHARE$PCA$COLLECTOR SHARE$SCRSHR . . . SHARE$SMGSHR . . . SYSTEM$SERVICE-. . SYSTEM$SPACE . . . VC1NRM 25FV47

Fig

1.3

65.2% 18.8% 2.9% 2.1% 2.0% 1.2% 1.1% 1.0% 0.9% 0.9% 0.8% 0.6% 0.4% 0.4% 0.3% 0.2% 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0%

78 CHOICE OF INTERIOR POINT METHOD Primal dual barrier Among the primal-dual primal-dual 1989). This Primal:

method

various I P M s t h a t w e r e suggested and i m p l e m e n t e d recently, t h e g r o u p of type algorithms have emerged as m o s t promising. T h e framework for t h e p a t h following I P M was introduced by M e g i d d o in 1986 ( M o n t e i r o et al, algorithm solves t h e following primal and dual p r o b l e m s simultaneously.

T

T

T

min c x Dual: Max b y mXn m n s.t. A y + z = c, z > 0 s.t. Ax = b,x > 0 AeR ,b y€R c z X€R

(3.1)

> t >F

l / 2

T h e primal-dual algorithm converges to t h e optimal solution in at m o s t 0 ( n L ) iterations ( M o n t e i r o et al, 1989) w h e r e n d e n o t e s t h e dimension of t h e p r o b l e m s and L t h e input size. It c o m p u t e s both primal and dual i n t e r m e d i a t e solutions a t any stage; this ensures t h a t t h e retrieval of an optimum extreme point from t h e optimal primal and dual solutions can b e d o n e in strongly polynomial time ( M e g i d d o 1991). T h r e e variants of t h e primal-dual algorithm w e r e implemented namely, t h e primal-dual affine ( M o n t e i r o et ai, 1989) primal dual barrier (Lustig et al., 1990) and recently t h e primal dual power series algorithm (predictor corrector) (Lustig et al, 1990; Bixby et al., 1991). All t h r e e variants solve t h e L P p r o b l e m s by minimizing t h e complementarity gap (optimization step), b u t while t h e affine algorithm c o m p u t e s an optimizing step only, t h e barrier m e t h o d calculates a combined optimizing and centralizing step which also keeps t h e solution away from t h e boundaries. T h e power series algorithm c o m p u t e s an optimizing step as in t h e affine algorithm (predictor step) and then the centralizing steps (correcting steps). In algorithm 3.1 we p r e s e n t a p s e u d o c o d e of the primal dual barrier algorithm. Although t h e predictor corrector algorithm performs b e t t e r t h a n t h e o t h e r two variants, all primal dual algorithms are computationally d o m i n a t e d by t h e calculation of t h e affine trajectory in which a system involving a new S S P D matrix M is created and solved (step P D 4 ) . In t h e s u b s e q u e n t sections we discuss t h e implementation of this step first on the D M C and t h e n on t h e D A P . Algorithm 3.1 PDl.

Construct the phase I extended problems.

PDl.

Let X be a diagonal matrix of x, Z be a diagonal matrix of z, set D = XZ \

PD3.

Let p(jx) be a compound {centralising and advancing) function, it the centralising parameter.

PD4.

Find the new search direction 1 for y : y compute : M -

ADA l

compute : y =

M~ ADp(p)

Find initial solution for x> y, z.

use y to compute the search direction for x,z : x,z. PD5.

Make a step in the computed direction

PD6.

If end conditions are met, stop.

xty,z j}

;

(4.2)

A row r is a root if no such i exists (hence r cannot have a parent)

}

}

T h e elimination t r e e can b e interpreted as a communication tree for t h e rows of the matrix. All communication during the C D P factorization is d o n e strictly through t h e b r a n c h e s of t h e elimination tree. W e use t h e elimination t r e e t o m a p row subsets of t h e matrix to t h e binary tree transputer grid. This m a p p i n g is achieved by a simple visiting heuristic which travels through t h e elimination t r e e in a t o p to b o t t o m fashion and identifies t h e branches w h e r e t h e elimination workload can b e divided into roughly equal p a r t s (step C D P 4 ) . Finally, t h e algorithm determines t h e life span of each row (with th respect to t h e partitioning). T h e life span of a row is defined below: L e t r d e n o t e t h e s row of t h e o r d e r e d matrix M. w e define t h e H o m e Processor H P ( r ) s s and t h e E n d Processor E P ( r ) respectively as :

s

Home Processor : HP(r ) = P , r e fl,, K, is allocated to P, (see CDP4)

s

End Processor : EP(r ) = P , where j = min{l\

f

(4.3)

t s

i

re R

q

lt

u^

q*

0, s0 is gradually

84 enforced, h e n c e t h e corresponding elements of D can take very large or very small values. This increases t h e condition n u m b e r for t h e S S P D matrix M t h u s creating numerical p r o b l e m s for t h e C G m e t h o d . (Fig. 5.1)

(Fig 5.2)

DISCUSSION AND CONCLUSIONS O u r tests show t h a t parallel implementation on t h e D M C is stable, b u t an effective speed u p can b e achieved only on S S P D matrices t h a t have wide and balanced elimination trees. Different reorderings of t h e S S P D matrix and balancing techniques used for t h e elimination t r e e can improve t h e performance substantially. T h e D A P implementation is especially relevant for S S P D matrices whose Choleski factor is very dense. T h e C G numerical problems experienced in t h e final iterations of I P M can b e largely avoided; our experiments in cross-over to simplex indicate t h a t t h e best results w e r e achieved by terminating I P M prior to reaching t h e optimal solution (Mitra et al, 1991). Also, flagging and removing variables converging t o z e r o can improve t h e conditioning of t h e D matrix and in turn increase t h e stability of t h e C G solver.

85 ACKNOWLEDGEMENTS T h e research r e p o r t e d in this p a p e r has b e e n partly s u p p o r t e d by Digital E q u i p m e n t Corporations, E u r o p e a n External Research P r o g r a m . W e also t h a n k P A R S Y T E C ( G e r m a n y ) G m b H for their interest in o u r research and for their software support. Professor D e n i s Parkinson of A M T Ltd has worked closely with u s and w e have benefitted greatly from his advice regarding t h e D A P implementation. T h e s u p p o r t of t h e U K Science and Engineering R e s e a r c h Council ( S E R C ) is also gratefully acknowledged, w h o together with A M T Ltd have supported Mr. J. A n d e r s e n ' s C A S E studentship. REFERENCES A n d e r s e n J, Levkovitz R, M i t r a G and T a m i z M (1990). Adapting I P M for t h e solution of LPs on Serial, C o a r s e Grain Parallel and Massively Parallel C o m p u t e r s , Brunei University, D e p a r t m e n t of M a t h e m a t i c s and Statistics TR01/90. A m d a h l G . M . (1967). Validity of t h e Single Processor A p p r o a c h t o Achieving L a r g e Scale C o m p u t i n g Capabilities, AFIPS Conference Proceedings, Vol. 20, 483-485. A n d e r s e n J, M i t r a G and Parkinson D (1991). T h e Scheduling of Sparse Matrix-Vector Multiplication on a Massively Parallel D A P C o m p u t e r , Brunei University, D e p a r t m e n t of M a t h e m a t i c s TR09/91 presented to I C I A M Congress Washington, 9 1 , to a p p e a r in Parallel C o m p u t i n g . Ashcroft C.C, G r i m e s R . G , Lewis J.G, et al. (1987). Progress in Sparse Matrix M e t h o d s for L a r g e Linear Systems on Vector Supercomputers, International Journal of Supercomputer Applications, Vol L 10-30. Bixby R . E (1991). T h e Simplex M e t h o d - It K e e p s Getting Better, P r e s e n t e d t o t h e 14th International M P S Symposium, Holland. Bixby R . E , Gregory J.W, Lustig LJ, Marsten R . E , S h a n n o D.F.(1991). Very L a r g e Scale Linear P r o g r a m m i n g : A Case Study in Combining Interior Point and Simplex M e t h o d s , D e p a r t m e n t of Mathematical Science, Rice Unfversity,Texas. C h e n G . H , Lin H . F , Sheu J.P. (1990). D a t a M a p p i n g of Linear P r o g r a m m i n g on Fixed Size Hypercubes, Parallel Computing, Vol. 13. 235-243. Duff I.S, Erisman A.M, Reid J.K. (1986). Direct Methods for Sparse Matrices, Oxford University Press. Forrest J.J.H, Tomlin J.A. (1990). Vector Processing in Simplex and Interior M e t h o d s for Linear Programming, Annals of OR, Vol. 22, 71-100. Gay D . M . (1985). Electronic Mail Distribution of Linear P r o g r a m m i n g T e s t Problems, COAL Newsletter MP Society, Vol 13. 10-12, 1985. G e o r g e J.A, Liu J.W. (1981). Computer Solution of Large Sparse Positive Definite Systems, Prentice Hall. G o l u b G . H , V a n - L o a n C.F. (1983). Matrix Computation, N o r t h Oxford Academic. K a r m a r k a r N . (1984). A N e w Polynomial T i m e Algorithm for Linear Programming, Combinatorica, Vol 4, 373-379. Lai C.H, Liddell H . M . (1988). Preconditioned Conjugate G r a d i e n t M e t h o d s on t h e D A P , Proceedings of The Mathematics Of Finite Elements & Applications, Vol 4. 147-156. Liu W . H . (1989). R e o r d e r i n g Sparse Matrices for Parallel Elimination, Parallel Computing, Vol 11. 73-91. Lustig LJ, M a r s t e n E.R, S h a n n o D . F . (1990). O n Implementing M e h r o t r a ' s Predictor-

86 C o r r e c t o r Interior Point M e t h o d for Linear Programming, Technical R e p o r t S O R 9003, D e p a r t m e n t of Civil Engineering and Operational Research, Princeton University. Lustig I.J, Marsten R . E , S h a n n o D.F. (1991). T h e Interaction of Algorithms and Architectures for Interior Point M e t h o d s , Research R e p o r t s , R U T C O R M e g i d d o N . (1991). O n Finding Primal-Dual and Dual-Optimal Bases. ORSA Journal on Computing, Vol 2. Mitra G, T a m i z M. (1991). Alternative M e t h o d s for Representing t h e Inverse of Linear P r o g r a m m i n g Basis Matrices, in, Recent Developments in Mathematical Programming, A S O R special issue, Edited by S. Kumar, G o r d o n - B r e a c h . M o n t e i r o D.C, Adler I. (1989). Interior Path Following Primal-Dual Algorithm, Mathematical Programming, Vol 44. Mitra G, Levkovitz R, T a m i z M. (1991). Integration of I P M Within Simplex, Experim e n t s in Feasible Basis Recovery, Brunei University, P r e s e n t e d to 14th M P S Symposium Holland. Pardalos P.M, Phillips A T , R o s e n J.B. (1990). Topics in Parallel C o m p u t i n g in Mathematical Programming, R e p o r t CS-90-22, D e p a r t m e n t of C o m p u t e r Science, Pennsylvania State University. Also in S I A M frontiers in Applied Mathematics. Stunkel C.B, R e e d D.A. (1988). Hypercube Implementation of t h e Simplex Algorithm, Proceedings on Hypercube Concurrent Computers and Applications, I A A C M Publication, 1473-1482.

IMPLEMENTATION OF AN INTERIOR POINT LP ALGORITHM ON A SHARED-MEMORY VECTOR MULTIPROCESSOR

MATTHEW J. SALTZMAN
Department of Mathematical Sciences, Clemson University, Clemson SC 29634-1907, USA
Telephone: 803/656-3434

E-mail: [email protected]

ABSTRACT

One attractive aspect of interior point algorithms for linear programming is their suitability for implementation on multiprocessing computers. This paper describes a number of issues relating to the implementation of these methods on shared-memory vector multiprocessors. Of particular concern in any interior point algorithm is the factorization of a sparse, symmetric, positive definite matrix. This implementation exploits the special structure of such matrices to enhance vector and parallel performance. In initial computational tests, a speedup of up to two was achieved on three processors.

KEYWORDS

Linear programming; interior point algorithms; Cholesky factorization; sparse matrices; parallel matrix factorization; linear algebra.

INTRODUCTION

Since the introduction in the mid-1980s of interior point algorithms for linear programming, significant advances in both simplex and interior point algorithms have dramatically increased the size of problems that can be solved in reasonable time. These advances have occurred both in algorithm design and in the exploitation of advanced features of current supercomputers, such as vector processing; see, for example, (Adler et al., 1989b; Marsten et al., 1989; McShane et al., 1989; Lustig et al., 1989). Interior point methods now promise to outperform the simplex method on large instances of certain classes of problems, and on many problems previously considered particularly difficult. One feature of interior point methods is that they appear to be more amenable than the simplex method to implementation on parallel computers.

In this paper, we investigate opportunities for exploiting advanced-architecture computers in the implementation of interior point algorithms. Such opportunities are concentrated in the construction, factorization and solution of a sparse, positive definite (SPD) system of equations. This is the most computationally intensive part of each iteration: in large or dense problems, 80-90% or more of the computation time is spent in this step (on serial machines). This paper describes implementations of vector and parallel algorithms for factoring large,

SPD matrices, and shows how these methods can be integrated into interior point LP algorithms. We describe the results of computational tests, and indicate where further research is likely to show promise.

We begin by briefly reviewing a primal-dual interior point LP algorithm. The following section reviews algorithms for solving sparse, SPD linear systems. We discuss the performance characteristics of various forms of the Cholesky factorization algorithm for SPD matrices and their suitability for sparse matrices and vector processing. Next we describe the data structures to support the implementation of the solver and exploitation of vector and parallel processing. Finally, we describe our experience implementing the algorithm on an IBM 3090 vector multiprocessor.

A PRIMAL-DUAL LP ALGORITHM

The implementation described here is an extension to the FORTRAN-77 interior point software package OB1 (Optimization with Barriers 1; OB1 is a trademark of XMP, Inc.). The particular form of interior point algorithm implemented in OB1 is the primal-dual barrier algorithm, described in (Lustig et al., 1989). In this section, we briefly recap this algorithm.

The LP to be solved is:

    \min \{ c^T x : Ax = b,\ 0 \le x \le u \},    (1)

where A is m \times n. We assume for the purpose of exposition that all of the upper bounds u are finite, although this assumption is not restrictive. The inequalities x \le u in (1) can be replaced by x + s = u, s \ge 0. The inequalities x \ge 0 and s \ge 0 can then be eliminated by incorporating them into a logarithmic barrier function. The equality constraints can be relaxed, and their residual vectors incorporated into the objective with Lagrange multipliers. The Lagrangian barrier function is:

    L(x, s, y, w; \mu) = c^T x - \mu \sum_{j=1}^{n} \ln x_j - \mu \sum_{j=1}^{n} \ln s_j - y^T (Ax - b) - w^T (x + s - u).    (2)

Thus, L represents a family of functions parameterized by \mu. L has a zero-valued derivative when

    Ax = b,    (3)
    x + s = u,    (4)
    A^T y - w + z = c,    (5)
    XZe = \mu e,    (6)
    SWe = \mu e,    (7)

where X, S, Z and W are diagonal matrices with entries x_j, s_j, z_j and w_j, respectively, and z is a vector of dual slack variables (defined by (5)). It is apparent that as \mu \to 0, (6) and (7) approach the complementary slackness conditions for a pair of optimal solutions to the primal and dual LPs. Given x > 0, s > 0, w > 0, z > 0 and y, and for a fixed value of \mu, we can apply a damped Newton method to (3)-(7). The Newton steps for each of the variables then satisfy:

    A d_x = b - Ax,    (8)
    d_x + d_s = 0,    (9)
    A^T d_y + d_z - d_w = c - A^T y - z + w,    (10)
    Z d_x + X d_z = \mu e - XZe,    (11)
    W d_s + S d_w = \mu e - SWe.    (12)

An iteration of the algorithm consists of solving (8)-(12) in the following steps:

    d_y = (A \Theta A^T)^{-1} (A \Theta (\rho(\mu) - d_D) + d_P),
    d_x = \Theta (A^T d_y + d_D - \rho(\mu)),
    d_s = -d_x,
    d_z = \mu X^{-1} e - z - X^{-1} Z d_x,
    d_w = \mu S^{-1} e - w - S^{-1} W d_s

(where \Theta = (S^{-1} W + X^{-1} Z)^{-1}, d_P = b - Ax, d_D = A^T y + z - w - c and \rho(\mu) = \mu (S^{-1} - X^{-1}) e - w + z), performing the ratio tests

    \alpha_x = \min_j \{ -x_j / (d_x)_j : (d_x)_j < 0 \},
    \alpha_s = \min_j \{ -s_j / (d_s)_j : (d_s)_j < 0 \},
    \alpha_z = \min_j \{ -z_j / (d_z)_j : (d_z)_j < 0 \},
    \alpha_w = \min_j \{ -w_j / (d_w)_j : (d_w)_j < 0 \},

and advancing the primal variables by a step \alpha_P = \min\{\alpha_x, \alpha_s\} and the dual variables by a step \alpha_D = \min\{\alpha_z, \alpha_w\}, each damped by a factor slightly less than one to maintain strict positivity. The barrier parameter is then reduced to

    \mu = (c^T x - b^T y + u^T w) / \phi(n),    where    \phi(n) = n^2 if n \le 5000, and n \sqrt{n} if n > 5000.

This results in a substantial reduction in \mu at each iteration, and the sequence of solutions converges to a primal and dual optimum. The algorithm terminates when the duality gap is sufficiently small, namely

    (c^T x - b^T y + u^T w) / (1 + |b^T y - u^T w|) < 10^{-8}.

For details of the algorithm (including selection of the initial solution and initial \mu) see (Lustig et al., 1989).
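To fix ideas, here is a minimal dense-algebra sketch of one such iteration in Python (our own illustration, not the OB1 code: NumPy is assumed, the damping factor 0.995 is a conventional choice rather than one stated here, and the sparse Cholesky machinery that is the subject of this paper is replaced by a dense solve):

    import numpy as np

    def primal_dual_step(A, b, c, u, x, s, y, z, w, mu, damp=0.995):
        """One damped Newton step of the primal-dual barrier algorithm,
        following equations (8)-(12) and the solution steps above."""
        theta = 1.0 / (w / s + z / x)          # diagonal of Theta
        d_P = b - A @ x                        # primal residual
        d_D = A.T @ y + z - w - c              # dual residual
        rho = mu * (1.0 / s - 1.0 / x) - w + z
        M = (A * theta) @ A.T                  # A Theta A^T (the SPD system)
        d_y = np.linalg.solve(M, (A * theta) @ (rho - d_D) + d_P)
        d_x = theta * (A.T @ d_y + d_D - rho)
        d_s = -d_x
        d_z = mu / x - z - (z / x) * d_x
        d_w = mu / s - w - (w / s) * d_s

        def ratio(v, dv):                      # largest step keeping v + a*dv > 0
            neg = dv < 0
            return min((-v[neg] / dv[neg]).min(initial=np.inf), 1.0 / damp)

        a_P = damp * min(ratio(x, d_x), ratio(s, d_s))
        a_D = damp * min(ratio(z, d_z), ratio(w, d_w))
        return (x + a_P * d_x, s + a_P * d_s,
                y + a_D * d_y, z + a_D * d_z, w + a_D * d_w)

In a real implementation, of course, the line forming and solving with A Theta A^T is where essentially all the time goes; the remainder of the chapter is about doing exactly that step well.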

THE CHOLESKY DECOMPOSITION

The most computationally intensive step in each iteration of the primal-dual algorithm (indeed, of any of the interior point algorithms) is the solution of the system d_y := (A \Theta A^T)^{-1} r (where the form of r and the positive diagonal matrix \Theta depend on the particular algorithm). The matrix A \Theta A^T is m \times m, symmetric and positive definite. In addition, for most realistic LP problems, both A and A \Theta A^T are sparse. Efficient implementation of the primal-dual algorithm depends critically on efficient solution of this system.

Linear systems of the form x = M^{-1} b are not usually solved by forming the explicit inverse of M. This is particularly true if M is sparse, since taking the inverse does not, in general, preserve sparsity. Instead, the system Mx = b is solved directly, by decomposing M into lower- and upper-triangular factors L and U, such that M = LU. Then the triangular systems Lx' = b and Ux = x' can be solved efficiently by forward and backward substitution, respectively. If M is symmetric and positive definite, then M can be factored uniquely into symmetric triangular components \bar{L} and \bar{L}^T. These matrices are the Cholesky factors of M. A slight variation on this theme computes M = LDL^T, where l_{jj} = 1 for j = 1, ..., m, D is a positive diagonal matrix and \bar{L} = LD^{1/2}. This version has the advantage of not requiring the computation of square roots on the diagonal of L, and it is the method that we use in our implementation.

Algorithms for Cholesky Decomposition

The formula for computing each element of L is given by

    l_{ij} = (m_{ij} - \sum_{k=1}^{j-1} d_{kk} l_{jk} l_{ik}) / d_{jj}.    (13)

Each diagonal element of D is given by

    d_{jj} = m_{jj} - \sum_{k=1}^{j-1} d_{kk} l_{jk}^2.    (14)
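For reference, a direct Python transcription of (13) and (14) as a dense column-by-column algorithm (our own illustration; the paper's implementation is sparse FORTRAN):

    import numpy as np

    def ldlt_column(M):
        """Dense column LDL^T factorization of an SPD matrix M, computing
        each column of L from previously computed columns, per (13)-(14).
        Returns (L, d) with unit-diagonal L; M = L diag(d) L^T."""
        m = M.shape[0]
        L = np.eye(m)
        d = np.zeros(m)
        for j in range(m):
            # (14): d_jj = m_jj - sum_{k<j} d_kk * l_jk^2
            d[j] = M[j, j] - np.sum(d[:j] * L[j, :j] ** 2)
            for i in range(j + 1, m):
                # (13): l_ij = (m_ij - sum_{k<j} d_kk * l_jk * l_ik) / d_jj
                L[i, j] = (M[i, j] - np.sum(d[:j] * L[j, :j] * L[i, :j])) / d[j]
        return L, d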

Note that in order to compute l_{ij}, the values of d_{kk}, l_{ik} and l_{jk} for k < j and d_{jj} must already be known. Computing d_{jj} requires l_{jk} and d_{kk} for k < j. As described in (George et al., 1986), the six permutations of the indices i, j and k in (13) and (14) naturally yield three different algorithms for performing Cholesky decomposition, and determine whether each algorithm applies to L stored in row- or column-major order. In row Cholesky (ijk and ikj), rows of L are computed successively; each element in the row is computed using the previously-computed rows and previously-computed elements in the same row. The column Cholesky algorithm (jik and jki) computes the columns in succession, using previously-computed columns. Submatrix Cholesky (kij and kji) uses each column as it is computed to update all subsequent columns. Consideration of workload and memory access patterns suggests that the jki column Cholesky algorithm and a version of the kji form known as the multifrontal algorithm (Duff and Reid, 1983) are the best candidates for implementation in parallel. A general discussion of the merits of the various forms of the algorithm is contained in (Saltzman et al., 1989). In this paper, we concentrate on the column Cholesky method.

Sparse vector operations require indirect array references. Even on vector processors with hardware for indirect references, these operations are slower than dense (direct reference) operations (Duff et al., 1986). The column and submatrix Cholesky algorithms can be modified to take advantage of the structure of L to replace some sparse SAXPYs with

dense SAXPYs (see Section 4). In addition, the multifrontal method replaces all sparse SAXPYs with dense operations. The algorithm performs partial factorizations of a sequence of small, dense frontal matrices. The cost to obtain this advantage is a requirement for additional memory to store intermediate, partially-factored submatrices, and the need to assemble the intermediate results to form new frontal matrices. In this paper, we will be concerned only with the column Cholesky algorithm.

An important aspect of sparse matrix decomposition is the maintenance of sparsity in the resulting factors. In the case of Cholesky decomposition, the positions of nonzeros in the lower triangle of M are a subset of the positions of nonzeros in L. Fill-in (nonzeros in L corresponding to zero elements of M) is static, essentially independent of the values of the nonzeros in M, and highly dependent on the ordering of the rows and columns in M. If M = AA^T, then the permutation of rows and columns of M is determined by the permutation of rows of A. The Cholesky decomposition routines used up to now in implementations of interior point LP algorithms, e.g., (Adler et al., 1989a, 1989b; Marsten et al., 1989), have used graph-based heuristics such as minimum degree or minimum local fill-in (George and Liu, 1981) to order the rows of A so as to minimize fill-in. OB1 implements the multiple minimum-degree ordering heuristic of Liu (1985). This heuristic is much faster than minimum degree, but gives results of comparable quality. The remaining discussion assumes that the row/column permutation of M, and hence the pattern of nonzeros in L, is fixed in advance.

IMPLEMENTATION

In this section, we describe the special structure of the Cholesky factor and techniques for exploiting this special structure to improve parallel and vector performance of the column Cholesky algorithm.

Sparse SAXPY

The key operation in almost all of the linear algebra in the primal-dual algorithm is SAXPY, an operation of the form y = ax + y, where x and y are vectors and a is a scalar. If x and y are dense vectors, this operation is a natural one for vectorization. If x and y are sparse, each is stored as a list of indices of nonzero entries and a list of the corresponding nonzero values. The SAXPY operation is performed by copying the entries of y into their positions in a dense work array, then multiplying the entries in x by a and summing them into the corresponding positions in the work array. Finally, the result is copied back to the sparse data structure for y. In the column and submatrix Cholesky algorithms, the nonzero elements of x are a subset of the nonzero elements of y, so the updated y contains no new nonzeros and can be returned to its original location in memory.

If the work array corresponds to the computer's vector registers, the copy operation from memory (the sparse data structure) to the registers is referred to as a gather, and the copy from the registers back to memory is called a scatter. Most new vector computers implement some form of gather/scatter operations in hardware, but some early vector computers (such as the Cray-1) did not. (The IBM 3090 instruction set provides indirect vector load and store operations.) Even so, indirect operations are slower than direct ones, because an indirect reference must be performed to find a nonzero location in the work array, and because relatively few elements of the vector register may be involved in computing the new result. Vectorization of the sparse SAXPY remains a dramatic improvement over scalar processing (as we will show; see also (Lewis and Simon, 1988)), but it is inefficient compared to the dense version of the operation.
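The gather/scatter pattern can be made concrete with a short Python sketch (our illustration; array names are hypothetical):

    import numpy as np

    def sparse_saxpy(a, x_ind, x_val, y_ind, y_val, work):
        """y := a*x + y for sparse vectors. The nonzeros of x are a subset
        of those of y (as in the column Cholesky update), so y's pattern
        and storage are unchanged. `work` is a dense scratch array of zeros;
        the row indices within a column are distinct, so the indirect update
        is safe."""
        work[y_ind] = y_val          # gather y into the dense work array
        work[x_ind] += a * x_val     # indirect update from x
        y_val[:] = work[y_ind]       # scatter the result back into y's storage
        work[y_ind] = 0.0            # restore the work array for the next call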


Fig. 1. Nonzero structure of a matrix M and its Cholesky factor L.

The Elimination Tree

The elimination tree T_M associated with the matrix M = A \Theta A^T is a rooted tree constructed from the following relation among the columns of the Cholesky factor L of M (Liu, 1988):

    Parent(j) = \min \{ i : l_{ij} \ne 0,\ i > j \}.

Column m of L is taken as the root. For example, consider the matrix depicted in Fig. 1. The subdiagonal elements of M are denoted by "•" and the fill-in elements generated during the factorization are denoted by "o". The elimination tree T_M of this matrix is pictured in Fig. 2. Columns of L and their corresponding nodes in T_M are referred to interchangeably in the following discussion.

Fig. 2. Elimination tree T_M of the matrix in Fig. 1.

The elimination tree represents a partial ordering of the columns of L, indicating the requirement that the children of column j be computed before column j can be computed itself. This structure will be used to schedule columns to be computed in parallel, as described below. In addition, the elimination tree can be used to detect the presence of supernodes, as described in the next section.

Supernodes

In order to reduce the number of sparse vector operations, we can exploit the occurrence in L of consecutive columns with the same pattern of nonzero entries. Such groups of columns are called supernodes (columns that don't belong to such a group are simply nodes). These blocks of columns can be treated together as a dense submatrix in the column Cholesky algorithm. A slightly more restrictive definition of supernode is used in the multifrontal algorithm, and has been applied to the column Cholesky algorithm as well (Ashcraft et al., 1987; Liu, 1987). The more restrictive definition requires that the columns in the supernode other than the first have no direct predecessors. This is necessary for the multifrontal algorithm because frontal matrices corresponding to direct predecessors must be assembled into the frontal matrix for a column or supernode prior to the update step. In the column Cholesky algorithm, there is no such requirement. Supernodes can be identified in the elimination tree as chains of consecutively-numbered nodes, where the columns have identical nonzero structures.

There are two points in the computation of a column of L where supernodes can be exploited:

• If any column in a supernode is required for the update to the current column j, then all columns in that supernode are needed. The total contribution of all columns in the supernode can be computed in dense mode and then gathered once into the work array.

• If the current column j is a member of a supernode, then updates from all columns preceding column j in the same supernode can be applied directly to column j. (This is an improvement to the method described in (Ashcraft et al., 1987), which treats this case the same as the previous one.)
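A small sketch of how the Parent relation can be computed from the nonzero pattern of L (our own illustration):

    def elimination_tree(subdiag_rows):
        """subdiag_rows[j]: sorted row indices of the subdiagonal nonzeros
        of column j of L (including fill-in). Parent(j) is the smallest
        such row index; the root column has no parent."""
        return [rows[0] if rows else None for rows in subdiag_rows]

With the tree in hand, supernodes are read off as maximal chains of consecutively-numbered columns with identical nonzero structures, as described above.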

In the parallel factorization, supernodes can be exploited to gain an additional level of task overlap. All columns in a supernode will require updating by the same set of lower-indexed columns not in the supernode (a sparse update), as well as by the previous columns within the supernode itself (a dense update). The sparse update can be performed in parallel on all columns of the supernode before the dense update. This provides a significant improvement to the performance of the parallel factorization algorithm.

Data Structures

The principal data structure is a sparse, column-major representation of L. Three arrays are used: a double-precision array containing the values of all nonzeros in L, in order by column and within columns by row; an integer array containing the row indices of the nonzeros; and an m-array containing the index of the start of each column in the nonzero array. The diagonal matrix D is stored in a separate m-array. This is a static structure that can be constructed once at the start of the primal-dual algorithm, using a symbolic factorization procedure (Duff et al., 1986; George and Liu, 1981). The lower triangle of A \Theta A^T can be stored in the same data structure, and the factorization done in place. Since the contents of \Theta are altered at each iteration of the primal-dual algorithm, A \Theta A^T must be re-computed (see below).

An additional data structure is required for column Cholesky, to link the columns that contribute to updating each column, and to locate the nonzero entry in the row corresponding to the column to be updated. For serial (and vector) implementations, George and Liu (1981) describe a set of non-overlapping linked lists, which can be maintained in two m-arrays. There is a list for each row of L. At the start of the algorithm, each column appears on the list corresponding to the row of the first subdiagonal nonzero in the column. As each column k on the list for row j is used to update column j, it is moved from the jth list to the list corresponding to the row of the next nonzero in column k. Since each column appears on only one list at any given time, the lists themselves may be stored in a single m-array. The list headers are contained in the same array, since any column for which the list is non-empty is not yet computed, and hence is not on any list. Finally, for each column k on the list for column j, the index of l_{jk} in the array of nonzeros is stored in a second m-array. This allows the relevant portion of column k (those entries in row j and below) to be located directly without having to search the entire column. Since all columns in a supernode are treated as a unit, only the first column of each supernode need be kept on these lists.

For our experiments with the parallel column Cholesky, we use a static structure, where linked lists corresponding to each row are constructed during preprocessing. This structure requires two arrays with the same number of entries as L has nonzeros. In the future, this structure will be replaced with a dynamic structure, in which the linked lists are updated dynamically in parallel.

One additional m-array is needed to identify the supernodes. For each singleton column j (not in any supernode) the corresponding entry in this array contains j. If j is the first column in a supernode, the jth entry in the array contains the index of the last column in the supernode. Every other column in a supernode whose first column is j also contains j. Thus columns in a supernode can readily be identified, and the first column in the supernode can be located directly.
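In modern terms, the principal structure is the familiar compressed sparse column (CSC) layout; a minimal sketch of the three arrays plus D (ours, not the paper's FORTRAN layout):

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class SparseCholeskyFactor:
        """Column-major storage for L plus the diagonal D."""
        val: np.ndarray      # nonzero values of L, by column, by row within column
        row_ind: np.ndarray  # row index of each nonzero
        col_ptr: np.ndarray  # start of each column in val/row_ind (length m+1 here)
        diag: np.ndarray     # the m entries of D

        def column(self, j):
            """Row indices and values of the subdiagonal nonzeros of column j."""
            lo, hi = self.col_ptr[j], self.col_ptr[j + 1]
            return self.row_ind[lo:hi], self.val[lo:hi]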

Construction of A \Theta A^T

The value of an element of M = A \Theta A^T is given by

    m_{ij} = \sum_{k=1}^{n} a_{ik} a_{jk} \theta_{kk}.

Again, the ordering of the indices i, j and k determines different algorithms for constructing M. The ijk algorithm computes M one element at a time. The kij algorithm computes M as the sum of scaled outer products of columns of A. Previous implementations of interior point algorithms (notably (Adler et al., 1989a)) use one of these techniques. Both of these methods have drawbacks, however. The ijk algorithm requires two multiplications in its innermost loop. The straightforward kij algorithm is unsuitable for sparse implementation because it is not possible to compute the location in the array of nonzeros into which to accumulate each term. One solution to both of these problems is to save a list of the elementary products (products of the pairs a_{ik} a_{jk}) and the corresponding locations in the nonzero array. This allows the algorithms to run at full speed, but requires a very large amount of memory for the arrays. In fact, in virtual memory systems the paging activity associated with the construction of A \Theta A^T can slow the algorithm down.

The OB1 implementation avoids both elementary products and the double multiplication in the inner loop, by employing the jki form of the construction algorithm. This method can be interpreted as constructing A \Theta A^T a column at a time. The cost of implementing this method is that A must be accessible in both row-major and column-major order. A is stored in a sparse, column-major data structure, similar to that described above for L. An additional set of three arrays links the entries in each row together on a list (this requires one m-array and an array with an entry for each nonzero) and records the column index of each nonzero entry. It is straightforward to find the entries in column k below row j. These entries can then be scaled by a_{jk} \theta_{kk} and accumulated into the appropriate positions in column j of the L-structure, using a sparse SAXPY. This method has proven to be nearly as fast as the others, and provides a significant savings in memory.
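A dense-index Python sketch of the jki construction (ours; the real code accumulates into the sparse L-structure as just described):

    import numpy as np

    def build_normal_matrix_jki(A, theta):
        """Construct the lower triangle of M = A Theta A^T one column at a
        time (jki form). For column j, every column k of A with a_jk != 0
        contributes a_jk * theta_kk times the entries of column k at rows >= j."""
        m, n = A.shape
        M = np.zeros((m, m))
        for j in range(m):
            for k in range(n):
                if A[j, k] != 0.0:
                    # sparse SAXPY into column j of M (lower triangle only)
                    M[j:, j] += (A[j, k] * theta[k]) * A[j:, k]
        return M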

The jki form of the construction step is also readily implemented as a parallel algorithm, since construction of each column is independent of the others.

Vectorization

The sparse and dense loops that perform a column update contain an apparent dependence relation that prevents a vectorizing compiler from automatically vectorizing the loop. In the sparse case the dependence is based on indirect indexing into the work array through the array of row indices. Since the row indices of the nonzeros in any column are distinct, the loop runs correctly when vectorized, but the compiler decision must be overridden explicitly. Similarly, the loop that packs the work array back into the sparse data structure must be explicitly vectorized. Other sparse SAXPY operations appear throughout the code: during the forward and backward triangular system solution, the construction of A \Theta A^T, and operations involving A alone, such as the ratio test and the construction of the right hand side of A \Theta A^T x = r.

The dense update of a column by other columns in the same supernode also appears to contain a dependency, since the compiler cannot guarantee that the sections of the array of nonzeros corresponding to distinct columns do not overlap. Here again, the compiler incorrectly fails to vectorize the loop and must be overridden manually. In the code tested here, all false dependencies discovered by the VS FORTRAN compiler have been explicitly overridden. There may be additional loops that are currently unanalyzable, but that could be restructured to allow vectorization, for example, by moving print statements for debugging to separate loops.

Parallel Implementation

The parallel factorization algorithm utilizes self-scheduling processes dispatched at each iteration on all available processors. Coordination of tasks is managed using three lists. Each list is updated inside a critical section, under control of a lock, to prevent simultaneous writes to the same location. •

A list of tasks ready to execute. A "task" represents either a supernode whose predecessors have been computed, or an individual column ready for its sparse update. The action taken in the former case is to replace the task on the list with the tasks corresponding to the individual columns in the supernode (splitting). To minimize contention for the lock, all other operations are integrated into the processing of these two types of tasks, as described below.



A count of the uncompleted predecessors of each column (children of the corresponding node in the elimination tree). As each column j is computed, the counter corresponding to Parent (j) is decremented. When the predecessor count goes to zero, the column is itself ready to compute. Note that, when the first column in a supernode is ready, all columns in that supernode are ready as well.



A counter of the number of columns in each supernode. As the sparse update for each column in the supernode is completed, this counter is decremented. When its value reaches zero, the supernode is ready for its dense update.

Each processor executes the following algorithm: 1. Get the next task (column j) from the task list. 2. If task j represents a supernode ready to split, schedule the sparse updates for each column and go to step 1. If j represents a single column, go to step 3. 3. Task j represents a sparse update. Perform the update and decrement the column counter for the supernode containing j . If this is the last sparse update in this supernode, then perform the dense update on the supernode and and decrement

96 Table 1. Model dimensions

Models

Rows

Columns

A Nonzeros

Density (%)

KPEAR BRANDY SHIP04L TIJST3 GROW22 SHIP08L AA2 MILT NESM SHIP12L SCFXM3 CZPROB GANGES 80BAU3B STOCFOR2 PIMS2

120 126 317 364 440 520 531 586 646 687 846 927 1137 2021 2141 2405

308 249 2118 1212 946 4283 5198 1338 2923 5427 1371 3523 1681 9799 2031 3120

631 2084 6101 7481 8252 11614 36359 6642 13256 14913 7558 10669 6740 20648 8319 19141

1.71 6.64 0.91 1.70 1.98 0.52 1.32 0.85 0.70 0.40 0.65 0.33 0.35 0.10 0.19 0.26

t h e predecessor counter for column Parent(j). If this counter goes to zero, set j := Parent(j) and go to step 2, else go to step 1. W h e n t h e task list is temporarily empty, t h e current version of the algorithm uses a spin-lock technique to wait for additional work. COMPUTATIONAL EXPERIENCE We solved several sample problems, some from t h e Netlib test set (Gay, 1985), and some from other sources, representing various applications of linear programming from stochastic modeling in forestry t o shipping, refinery operation and airline crew scheduling. T h e dimensions of t h e original problem (the A matrix) are given in Table 1. These figures are after reduction by a preprocessor t h a t fixes variables and removes redundant constraintsT before performing t h e minimum-degree ordering. Table 2 gives dimensions of t h e AA m a t r i x and tTh e number of fill-in entries in t h e X-factor. (Note t h a t the number of nonzeros given for AA is for t h e lower triangle only.) Table 3 gives the percentage of columns of L t h a t are members of a supernode. This is one indicator of t h e advantage t o be expected from exploiting t h e supernode structure to avoid sparse vector operations. Table 3 also gives t h e size of t h e largest supernode in L. T h e code used for t h e vector comparisons was compiled using the IBM F O R T R A N version 2.3 compiler under V M / X A and CMS 5.5. T h e machine was an IBM 3090-600E with t h e Vector Facility and 256 megabytes of memory. T h e parallel tests were performed using t h e I B M Parallel F O R T R A N compiler, and run after the machine was upgraded to an IBM 3090-600J. Vectorization Table 4 compares vector and scalar versions of t h e factorization. For each problem, virtual and total C P U seconds (not including input / o u t p u t times but including overhead

97

T Table 2. AA

Models

and Cholesky factor L

7

Columns

AA Nonzeros

Density (%)

Fill-ins

I Density

120 126 317 364 440 520 531 586 646 687 846 927 1137 2021 2141 2405

247 1981 3740 7395 4600 6110 27191 15308 4057 8145 8236 6616 7484 9533 12666 20201

3.43 24.96 7.44 11.10 4.76 4.52 19.32 8.93 1.95 3.46 2.30 1.54 1.16 0.47 0.55 0.70

666 2593 3977 5015 4018 6442 103898 10683 16976 471 4736 388 29284 39972 13772 43564

0.09 0.33 0.08 18.78 8.92 0.05 93.16 15.12 10.10 3.65 3.63 1.63 0.05 0.02 1.15 2.21

KPEAR BRANDY SHIP04L TEST3 GROW22 SHIP08L AA2 MILT NESM SHIP12L SCFXM3 CZPROB GANGES 80BAU3B ST0CF0R2 PIMS2

Table 3. Supernodes

Models TEST3 GROW22 AA2 MILT NESM SHIP12L SCFXM3 CZPROB STOCFOR2 PIMS2

% of columns in supernode

Max size of a supernode

67.31 6.36 95.48 88.06 50.16 31.88 62.77 4.21 35.22 59.46

51 28 143 53 29 15 17 3 27 120

98 Table 4. Time for each method in serial and vector mode

Models

Class

TEST3

V T V T V T V T V T V T V T V T V T V T

GROW22 AA2 MILT NESM SHIP12L SCFXM3 CZPROB ST0CF0R2 PIMS2

Serial test KCHC KCHC1 23.74 23.88 16.01 16.15 321.51 323.52 60.94 61.31 87.37 87.84 16.39 16.47 23.67 23.80 15.70 15.81 43.50 43.72 203.35 204.55

21.72 21.84 16.00 16.09 271.88 273.41 61.41 61.75 80.48 80.93 16.20 16.29 22.73 23.04 15.85 15.97 41.62 41.84 173.17 174.14

Vector test KCHC KCHC1 8.93 9.01 8.08 8.12 60.07 60.55 19.85 20.01 34.42 34.70 12.11 12.18 13.15 13.26 12.42 12.53 25.96 26.09 62.41 63.24

8.13 8.21 8.12 8.19 50.32 50.73 18.69 18.81 31.66 31.87 12.04 12.12 12.67 12.76 12.53 12.64 25.29 25.48 53.24 53.47

Iter. 39 47 25 50/53 81 38 55 46 41 38

for setting up the L data structure) are given for each of four runs: a standard column Cholesky algorithm (KCHC) and a column Cholesky with supernodes (KCHCS), with the code compiled in serial mode (compiled with the NOVECTOR option) and in vector mode (with the VECTOR option and all compiler directives activated). Finally, the number of T algorithm, including a "pre-factorization" to detect iterations required by the primal-dual any linearly dependent rows of AA , is given. Where two iteration counts are given, the first refers to KCHC and the second to KCHCS. In these cases, the difference in the order of computation causes the termination criterion to be satisfied after different numbers of iterations in each case. The benefits of careful vectorization are immediately apparent. Savings range from 20% for CZPROB to 80% for AA2. These results indicate that care is required when porting serial code to vector machines; automatic vectorizing compilers are not a complete solution to the "dusty deck" problem. Vectorization is less advantageous when L is relatively sparse, and more advantageous when L contains columns with many nonzeros. A large fraction of columns in supernodes appears also to increase the benefits of vectorization. The advantages of supernodes are, of course, most pronounced when there are many columns in supernodes and when the supernodes are large. In CZPROB, for example, the supernode structure is disadvantageous, but in AA2, PIMS2 and NESM the advantage from exploiting supernodes is pronounced. This suggests that a more careful examination of the supernode structure of a problem would be beneficial. One possible step would be to merge columns adjacent to supernodes into the supernodes to increase their size. This would entail including some explicit zero entries in L, but the tradeoff for increasing the portion of dense updates should be worthwhile, up to a point.

99 Table 5. Wall Clock Times (seconds)

Problem KPEAR BRANDY SHIP04L TEST3 GROW22 SHIP08L MILT NESM SHIP12L SCFXM3 CZPROB GANGES 80BAU3B ST0CF0R2 PIMS2 Parallel

iter 19 27 22 39 30 24 49 66 27 39 57 34 77 60 32

P := 1 C 0.42 1.10 2.68 7.41 4.46 4.92 15.37 22.82 7.35 7.83 10.44 12.25 64.48 21.89 36.22

S

0.45 1.12 2.71 7.00 4.61 4.97 14.04 21.41 7.41 7.90 10.58 11.01 61.07 22.23 32.49

P = 3 C 0.34 0.85 1.84 5.82 2.61 3.39 10.64 16.78 4.64 5.10 8.31 8.96 51.30 12.49 30.02

S

0.40 0.90 1.86 4.49 2.83 3.35 7.23 12.81 4.80 4.65 7.71 6.75 40.11 12.66 19.06

P = 6 C 0.36 0.82 2.13 5.51 2.36 3.03 9.94 15.79 4.09 4.63 7.67 8.38 47.87 10.19 28.23

S

0.48 0.83 2.08 4.04 2.57 3.04 5.90 10.95 4.33 4.19 7.07 5.51 35.16 10.77 16.24

Implementation

Table 5 describes the results of a standalone test comparing wall-clock times on one, three and six processors. Each test compares a straight parallel column Cholesky (essentially the above algorithm with every column treated as an independent supernode) with the supernode algorithm described above. The scheduling overhead for these algorithms on a single processor, as compared to a uniprocessor implementation, is negligible. (Differences in iteration counts between Tables 4 and 5 are due to adjustments in parameters of the algorithm between the tests.)

T algorithm, the following steps were implemented in Along with the parallel Cholesky parallel: construction of A 0 A , and computation of Ax and yA. The performance of the supernode algorithm improves on the standard algorithm by an average of 5% on a single processor, but increases to a 25% average improvement on three processors and a 35% average improvement on six processors. This points to the effectiveness of the parallel sparse update of columns in supernodes. The average speedup of the supernode algorithm when multiple processors are used is 1.60 for three processors and 1.85 for six processors. The maximum speedup achieved was nearly 2.0 on three processors, and nearly 2.4 on six processors (both for the problem MILT). Due to the hierarchical nature of the tasks and the fact that parallelism is exploited only in the factorization step, it is probably not reasonable to expect linear speedup. For larger problems, it is likely that the speedup on six processors would also improve, as the ratio of work in the factorization step would be maintained at a higher level. CONCLUSIONS Our investigations have shown that dramatic improvements in performance of interior point algorithms are possible if they are implemented correctly on parallel and vector computers. We have achieved substantial performance improvements on the IBM 3090 by

100 exploiting its parallel and vector processing capabilities, and t h e special structure inherent in t h e problem t o b e solved. There are several areas for future research. Better measurement and larger test problems would give b e t t e r insight into how effectively parallel processors could be used. Additional opportunities for parallelism exist, particularly in t h e solution of systems involving t h e factored m a t r i x , as well as other computations. Different methods for factorization, such as t h e multifrontal algorithm or iterative methods may yet prove more suitable. In addition, problems with special structure m a y prove particularly well-suited for parallel solution. T h e study of interior point methods is still in its infancy. Many issues remain t o be resolved before we can say t h a t we have fully exploited t h e capabilities of advanced computer architectures for this problem. In addition, developments in interior point methods have spurred new research into implementation of t h e supposedly m a t u r e simplex method. Research into these m e t h o d s and their application in other optimization problems will certainly continue t o advance t h e state of t h e art in both supercomputing and optimization well into t h e future. ACKNOWLEDGEMENTS This work was supported in part by t h e Center for Research on Parallel Commutation through N S F Cooperative Agreement No. CCR-8809615. Computational testing was conducted using t h e Cornell National Supercomputer Facility, a resource of t h e Center for Theory and Simulation in Science and Engineering (Cornell Theory Center), which receives major funding from t h e National Science Foundation and I B M Corporation, with additional support from New York State a n d members of t h e Corporate Research Institute. T h e author would like t o thank t h e Theory Center consulting staff for their invaluable assistance. T h a n k s are also due t o Ho-Won J u n g for assistance with early coding and testing. REFERENCES Adler, I., N. Karmarkar, M. G. C. Resende and G. Veiga (1989a). D a t a structures a n d programming techniques for t h e implementation of Karmarkar's algorithm. ORSA J. Comput., 1(2), 84-106. Adler, I., M. G. C. Resende, G. Veiga and N. Karmarkar (1989b). An implementation of K a r m a r k a r ' s algorithm for linear programming. Math. Progr., ^ ( 3 ) , 297-336. Ashcraft, C. C , R. G. Grimes, J . G. Lewis, B . W . Peyton and H. D. Simon (1987). Progress in sparse m a t r i x methods for large linear systems on vector supercomputers. Int. J. Super., 1(A), 10-30. Duff, I. S. a n d J. K. Reid (1983). T h e multifrontal solution of indefinite sparse symmetric sets of linear equations. ACM T. Math., 9, 302-325. Duff, I. S., A. M. Erisman and J. K. Reid (1986). Direct Methods for Sparse Matrices. Oxford University Press, New York NY. Gay, D. M. (1985). Electronic mail distribution of linear programming test problems. COAL Newsletter, 13, 10-13. George, A. and J. W.-H. Liu (1981). Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Inc., Englewood Cliffs N J . George, A., M. T . Heath and J. W.-H. Liu (1986). Parallel Cholesky factorization on a shared-memory multiprocessor. Lin. Alg. App., 7, 165-187. Lewis, J. G. and H. D. Simon (1988). T h e impact of hardware gather/scatter on sparse Gaussian elimination. SI AM J. Sci., 9(2), 304-311. Liu, J. W . - H . (1985). Modification of t h e minimum-degree algorithm by multiple elimination. ACM T. Math., 11(2), 141-153.

101 Liu, J. W.-H. (1987). On t h e storage requirement in t h e out-of-core multifrontal m e t h o d for sparse factorization. ACM T. Math., 12(Z), 249-264. Liu, J. W.-H. (1988). T h e role of elimination trees in sparse factorization. Tech. rep., Dept. of C o m p u t e r Science, York University, North York, Ontario, Canada. Lustig, I. J., R. E. Marsten and D. F . Shanno (1989). Computational experience with a primal-dual interior point m e t h o d for linear programming. Tech. rep. J-89-11, School of Industrial and Systems Engineering, Georgia Institute of Technology, A t l a n t a GA. Marsten, R. E., M. J. Saltzman, D. F . Shanno, G. S. Pierce and J. F . Ballintijn (1989). Implementation of a dual affine interior point algorithm for linear programming. ORSA J. Comput., 1(A), 287-297. McShane, K. A., C. L. M o n m a and D. F . Shanno (1989). An implementation of a primaldual interior point m e t h o d for linear programming. ORSA J. Comput., 1(3), 70-83. Saltzman, M. J., R. Subramanian and R. E. Marsten (1989). Implementing an interior point L P algorithm on a supercomputer. In: Impacts of Recent Computer Advances on Operations Research (R. Sharda, B. L. Golden, E. Wasil, 0 . Balci and W . Stewart, eds.), p p . 158-168. North-Holland, New York NY.

This page intentionally left blank

comp^

5

ni

N

s

^

C

H

.

This page intentionally left blank

ALTERNATE SERVER DISCIPLINES F O R MOBILE-SERVERS ON A C O N G E S T E D N E T W O R K Stephen K. P a r k * , Stephen Harvey**, Rex K. Kincaid*** a n d K e i t h Miller* * D e p a r t m e n t of C o m p u t e r Science, College of William & M a r y *** D e p a r t m e n t of M a t h e m a t i c s , College of William & M a r y Williamsburg, Va 23185 ** Price Waterhouse, 10420 Little P a t u x e n t Parkway, Suite 300 Columbia, M d 21044 ABSTRACT T h i s p a p e r examines, by discrete-event simulation, t h e following question relative t o t h e mobile-server problem: given k servers with fixed h o m e network locations, t h e shortest travel-time topology of t h e network, a stochastic model of t h e request-for-service distrib u t i o n across t h e network a n d a stochastic model of t h e service t i m e distribution, w h a t is t h e best server discipline? T h a t is, how should servers b e optimally a n d dynamically assigned t o service requests? Four server disciplines are studied: first available, districting, look-ahead a n d shortest j o b completion. Of these, t h e shortest j o b completion discipline is d e m o n s t r a t e d t o provide m i n i m u m average response, for all levels of congestion, where for each request t h e response is t h e elapsed t i m e between t h e creation of t h e request for service a n d t h e initiation of on-site service. KEYWORDS Mobile servers; networks; queueing theory; j o b scheduling. 1. I N T R O D U C T I O N T h e s t a n d a r d multi-server model in traditional queueing theory is a single queue w i t h two or more identical servers which o p e r a t e in parallel. Mobile customers (jobs) travel t o t h e s t a t i o n a r y servers seeking service. T h e customer arrival times are stochastic a n d when service is provided, t h e t i m e required t o do so is also stochastic. T h i s model has been studied extensively and, subject t o familiar Markov-like simplifying stochastic assumptions, t h e s t e a d y - s t a t e (infinite-horizon) expected-value performance h a s b e e n well analyzed (see, for example, Kleinrock, 1975 or Gross a n d Harris, 1985). T h e r e are, however, essentially n o simple analytic results relative t o transient performance analysis. An interesting variation of t h e traditional multi-server queueing model is t h e mobileserver model. Instead of mobile customers a n d stationary servers, in a mobile-server model t h e servers are mobile a n d travel on a network t o service s t a t i o n a r y customers at t h e network nodes. Each server has a fixed network node h o m e location, a n d after providing service, a server r e t u r n s t o its h o m e location t o await a n o t h e r service request. This home-location constraint, with t h e associated requirement t o always r e t u r n t o t h e home location—even if additional service requests are pending—is a n i m p o r t a n t p a r t of t h e mobile-server model studied in this paper.

105

106 Unless all servers have t h e same home location, t h e home-location-dependent travel t i m e requirement m e a n s t h a t t h e servers in a mobile-server m o d e l are not stochastically identical. Moreover, t h e service t i m e distribution is customer-server d e p e n d e n t . For these two reasons, t h e r e are essentially n o known simple results characterizing either t h e steady-state or transient performance of a mobile-server model. T h e mobile-server model is, however, readily amenable t o a discrete-event-based simulation analysis (see, for example, Kincaid et al, 1989, 1991). T h e mobile-server model h a s a variety of applications, m a n y of which deal w i t h public service systems such as an emergency medical service. For example, in a mobile-server model of an emergency medical system t h e servers are ambulances a n d t h e s t a t i o n a r y customers (jobs) are at t h e site of t h e emergency call. T h e h o m e location of each server would b e t h e base t o which t h e ambulance is assigned. T h e a s s u m p t i o n here is t h a t an ambulance would travel t o t h e customer, administer some on-site service (pick-up, emergency care, etc), t h e n r e t u r n with t h e customer t o t h e ambulance's h o m e location where t h e customer would receive additional service (unload, a d m i t t a n c e , e t c ) . At t h e completion of this additional service, t h e ambulance can t h e n b e assigned a n e w service request. Despite an absence of analytic results, several simulation-based variants of t h e mobileserver model have been used t o effectively analyze public service systems. For example, Larson's h y p e r c u b e model (1987) has been used for t h e travel-time analysis jof police vehicles. Several studies have focused on ambulance location policies. Fujiwara et al, (1987) examined t h e effects of a d e m a n d increase on current a m b u l a n c e deployment in Bangkok, T h a i l a n d . Uyeno a n d Seeberg (1984) developed a detailed simulation model t o determine ambulance home locations in British Columbia. Lubicz a n d Mielczarek (1987) determined t h e n u m b e r of ambulances needed t o service a r u r a l emergency medical system in Poland. Mobile-server models have also been used for other problems. For example, Kincaid a n d Maimon (1989) have modeled a u t o m a t e d guided vehicles in flexible m a n u f a c t u r i n g syst e m s using t h e mobile-server h o m e location model. Fontenot (1989) examined software congestion using a form of t h e mobile-server model. Some metric must b e used t o quantify t h e level of service provided by t h e mobile servers. Although effectiveness and equity can b e i m p o r t a n t measures of service, in this p a p e r we focus on efficiency, as measured by t h e m e a n response time. T h e results in this p a p e r represent an extension of Kincaid et al, (1991), which explored, via simulation, t h e effects of network congestion on t h e optimal home locations for mobile servers. This p a p e r examines a question posed in t h e earlier paper: given t h e server home locations, t h e travel-time topology of t h e network, a stochastic model of t h e requestfor-service distribution across t h e network a n d a stochastic model of t h e service time distribution, w h a t is t h e best server discipline? T h a t is, how should servers b e optimally and dynamically assigned t o service requests? 
In addition t o Section 1 (introduction) a n d Section 5 (conclusions), t h e b o d y of t h e p a p e r is organized into three sections. T h e mobile server p r o b l e m is formulated in Section 2. This formulation is purposefully general, avoiding t h e details of, for example, an emergency medical system. Section 3 presents a discussion of four server disciplines a n d t h e results of a simulation-based s t u d y of t h e performance of these four disciplines for a specific 10-node, 3-server network. In Section 4 we briefly review t h e extent t o which traditional queueing theory, job scheduling theory a n d a variant of t h e mobileserver problem are applicable t o a n analysis of t h e mobile-server model. Because t h e material in Sections 2 a n d 3 is not dependent on Section 4, we have chosen t o place t h e material in Section 4 at t h e end, r a t h e r t h a n t h e beginning, t o avoid obscuring t h e p r i m a r y purpose of t h e p a p e r .

107 2. F O R M U L A T I O N Figure 2.1 illustrates a 10-node network serviced by k = 3 servers. T h i s network was first studied in B e r m a n et al, (1987) a n d later in Kincaid et al, (1989, 1991). E a c h server resides at one of t h e nodes as its home location—the node where t h e server resides when it is idle. T h e edges connecting t h e network nodes are labeled with a n u m b e r which denotes t h e travel time between t h e nodes. Beside each n o d e is a fraction which represents t h e p r o p o r t i o n of service calls which originate, at r a n d o m , in t h a t n o d e . T h e net (Poisson) r a t e at which service calls occur is X(t). As t h e n o t a t i o n suggests, this r a t e can vary with time; however, for simplicity, a non-stationary r a t e is not considered in this paper.

2

Figure 2.1.

li^J30

A 10-node, 3-server network.

Travel t h r o u g h o u t t h e network is assumed to occur along shortest p a t h s . For example, t h e travel t i m e from n o d e 4 t o node 8 is 3 time units a n d t h e travel t i m e from n o d e 2 t o n o d e 8 is 8 t i m e u n i t s . W h e n a request for service occurs, a mobile server is designated t o service t h e request according t o t h e server discipline—the algorithm used t o assign servers t o service requests. A server is only available t o begin a service assignment if it is idle at its h o m e location. If no servers are available at t h e t i m e of a service request t h e n a single request-for-service queue forms. W h e n a queue forms, t h e network is said to b e congested. Service time is t h e s u m of two components. T h e travel t i m e component represents t h e round-trip travel t i m e from t h e server's home location n o d e t o t h e request n o d e a n d back again t o t h e home location node. T h e travel time component is not stochastic (although, in a simulation environment it is an easy a n d meaningful model extension to allow for stochastic travel times). T h e non-travel t i m e component represents t h e on-scene service t i m e at t h e request node a n d any off-scene service time which is needed at t h e server's home location node. T h e non-travel time component is stochastic. For example, t h e service t i m e for a server with a n o d e 2 h o m e location responding t o a service request at node 6 a n d with a r a n d o m non-travel t i m e component of 2.5 time units would b e 10.5 t i m e units (8 units for t h e round t r i p travel t i m e component a n d

108 2.5 units of non-travel service time). For all t h e results r e p o r t e d in this p a p e r , t h e nontravel t i m e component of t h e service time is assumed t o b e a n exponentially d i s t r i b u t e d r a n d o m variable w i t h m e a n 1. T h e response time associated w i t h each request for service is also t h e s u m of two comp o n e n t s . T h e delay component represents t h e t i m e (if a n y ) t h e request for service must spend in t h e queue. T h e travel t i m e component is t n e t i m e t h e server s p e n d s traveling from its h o m e location n o d e t o t h e request node. Relative t o t h e previous example, if t h e request for service at n o d e 6 h a d t o s p e n d 7 t i m e u n i t s in t h e queue, t h e n t h e response t i m e would b e 7 + 4 = 11 t i m e units. Note t h a t t h e t r i p back t o t h e server's h o m e location a n d t h e non-travel component of t h e service t i m e is n o t included in t h e response t i m e calculation. k-Median

Home

Locations

T h e fc-median p r o b l e m is a classic deterministic network location p r o b l e m (see Handler a n d Mirchandani, 1979). T h e objective is t o determine, from a m o n g all possible node subsets of size fc, server h o m e locations t h a t minimize t h e s u m of t h e travel times from each network n o d e t o its closest server h o m e location. If t h e network congestion is low t h e n it is intuitive t h a t t h e k server home locations should b e t h e network's &-median nodes. T h e s e h o m e locations minimize t h e (zero delay) expected response t i m e . For Figure 2 . 1 , t h e 3-median nodes a r e ( 2 , 4 , 9 ) a n d t h e corresponding average response time ( 5 / 3 0 ) ( 2 ) + (5/30)(0) + (2/30)(2) + (5/30)(0) + ( 3 / 3 0 ) ( 2 ) + ( 2 / 3 0 ) ( 4 ) + ( 2 / 3 0 ) ( 2 ) + ( 2 / 3 0 ) ( 3 ) + ( 2 / 3 0 ) ( 0 ) + ( 2 / 3 0 ) ( 2 ) = 1.4. Although t h e fc-median home locations are optimal when t h e r e is n o congestion, as d e m o n s t r a t e d in Kincaid et al, (1991), these home locations can b e substantially subo p t i m a l w h e n t h e r e is congestion. T h e reason for t h i s is t h a t t h e fc-median solution assumes t h a t t h e closest server is always available t o respond t o a service request at t h e m o m e n t t h e request for service occurs. W h e n t h e network is congested, t h e closest server will not always b e immediately available t o service requests. Moreover, there is n o g u a r a n t e e t h a t t h e closest server will b e assigned t o this server request. T h e d y n a m i c assignment of servers t o service requests is determined by t h e server discipline; it h a s been our experience t h a t t h e most effective way t o s t u d y different mobile-server disciplines for a congested network is by discrete-event simulation. 3. S E R V E R D I S C I P L I N E S T h e server discipline is a n i m p o r t a n t factor in determining t h e efficiency of a congested mobile-server network. T h e r e are four server disciplines which will b e studied in this paper: (1) t h e first available server discipline, (2) t h e districting discipline, (3) t h e look-ahead discipline, a n d (4) t h e shortest job completion ( S J C ) discipline. For fixed server h o m e locations, server disciplines (1), (3) a n d (4) are equivalent when t h e r e is n o congestion. Moreover, in this case server discipline (2) is also equivalent t o t h e other three provided t h e districting is based u p o n a closest-server partitioning of network nodes into districts. However, as illustrated in Figure 3.1 when t h e r e is congestion each server discipline produces a different (steady-state) average response t i m e . A discussion is included in Kincaid et a/., (1991) of three system performance criteria for t h e mobile-server problem: efficiency, effectiveness, a n d equity. Because o u r results are simulation based, a variety of performance criteria could b e studied. For simplicity, however, we focus exclusively on efficiency in this paper. We define efficiency as average response time. In, for example, a flexible manufacturing system w i t h A G V ' s as servers, focusing exclusively on efficiency is quite a p p r o p r i a t e . However, we n o t e t h a t such exclusivity is not desirable in an E M S environment (or, for t h a t m a t t e r , in any environment in which t h e worst case outcome of a service event is c a t a s t r o p h i c ) .

109 First Available

Discipline

In traditional queueing theory t h e (stationary) servers are usually assumed t o b e statistically identical. Therefore t h e issue of a server discipline never arises—when congestion develops t h e r e is n o reason t o u s e a n y t h i n g other t h a n t h e first available server. However, with mobile servers t h e server discipline can have a substantial effect o n system performance. This occurs because servers with different home locations a r e (because of their different travel times) not statistically identical. We can extend t h e traditional first-available rule t o mobile servers. Specifically, when a request for service occurs t h e closest available (idle) server is assigned t o it. If t h e r e are no idle servers t h e n t h e first server t o become idle will b e assigned t o this request. T h i s server discipline h a s a clear intuitive appeal a n d a n i m p o r t a n t sense of fairness, particularly in a n emergency medical system application. However, with t h e first-available server discipline, whenever t h e level of congestion is high servers m a y have t o travel far from their home locations resulting in frequent large response times. T h i s is illustrated in Figure 3.1 which presents steady-state average response times for all four of t h e server disciplines a t various levels of congestion.

7 6 r s P o

first-available / 5

S 4h

z

e

districting

^^

look-ahead SJC

~~~~

3h

•= 2

L

1

_ - *= ~

i — i — i — i — i — i — i — i — i — i — i — i — i

0.10

0.15

0.20

0.25

0.30

0.35

Figure 3.1 - Steady-state average response.

T h e results in Figure 3.1 correspond t o stationary request-for-service rates A a n d were produced, for each value of A a n d server discipline, by simulating t h e processing of several h u n d r e d t h o u s a n d service requests. T h e m e t h o d of b a t c h m e a n s was used t o construct 9 5 % confidence interval estimates for t h e • average response points indicated. T h e interval estimates were sufficiently tight t o ignore, for graphical purposes, t h e inherent uncertainty in t h e estimates. (For example, t h e most u n c e r t a i n average response estimate is t h e first available service discipline value which is shown as 7.7 a t A = 0.25. Based on 500,000 service requests with a b a t c h size of 1000, t h e 9 5 % confidence b a t c h m e a n interval estimate for this average response was 7.69 ± 0.15.) Districting

Discipline

As a n a l t e r n a t e t o t h e first-available server discipline, a second server discipline is based u p o n partitioning t h e network into districts (sub-networks) w i t h one server p e r district.

110 One logical way to do this is t o define district 1 to be the nodes which have server 1 closest to them. These nodes will be serviced by server 1 exclusively. Districts 2 , 3 , . . . , k are defined analogously. In this way k disjoint districts are formed which operate independently from one another. As discussed in the next paragraph, it may be desirable to make adjustments to the districts to balance server utilization. Within each district, the server discipline is first available. Given the ( 2 , 4 , 9 ) 3-median home locations for the network in Figure 2.1, we see that the server home-located at node 2 is closest to nodes 2, 3, 5 and 6; the server homelocated at node 4 is closest to nodes 1, 4 and 8; the server home-located at node 9 is closest to nodes 9 and 10. The two servers home located at nodes 4 and 9 are equally close to node 7. In order to circumvent a utilization imbalance, the district definitions were taken to be district 1 = { 2 , 3 , 5 , 6 }

district 2 = { 1 , 4 }

district 3 = { 7 , 8 , 9 , 1 0 }

In this way, server 1 (at node 2) will service 6/15 of the requests, server 2 (at node 4) will service 5/15 of the requests, and server 3 (at node 9) will service 4/15 of the requests. (However, because of the districting assignment of node 8 to the server at node 9, the zero-delay expected response for the districting discipline is 1.467 rather than 1.4.) Carter, Chaiken and Ignall (1972) used the districting approach to resolve the issue of cooperation between emergency service units. They determined district boundaries which would minimize average response time and workload imbalance. Similarly, Klementiev (1983) showed that assignment of ambulances to certain districts was more efficient for high congestion rates than strategies not assuming this type of assignment. As illustrated in Figure 3.1, when compared to the first-available server discipline, the districting server discipline can dramatically reduce steady-state average response times for high congestion rates. For example, at λ = 0.25 we find a dramatic decrease in average response time from 7.7 to 3.0. This decrease occurs because servers never travel far from their home location. Indeed (except for requests from node 8), servers only respond to service requests from those nodes to which they are closest. Note that with the partition of the network into districts {2, 3, 5, 6}, {1, 4}, and {7, 8, 9, 10}, there are actually eight different sets of home locations which minimize average response time. For example, the server originally home-located at node 4 could also be located at node 1 with no change in average system response time. This is possible since both node 1 and node 4 have the same request-for-service rate. Similarly, the server originally placed at node 9 could also be placed at node 7, 8, or 10 since the travel distances between these nodes and the request-for-service rates of each node are all identical. We have confirmed by simulation that by partitioning into districts and placing the servers at home locations (2, 4, 8), for all levels of congestion the same steady-state response time is produced as with the original home locations (2, 4, 9). By using locations (2, 4, 8), however, the server closest to each node will always respond to service requests.
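A nearest-home partition of the kind used above is simple to sketch. The following Python fragment is illustrative only: the distance table dist is an assumed input, and any balancing adjustment (such as moving node 8 to the server at node 9) would be applied to the result by hand, as in the text.

def make_districts(nodes, homes, dist):
    # Assign every node to the server whose home location is closest;
    # ties go to the first home in the list.
    districts = {h: [] for h in homes}
    for v in nodes:
        closest = min(homes, key=lambda h: dist[h][v])
        districts[closest].append(v)
    return districts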

Look-ahead Discipline

As discussed previously, in a mobile-server model a server must first travel to the origination of a request before service can begin. During this travel time a server home-located closer to the request node may become available and, if dispatched, be able to reach the request node faster than the first available server. (Furthermore, the second server's return trip time to its home location would also be smaller than the first server's.) In a simulation environment we know exactly when each busy server will next become idle. Using this information, we are able to determine each server's response time to a call by summing the delay the call would spend in the queue (waiting for the server to become available) with the server's travel time to the location of the call. By picking

the minimum response time we are always able to assign the server who can complete all the jobs previously assigned to him and then respond to this new request first. This is the look-ahead server discipline. (With this server discipline, as well as the SJC server discipline to follow, the single request-for-service queue is not FIFO. However, this single queue naturally partitions into three FIFO queues, one per server.) Figure 3.1 illustrates that as congestion increases the look-ahead server discipline becomes increasingly more efficient than the first-available discipline. For example, at λ = 0.25 average response time is reduced from 7.7 to 2.7. As congestion levels rise, however, the look-ahead discipline may still dispatch servers far from their home location, and thus produce long return trips. In fact, although the look-ahead discipline produces lower average response times than the districting discipline for lower levels of congestion, at values of λ > 0.35 the districting discipline produces better average response times.
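The look-ahead rule, and the shortest job completion rule introduced next, differ only in whether the trip back home is counted. The Python sketch below makes the contrast explicit; all names are illustrative assumptions, and the on-site service time is collapsed into a single parameter.

def pick_server(servers, call_node, now, free_at, travel, service=0.0,
                rule="look-ahead"):
    # free_at[s]: when server s finishes its assigned jobs at its home node
    # (known exactly in a simulation); travel[s][v]: home-to-node travel time.
    def response(s):                   # queue delay plus travel to the call
        return max(now, free_at[s]) + travel[s][call_node]
    def completion(s):                 # ... plus service and the trip home
        return response(s) + service + travel[s][call_node]
    key = response if rule == "look-ahead" else completion
    return min(servers, key=key)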

Shortest Job Completion Discipline

At high congestion levels the look-ahead discipline is not optimal. The reason for this is that the discipline ignores the server's return time (to its home location). Consider, for example, the case in which a server cannot reach the request node first but can travel to the request node, provide service and then return to its home location first. This occurs when a server who could not reach the request node first could arrive shortly thereafter and would have a shorter trip back to its home location after completing service. This observation leads to a fourth server discipline—the shortest job completion (SJC) server discipline, which always assigns the server who, after completing all the jobs assigned to him including the new job, can become idle at his home location first. Figure 3.1 illustrates that for all levels of congestion, the SJC server discipline produces the smallest average response (when the differences are statistically significant). Moreover, as the congestion level increases, the SJC server discipline becomes substantially more efficient than the other three disciplines. For example, at λ = 0.25 the SJC server discipline gives an average response time of approximately 2.5.

An attractive feature of the SJC server discipline is that servers are unlikely to be sent to distant nodes, and in that sense SJC is much like the districting discipline. In both the first-available and (to a lesser extent) look-ahead server disciplines, servers are likely to be sent far from their home location when high congestion occurs. Since the SJC server discipline considers the return trip back to a server's home location, this server discipline remains efficient even when the congestion is high. It should be noted that the first-available and districting disciplines are conservative—no server will ever remain idle if there is work to be done (i.e. there is a request-for-service queue). In contrast, the look-ahead and SJC disciplines are non-conservative. Indeed, it is the non-conservative nature of these two disciplines that allows them to exhibit superior performance at high levels of congestion by largely ignoring distant service requests, thereby avoiding time-consuming trips far from their home locations. This is an important point with potentially significant political impact in some practical applications. Furthermore, we repeat for emphasis that the look-ahead and shortest job completion server disciplines may not be practical in some applications because they require future knowledge. However, even in those applications, the shortest job completion discipline is meaningful because it appears to provide an important theoretical bound on best possible average response performance.

4. RELATED THEORY

In this section we briefly review the extent to which traditional queueing theory, job scheduling theory and a variant of the mobile-server problem are applicable to the home-location-dependent mobile-server model formulated in Section 2. We can cite no results

which are directly applicable. However, the breadth of related work is impressive and provides inspiration for future research.

Queueing Theory

The mobile-server model is a variant of the traditional multi-server queueing model. In the traditional queueing model the customers are mobile and the servers are stationary; in the mobile-server model the servers travel on a network to service stationary customers at the network nodes. In the terminology of queueing theory, the mobile-server model is an M/G/k queueing model with distinguishable servers, state-dependent service times and a Poisson request-for-service process which may be non-homogeneous. With this generality, there is no known closed-form expression for either the transient or steady-state average response—no matter what the server discipline. As an approximation, a great deal is known about the steady-state (expected value) performance of an M/M/k model with indistinguishable servers, state-independent (exponential) service times and a homogeneous request-for-service rate. However, this is a poor approximation, particularly if the travel times are a significant part of the service time. Moreover, for this model, the server discipline is not an issue because the servers are indistinguishable. If there is just one server (k = 1) and if the objective is to determine a home location for this server that minimizes the expected response to a random call for service, it is possible to use steady-state M/G/1 queueing theory results and optimally locate the server. Berman et al. (1985) and Chiu et al. (1985) provide polynomial time algorithms to solve the single-server home location problem, assuming steady-state results are applicable and that a FIFO queue discipline is employed. We are not aware of any theoretical results relative to the multiple mobile-server home location problem. Heuristic solutions have followed two paths. First, if we assume that the k server home locations are identical then the servers may be assumed to be indistinguishable and several approximations to the M/G/k system are available; see Larson and Odoni (1981). Second, when the k servers are distinguishable, Berman et al. (1987) provide a heuristic solution method which relies heavily upon the hypercube model of Larson (1974) and steady-state results for an M/M/k queueing system with distinguishable servers. To the best of our knowledge there are no analytical results for the optimal server discipline for an M/G/k queueing model with distinguishable servers.

Job Scheduling

The k mobile-server problem is equivalent to a stochastic version of a job scheduling problem. That is, consider a simulation of a multi-server queue. A simulation experiment provides us, a priori, with all arrival times and service times for each job. Consequently, determining the server discipline that minimizes the average time a customer spends in the system is equivalent to the job scheduling problem denoted 1 | r_j | Σ C_j. (This is the job scheduling notation adopted by Lawler et al., 1991.) The 1 | r_j | Σ C_j job scheduling problem is defined in terms of n jobs indexed j = 1, ..., n. Each job consists of a single operation which must be scheduled for service on one of k machines. Each machine (server) is considered to be non-conservative. That is, a machine is allowed to be idle even if a job is available to be processed. Each job j arrives at (release) time r_j and no job is allowed to preempt another job. Each machine can process at most one job at a time. The objective to be minimized is the sum of the job completion times C_j (job service time + delay time). Unfortunately, even this simple problem has been shown to be strongly NP-hard; see Lenstra et al. (1977). However, if all r_j = 0 (all n jobs arrive at once) then shortest job first is optimal, i.e. we have


Theorem 4.1—The optimal schedule for a 1 | | Σ C_j problem is given by sequencing the jobs in the order of nondecreasing processing time.
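As a concrete illustration of Theorem 4.1, the following Python sketch (ours, not from the paper) evaluates Σ C_j under the shortest-processing-time order; Smith's ratio rule for 1 | | Σ w_j C_j, discussed next, would simply replace the sort key p_j by p_j / w_j.

def total_completion_time(proc_times):
    # Sequence jobs by nondecreasing processing time and accumulate C_j.
    clock, total = 0, 0
    for p in sorted(proc_times):
        clock += p              # completion time of this job
        total += clock
    return total

# total_completion_time([3, 1, 2]) -> 1 + 3 + 6 = 10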

Conway et al. (1967) point out that the origin of this result is difficult to determine but it is generally attributed to Smith (1956). Smith also solves the 1 | | Σ w_j C_j problem by proving that any sequence is optimal that puts the jobs in order of nondecreasing ratios p_j / w_j, where p_j is the processing (service) time of job j and the w_j's are (non-negative) weights. Several generalizations of this theorem exist for special classes of precedence constraints; see Lawler et al. (1991). Moreover, if preemption (pmtn) is allowed then the optimal schedule for the 1 | pmtn, r_j | Σ C_j problem is given by the shortest remaining processing time rule; see Baker (1974). However, 1 | pmtn, r_j | Σ w_j C_j is strongly NP-hard; see Labetoulle et al. (1984). Finally, we note that Gazmuri (1985) has developed an asymptotically optimal algorithm for the 1 | r_j | Σ C_j problem under the assumption that processing times and release times are independent and identically distributed. Other stochastic generalizations of Theorem 4.1 also exist; see, for example, Pinedo (1983) and Bruno et al. (1981).

The mobile-server problem can be characterized as k parallel machines processing n independent jobs such that the processing time by machine i for job j is p_j / s_ij, where s_ij is a given job-dependent speed for machine i. In addition, the release time for job j is a random variable r_j, exponentially distributed with rate λ_j. The processing time p_j is also a random variable with a general distribution. As noted before, there are several accepted performance measures for the mobile-server location problem. Indeed, the mobile-server location problem is best characterized as a multi-objective optimization problem. However, if efficiency is most important the objective is to minimize Σ C_j. Unfortunately, there do not seem to be any known results in the job scheduling literature for this problem. The lack of analytical tools to study this problem, coupled with the absence of any results for determining a queue discipline that minimizes the expected time a customer spends in the system for an M/G/k queue, underscores the need for a simulation-based analysis.

A Mobile-Server Model Variant

A simpler variant of the mobile-server model has been the object of some recent analysis. Given a metric space M, k servers that can move among the points of M, and requests for service at points x ∈ M generated randomly, determine the strategy which minimizes the sum of the distances traveled by each of the k servers to service all requests. Home locations for the servers are not prescribed and queue delay (congestion) is not considered as part of the performance objective. For each service request one of the k servers is selected and moves to the location of the request. Other servers are also allowed to move. Manasse et al. (1988) show that if all requests are given then an O(kn^2) offline algorithm solves the k-server problem optimally. Calderbank et al. (1985) provide an analysis of two server selection rules for the 2-server problem with the additional assumptions that requests for service are generated uniformly along either a line or a circle, the queue discipline is FIFO, and a request does not begin service until the previous request is completed. The two server selection rules tested were the partition rule, in which each server is responsible for half of the line or circle, and the nearer-server rule, in which each request is served by the nearest server. Although congestion is not an issue, these disciplines are analogous to our districting and first-available disciplines, respectively. Calderbank et al. show that the nearer-server rule is optimal for the circle and that for the line it is better than the partition rule (with regard to expected cost) and within 1.69% of the optimal cost. Several authors have employed a successful randomization scheme which avoids server selection rules completely: when a request for service arises at a point x ∈ M all


k servers move a non-negative distance, after which x must be occupied by one of the servers. An amortized computational complexity analysis (see Tarjan, 1985) is typically used to analyze the performance of algorithms for this online k-server problem. That is, an online strategy is termed c-competitive if its cost is not more than c times that of the optimal offline strategy. Manasse et al. show that no c-competitive strategy exists if c < k for any metric space with at least k + 1 points. In addition they provide a 2-competitive algorithm for the 2-server problem and an (n − 1)-competitive algorithm for the (n − 1)-server problem on n-point metric spaces. Chrobak et al. (1990) present k-competitive algorithms for k servers on a line. An extension of these results by Chrobak and Larmore (1991) resulted in a k-competitive algorithm for k servers on a tree.

5. CONCLUSIONS

We have explored, via simulation, four different server disciplines for the mobile-server problem, using average response time as the metric of comparison. When the system is heavily congested, the shortest job completion discipline consistently outperforms the first-available, districting, and look-ahead disciplines. The most commonly cited discipline, first-available, is increasingly inferior to all three of the other disciplines as network congestion increases.

The practical significance of these findings depends in part on how much information is available when service assignments are made. To use a districting approach, system designers must have prior information about the frequency with which calls will be made from the various locations. To use look-ahead, a dispatcher needs to know travel times to the requesting node. Finally, shortest job completion requires advance knowledge of travel times to and from the requesting node, plus on-site service times. The look-ahead and shortest job completion server disciplines may not be practical in some applications because they require future knowledge. However, even in those applications, the shortest job completion discipline is meaningful because it appears to provide an important theoretical bound on best possible average response performance.

Some applications may have all these types of information available; for example, flexible manufacturing takes place in a highly controlled environment, where accurate predictions about travel and service times are quite realistic. Other applications will have less predictability; for example, emergency medical systems may have great difficulty in predicting on-site service times. Other possible applications will fall somewhere in between these extremes: ambulance services may have a fairly good idea of travel and on-site times for some times of day and types of calls, and have much less certainty at other times. An area of future research is a simulation study which explores the sensitivity of service discipline performance to uncertainty in knowledge about travel and service times. Shortest job completion will be optimal for small uncertainties, but we expect that districting may be more robust for large uncertainties.

Although several different research communities have literature on problems related to the mobile-server problem, we have found no papers dealing exactly with this situation. But this problem has immediate, direct application to life-critical and economically important decisions. As this simulation shows, the potential benefits of looking beyond the simplest server discipline may have dramatic positive effects.

6. REFERENCES
Baker, K.R. (1974), Introduction to Sequencing and Scheduling, Wiley, New York.

Berman, O., R.C. Larson, and S. Chiu (1985), Optimal Server Location on a Network Operating as an M/G/1 Queue, Operations Research, 33, 746-771.

Berman, O., R.C. Larson, and C. Parkan (1987), The Stochastic Queue p-Median Problem, Transportation Science, 21, 207-216.

Bruno, J.L., P.J. Downey, and G.N. Frederickson (1981), Sequencing Tasks with Exponential Service Times to Minimize the Expected Flowtime or Makespan, J. Assoc. Comput. Mach., 28, 100-113.

Calderbank, A.R., E.G. Coffman, Jr., and L. Flatto (1985), Sequencing Problems in Two-Server Systems, Mathematics of Operations Research, 10, 585-598.

Carter, G.M., J.M. Chaiken, and E. Ignall (1972), Response Areas for Two Emergency Units, Operations Research, 20, 571-594.

Chiu, S., O. Berman, and R.C. Larson (1985), Stochastic Queue Median on a Tree Network, Management Science, 17, 764-772.

Chrobak, M., H. Karloff, T. Payne, and S. Vishwanathan (1990), New Results on Server Problems, Proceedings of 1st Annual ACM Symposium on Discrete Algorithms, 291-300.

Chrobak, M. and L.L. Larmore (1991), An Optimal On-Line Algorithm for K-Servers on Trees, SIAM J. Comput., 20, 144-148.

Conway, R.C., W.L. Maxwell, and L.W. Miller (1967), Theory of Scheduling, Addison-Wesley.

Fontenot, M.L. (1989), Software Congestion, Mobile Servers, and the Hyperbolic Model, IEEE Transactions on Software Engineering, 15, 947-962.

Fujiwara, O., T. Makjamroen, and K.K. Gupta (1987), Ambulance Deployment Analysis: A Case Study of Bangkok, European Journal of Operational Research, 31, 9-18.

Gazmuri, P.G. (1985), Probabilistic Analysis of a Machine Scheduling Problem, Mathematics of Operations Research, 10, 328-339.

Gross, C. and C.M. Harris (1985), Fundamentals of Queueing Theory, 2nd ed., Wiley, New York.

Handler, G.Y. and P.B. Mirchandani (1979), Locations on Networks, MIT Press, Cambridge, MA.

Kleinrock, L. (1975), Queueing Systems, Volume 1, Wiley, New York.

Kincaid, R. and O. Maimon (1989), Material Handling Design in Flexible Manufacturing Systems: A Network Approach, submitted for publication.

Kincaid, R., K. Miller and S. Park (1989), Locating P Mobile Servers on a Congested Network: A Simulation Analysis, In: Impact of Recent Computer Advances on Operations Research (R. Sharda, B.L. Golden, E. Wasil, O. Balci, and W. Stewart, eds.), 396-406.

Kincaid, R., K. Miller and S. Park (1991), Simulation Analysis of Mobile-Servers on a Congested Network, American Journal of Mathematics and Management Science, to appear.

Klementiev, A.A. (1983), Models of Resource Allocation in Health Care Control Problems, Institute of Control Problems, Moscow, 1983 (in Russian).

Labetoulle, J., E.L. Lawler, J.K. Lenstra, and A.H.G. Rinnooy Kan (1984), Preemptive Scheduling of Uniform Machines Subject to Release Dates, In: Progress in Combinatorial Optimization (W.R. Pulleyblank, ed.), Academic Press, New York, 245-261.

Larson, R.C. (1974), A Hypercube Model for Facility Location and Redistricting in Urban Emergency Services, Computers and Operations Research, 1, 67-95.

Larson, R.C. and A.R. Odoni (1981), Urban Operations Research, Prentice-Hall, Englewood Cliffs, N.J.

Larson, R.C. (1987), Travel-Time Analysis of New York City Police Patrol Cars, Interfaces, 17, 15-20.

Lawler, E.L., J.K. Lenstra, A.H.G. Rinnooy Kan, and D.B. Shmoys (1991), Sequencing and Scheduling: Algorithms and Complexity, to appear In: Handbooks in Operations Research and Management Science, Vol. 4: Logistics of Production and Inventory.

Lenstra, J.K., A.H.G. Rinnooy Kan, and P. Brucker (1977), Complexity of Machine Scheduling Problems, Annals of Discrete Math., 1, 343-362.

Lubicz, M. and B. Mielczarek (1987), Simulation Modeling of Emergency Medical Services, European Journal of Operational Research, 29, 178-185.

Manasse, M., L.A. McGeoch, and D. Sleator (1988), Competitive Algorithms for On-line Problems, Proceedings of 20th Annual ACM Symposium on Theory of Computing, 322-333.

Pinedo, M. (1983), Stochastic Scheduling with Release Dates and Due Dates, Operations Research, 31, 559-572.

Smith, W.E. (1956), Various Optimizers for Single-Stage Production, Naval Research Logistics Quarterly, 3.

Tarjan, R.E. (1985), Amortized Computational Complexity, SIAM J. Alg. Disc. Meth., 6, 306-318.

Uyeno, D.H., and C. Seeberg (1984), A Practical Methodology for Ambulance Location, Simulation, August, 79-87.

ACKNOWLEDGEMENTS

The authors wish to thank Weizhen Mao for her contributions to the discussion in Section 4 and the College of William & Mary Spring 91 discrete-event simulation class, which provided independent verification of some of the results presented in Section 3. This paper is based in part on the 1990 undergraduate honors thesis of Stephen Harvey.

COLLISION DEPENDENT PERFORMANCE MODEL FOR A STAR TOPOLOGY LOCAL AREA NETWORK

Gerrit K. JANSSENS

Department of Computer Science University of Antwerp (RUCA) Middelheimlaan 1, B-2020 Antwerp, BELGIUM

ABSTRACT

Fiber optic local area networks are nowadays widely available for data transport. Different topologies have been proposed, such as a bus, a ring or a star. A star topology, called Hubnet, has been developed at the University of Toronto. It uses a medium access protocol in which collisions are avoided, i.e. the throughput is an increasing function of the offered load. A model is developed in which the delay is investigated as a function of the throughput. The model is compared with a regenerative stochastic analysis and is validated by means of a simulation model and real-life measurements.

KEYWORDS Local area network; stochastic modelling; simulation.

INTRODUCTION

Random access local area networks with a star topology and packet broadcasting have been introduced by RAWSON and METCALFE (1978). The stations are connected by means of full-duplex optical fiber links. The access protocol used in their network, called Fibernet, is the same as the Ethernet protocol. Soon other protocols were developed for star networks. Among these are the Hubnet protocol, developed at the University of Toronto (LEE and BOULTON (1983)), one developed at the AT & T Bell Labs (ALBANESE (1983)) and one developed at the IBM Research Labs in Zurich (CLOSS and LEE (1981)). The Hubnet and AT & T protocols are collision free. The IBM protocol has a one-bit collision window.


LEE and BOULTON (1983) use some quantitative methods in order to forecast saturation effects as a function of a number of parameters (number of stations, packet length and retry time). ALBANESE (1983) constructed a very naive model for his network: it has great limitations, both conceptual and with respect to validation. For Hubnet a model has been published by HAMACHER and LOUCKX (1985). Real-life measurements have been reported by LEE, BOULTON and THOMSON (1988) and by KAMAL and HAMACHER (1986). A discrete event simulation model for Hubnet is reported by JANSSENS (1987). In the next section we present another model for the collision-free access protocol for star topologies. It is based on the major criticisms which can be raised against the ALBANESE and HAMACHER-LOUCKX models. The model will be presented in three different forms, with increasing complexity. It will be compared both with the ALBANESE and HAMACHER-LOUCKX models and with simulation results.

THE HUBNET PROTOCOL

The components of a physical Hubnet are a central hub and network access controllers (NACs). Physically, Hubnet can be a rooted tree, due to technical limitations of optical fiber as a communications medium. The existence of these intermediate hubs, called subhubs, however, does not influence the protocol mechanism. Logically, this tree topology consists of two matching trees: a selection tree and a broadcast tree. From now on, we only speak of Hubnet in its star topology. Each NAC has a transmit and a receive buffer. When a NAC receives a packet in its transmit buffer, the packet is transmitted towards the selection side of the central hub. In a quiescent network, the central hub selects the frame and broadcasts it over the star. Once the packet is received by all other NACs, including the source NAC, the source NAC can accept the following packet from its host. While a packet is in the central hub, it cannot interfere with other packets. Only one packet can be selected by the hub. So a different situation appears if the network is not quiescent. In a non-quiescent network it can happen that a NAC transmits a packet which encounters the hub when it has already selected a packet from another NAC.

FORMULATION OF THE COLLISION DEPENDENT MODEL

Definitions and assumptions

In the model the following definitions will be used:

N       = the number of stations (NACs),
p       = the probability that a packet meets a busy hub,
T       = the constant packet transmission time,
r       = the retry time; in high-bitrate transmission systems r < T,
p_new   = the probability that, after a successful transmission, a packet being in its retry (i.e. waiting for its echo) will find the hub occupied at its next attempt,
G_k(x)  = the distribution function of the first-order statistic of a sample of size k out of a U(0,r) distribution; g_k(x) is its density,
p_i     = the probability that at the end of the next successful transmission i packets are in retry (i = 0, 1, ..., N-1),
T_f(x)  = Pr{length of the free period of the hub ≤ x}, x ≥ 0,
W_n(x)  = Pr{time between the end of a transmission, at which no packets are left in retry, and the arrival of a new packet ≤ x}, x ≥ 0,
q_k     = Pr{the number in an M/D/1/N system = k}, (k = 0, 1, ..., N).

Assumptions:
(1) time is slotted. The length of the time slot is the retry time. The length of the packet, expressed in time units, is an integer multiple of the slot time.
(2) new packets arrive according to a Poisson process. By this, W_n follows an exponential distribution with parameter λ_n.
(3) the number of packets in retry at the end of a successful transmission follows the distribution of the number in an M/D/1/N system, on condition that the server is occupied.

Model description

In this paragraph we develop three different models with increasing complexity. The models rely on different assumptions concerning the length of the 'free' period of the hub, T_f. They are:

Model 1: the free period is infinitely large, i.e. a packet coming out of a series of subsequent collisions during a transmission (hub is busy) is successfully transmitted at its next retrial,

Model 2: the length of the free period is a random variable following an exponential distribution with mean (1 - p)T / p,

Model 3: the length of the free period is a random variable following the distribution of the first-order statistic of a sample of size k from a U(0,r) distribution. This assumes that after this period the hub is selected by a packet coming out of its retry period and not by a new packet.


In model 1 we further have to assume that the packet transmission time T is an integer multiple of the retry time r. This allows us to say that each transmission lasts for T/r time slots. The probability that a packet meets a busy hub is p. The slot sequence number of the busy transmission that packet meets is uniformly distributed between 1 and T/r. If this packet meets the busy hub while transmitting the first packet of its message, the newly arriving packet will encounter T/r subsequent collisions. If it meets the last packet it will only encounter one collision. With the assumption of an infinitely large free period, the mean number of collisions is:
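Under these Model 1 assumptions the mean collision count is also easy to estimate by simulation. The following Python sketch is an illustration with assumed names, not part of the original analysis; it draws the slot that an arriving packet runs into uniformly on 1..T/r, exactly as described above.

import random

def mean_collisions_model1(p, slots_per_packet, trials=100_000):
    # slots_per_packet = T/r; a packet finding the hub busy (probability p)
    # collides with every remaining slot of the current transmission.
    total = 0
    for _ in range(trials):
        if random.random() < p:
            hit = random.randint(1, slots_per_packet)
            total += slots_per_packet - hit + 1
    return total / trials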

C_m as in Proposition 2. For every vertex v of G, let I_v stand for the set {i | v ∈ C_i}. By Proposition 2, I_v is an interval. Hence {I_v}_{v ∈ V} is an interval representation for G. We shall refer to this interval representation as max-clique. The following result shows that interval graphs are characterized by a special ordering of their vertices. This result will be used repeatedly in the remainder of this work.

Proposition 4. (Olariu 1991) A graph G = (V, E) is an interval graph if, and only if, there exists a linear order ≪ on V such that for every choice of vertices u, v, w with u ≪ v ≪ w, uw ∈ E implies uv ∈ E.

Let G = (V, E) be an interval graph and let {I_v = [a_v, b_v]}_{v ∈ V} be an arbitrary interval representation of G. Define a linear order ≪ on V by setting

(1)    u ≪ v whenever a_u < a_v
>   eqn := 1 = sum( binomial(n-1,s)
>            * (1-p)^((n-s-1)*(s+1)) * F(p,s+1), s = 0..n-1 );
>   eqn := isolate( eqn, F(p,n) );
>   sol := eval( subs( F=procname, rhs(eqn) ) );
> end:

The coding occurs at a very high level of abstraction, relying on the system's basic capability to compute sums symbolically, or to isolate selected terms of equations. To compute the reliability of a complete graph on 8 vertices we invoke this routine interactively as:

> RelK(p, 8):

[Output: the reliability polynomial of K_8, returned as a nested expression in p and q = 1 - p built from intermediate subexpressions A, B, C and D.]

The following sequence of standard MAPLE commands provides some basic information concerning the polynomial's structure.

> rp := expand(rp):
> degree(rp, p);
                  28
> max( coeffs(rp,p) );
          9345271992
> length(");
                  10
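The recurrence underlying RelK can also be evaluated numerically outside MAPLE as a cross-check. The following Python sketch is an illustration (it assumes q denotes 1 - p); like the symbolic code, it isolates the s = n - 1 term of the identity.

from functools import lru_cache
from math import comb

def rel_complete(n, p):
    # All-terminal reliability of K_n, conditioning on the size s+1 of the
    # component containing a fixed vertex; the s = n-1 term is R(n) itself.
    q = 1.0 - p

    @lru_cache(maxsize=None)
    def R(k):
        if k == 1:
            return 1.0
        return 1.0 - sum(comb(k - 1, s) * R(s + 1) * q ** ((s + 1) * (k - 1 - s))
                         for s in range(k - 1))

    return R(n)

# rel_complete(2, p) equals p; rel_complete(3, 0.9) equals 0.972.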

For another computation, it is necessary to calculate the size of a largest possible clique in any graph on n vertices and m edges. This admits an easy solution in MAPLE. When n = 12 and m = 30, we have:

> m := 30:  n := 12:
> f := binomial(k,2) + n - k - m;
> t := solve(f=0, k);
> trunc(evalf(max(t)));
                   7

2.2. A Flexible Data Structure for Multigraphs

The data required to represent a given graph is stored in a table (the remember table) that is automatically associated with a MAPLE procedure. Remember tables are special instances of MAPLE's hash tables. In such tables, arbitrary MAPLE objects are associated with arbitrary keys using a notation that resembles array indexing. For example, we have

> B[3,t] := x^2 + 5:
> B[x+y] := (x^3 + 7 = 4):
> eval(B);

        table([ (3, t) = x^2 + 5, x + y = (x^3 + 7 = 4) ])

We make use of symmetric tables, sparse tables, lists and sets, all of which are primitive data structures in MAPLE. The following short interactive session illustrates the style of interaction and some of the data structures. We first create a multigraph called G with two edges and two vertices.

> new(G):
> addvertices(1,2,G):
> addedge(Cycle(1,2),G):

The following command adds a directed edge.

> addedge([1,2],G):

The actual representation of the graph is revealed by the following command.

> show(G);
table([
  _Edges = {e1, e2, e3}
  _EdgeIndex = table(symmetric, [ (1, 2) = {e1, e2, e3} ])
  _Head = table([ e3 = 2 ])
  _Ends = table([ e1 = {1, 2}, e2 = {1, 2}, e3 = {1, 2} ])
  _Eweight = table([ e1 = 1, e2 = 1, e3 = 1 ])
  _Neighbors = table([ 1 = {2}, 2 = {1} ])
  _Status = {MULTIGRAPH, DIRECTED}
  _Tail = table([ e3 = 1 ])
  _Vertices = {1, 2}
  _Vweight = table(sparse, [])
  _Econnectivity = _Econnectivity
])

This command displays the actual contents of the remember table associated with the procedure G. Many of the remember table entries are themselves tables or sets (represented by the use of {}). Interactive commands are used to extract or modify information from this main remember table.

> P := petersen():
> neighbors(3,P);
        {2, 4, 10}
> edges(P);
        {e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13, e14, e15}
> connectivity(P);
        3

Data retrieval is accomplished by invoking the procedure with appropriate arguments (keys). If the key is found, the corresponding table entry is returned immediately. In all other cases, the code of the procedure is executed and provides the initial values. At the user level, the various operations on graphs are MAPLE procedures which reference or update the remember table of the procedure representing a graph as required. The graph data structure is user extendable. For example, a new property can be added by assigning it a value, as in:

:= "value of my new property":

Tools are also provided for automatically updating new properties as a result of edge deletion or contraction.

3. SOME EXAMPLES

In order to illustrate the diversity of this network environment we consider in detail some of the computations that must be completed when investigating coefficient bounds on reliability polynomials. We basically use bounding techniques as outlined in Brown et al. (1990).

3.1. Changing Representations

The reliability polynomial of a network can appear in several different forms. Let N_i be the number of operational subgraphs of G with i edges. Then

    RelP(G,p) = Σ_{i=0}^{m} N_i p^i (1-p)^{m-i} = Σ_{i=0}^{m} F_i (1-p)^i p^{m-i}        (2)

Such polynomials are coded directly. For example, the F-form is

> RelF := proc(F,p,m) local i;
>   sum( F[i]*(1-p)^i * p^(m-i), i = 0..m );
> end:

Let F_i be the number of graphs obtained from G by removing i edges, leaving G connected, and let c be the edge connectivity of G. Then

    F_i = N_{m-i}
    N_i = 0                         for i = 0, ..., n-2
    N_{n-1} = t, the number of spanning trees
    F_i = binomial(m,i)             for i < c
    F_c = binomial(m,c) - N(cuts)                                  (3)

where N(cuts) is the number of minimum cardinality edge cuts of G.

model := proc(f,h,m,n,resulttype)
  local F,H,a,b,p,t,eqns,result,g,c;
  #
  # Generate the polynomials and express their difference in powers
  # of p (using generic coefficients in terms of a and b)
  #
  F := RelF(a,p,m);
  H := RelH(b,p,m,n);
  t := collect(H-F,p);
  # Set up the equations forcing the coefficients of p to be 0
  eqns := solve({coeffs(t,p)});
  # Keep only those equations involving the H coefficients.
  eqns := nontrivial(eqns);
  eqns := select(has,eqns,b);
  # Discover which coefficients are present and solve for them.
  c := select(has,indets(eqns),b);
  result := solve(eqns,c);
  result := nontrivial(result);
  RETURN( subs( b=h, a=f, result ) );
end:

Fig. 1. A MAPLE procedure for deriving H-coefficients

There are other useful combinatorial interpretations of the same polynomial. For example, let

H_i be the number of intervals in the simplicial complex (hereditary family) formed by the subgraphs obtained by edge deletion, and whose lower set has size i (see Brown et al. (1990), Colbourn (1987)). Then the H-form of the polynomial is given by

    RelH(G,p) = p^{n-1} Σ_{i=0}^{m-n+1} H_i (1-p)^i        (4)

This H-form satisfies the combinatorial identity:

    Σ_{i=0}^{m-n+1} H_i = the number of spanning trees      (5)

Knowledge of the structure of shellable complexes gives rise to the best available coefficient bounds, due to Ball and Provan (1982, 1983). The first step in analysing a particular graph requires that we compare the two representations. Converting from one form to another is a crucial task. The MAPLE procedure shown in Fig. 1 re-expresses the H-coefficients in terms of the F-coefficients. For a graph with 10 edges and 5 vertices we obtain

> model(F,H,10,5);

    H_0 = F_0
    H_1 = -6 F_0 + F_1
    H_2 = 15 F_0 - 5 F_1 + F_2
    H_3 = -20 F_0 + 10 F_1 - 4 F_2 + F_3
    H_4 = 15 F_0 - 10 F_1 + 6 F_2 - 3 F_3 + F_4
    H_5 = -6 F_0 + 5 F_1 - 4 F_2 + 3 F_3 - 2 F_4 + F_5
    H_6 = F_0 - F_1 + F_2 - F_3 + F_4 - F_5 + F_6

The resulting set of equations is in a form which can be used immediately to assign values in expressions. For example, the MAPLE command

> subs( ", H[4] + H[5] );

replaces the H-coefficients H_4 and H_5 by the right hand sides of the appropriate equations from the preceding computation and so is an expression in terms of the F-coefficients.
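Equating the two polynomial forms also gives this conversion in closed form. The following Python sketch is a derived illustration, not part of the MAPLE session; for m = 10 and n = 5 it reproduces the seven equations displayed above when called with numeric F-values.

from math import comb

def h_from_f(F, m, n):
    # H_k = sum_{i=0..k} (-1)^(k-i) * C(m-n+1-i, k-i) * F_i
    d = m - n + 1
    return [sum((-1) ** (k - i) * comb(d - i, k - i) * F[i]
                for i in range(k + 1))
            for k in range(d + 1)]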

3.2. Graph-Theoretic Parameters

Through knowledge of the number of edge cuts of various sizes, and the connectivity, we can determine some coefficients exactly. The procedure in Fig. 2 makes use of such information and the above polynomial identities to derive exact values for the first few coefficients of the reliability polynomial. Clearly, both the algebraic and the graph theoretic capabilities of the system are involved. For example, the commands countcuts() and connectivity() compute the number of minimum cardinality edge cuts and the connectivity of the graph.

modelH := proc(f,h,G)
  local F,H,m,n,c,cuts,eqns,i;
  userinfo(1,{debug},`enter`);
  m := nops(edges(G));
  n := nops(vertices(G));
  c := connectivity(G);
  cuts := countcuts(G);
  eqns := model(F,H,m,n,'Hform');
  # compute known values for F coefficients
  for i from 0 to c do F[i] := binomial(m,i); od;
  F[c] := F[c] - cuts;
  # The higher order F coefficients are all known to be 0
  # as can be deduced from the polynomial identity
  for i from m-n+2 to m do F[i] := 0 od;
  RETURN( subs({H=h,F=f},eval(eqns)) );
end:

Fig. 2. Deriving the Known Coefficients
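The initialization that modelH performs from the identities (3) can be mirrored outside MAPLE; the Python sketch below is illustrative (its names are assumptions).

from math import comb

def known_f(m, n, c, n_min_cuts):
    # F_i = C(m,i) for i < c; F_c loses the minimum cardinality cuts;
    # with fewer than n-1 edges the graph cannot be connected.
    F = {i: comb(m, i) for i in range(c + 1)}
    F[c] -= n_min_cuts
    for i in range(m - n + 2, m + 1):
        F[i] = 0
    return F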

For the ARPA example from section 4.3 we obtain

> countcuts(G);
        57
> connectivity(G);
        2

so that applying modelH() to this graph we obtain:

> modelH(G);

    H_0  = 1
    H_1  = 58
    H_2  = 1654
    H_3  = -22308 + F_3
    H_4  = 118635 - 10 F_3 + F_4
    H_5  = -366762 + 45 F_3 - 9 F_4 + F_5
    H_6  = 746724 - 120 F_3 + 36 F_4 - 8 F_5 + F_6
    H_7  = -1057848 + 210 F_3 - 84 F_4 + 28 F_5 - 7 F_6 + F_7
    H_8  = 1066791 - 252 F_3 + 126 F_4 - 56 F_5 + 21 F_6 - 6 F_7 + F_8
    H_9  = -766810 + 210 F_3 - 126 F_4 + 70 F_5 - 35 F_6 + 15 F_7 - 5 F_8 + F_9
    H_10 = 385286 - 120 F_3 + 84 F_4 - 56 F_5 + 35 F_6 - 20 F_7 + 10 F_8 - 4 F_9 + F_10
    H_11 = -128932 + 45 F_3 - 36 F_4 + 28 F_5 - 21 F_6 + 15 F_7 - 10 F_8 + 6 F_9 - 3 F_10 + F_11
    H_12 = 25869 - 10 F_3 + 9 F_4 - 8 F_5 + 7 F_6 - 6 F_7 + 5 F_8 - 4 F_9 + 3 F_10 - 2 F_11 + F_12
    H_13 = -2358 + F_3 - F_4 + F_5 - F_6 + F_7 - F_8 + F_9 - F_10 + F_11 - F_12 + F_13

Parameters such as edge connectivity and the number of spanning trees are recomputed only if necessary. This is particularly important in interactive use, as it is impossible to anticipate the history of the session and recomputation is too expensive. This is accomplished simply by inserting an appropriate entry into the option remember table.


4. EXACT COMPUTATIONS

The basic bounding techniques rely on being able to compute the reliability polynomial exactly for certain types of graphs. We have already seen how a recurrence could be used to compute the reliability polynomial, RelK(p,n), for a complete graph. We isolated the term of interest in an identity and then applied the process recursively. The same technique generalizes to a second class of graphs derived from a complete graph. For this class of graphs the recurrence for the exact polynomial involves summations nested three deep.

    [Display equation: a triple summation, 1 = Σ_{a≥0} Σ_{b≥0} Σ_{c≥0} ( · · · ) (1-p)^{f(a,b,c)}]

where f(a,b,c) = (s+t-1-a-b-c)(a+r) + (s-1-a-b)b + (d-1-a)c. This can be solved in MAPLE in exactly the same way as for the complete graph.

RelKext := proc(p,n,r,d,nd)
  local R,u,t,s,eqn,eqn2;
  option remember;
  eqn := 1 = sum( sum( sum(
           binomial(d-1,s) * binomial(n-d,t) * binomial(nd,u)
           * R(p,s+t+1,r,s+1,u) * (1-p)^

there exists K > 0 such that Δ_f(a) ≤ K φ(a) for all a > 0. Function φ(a) gives the overall rate at which the heuristic value f′ diverges from the optimal value f* as x′ moves away from x*. In particular, for the production scheduling problem (without inventory plots), with an optimal start, delaying an operation by α, the rate of divergence of the simple heuristic is O(α).

These measures, δ_f(a), δ_f*(a), Δ_f(a), and θ_f(x), provide some indication of the quality of the heuristic compared to the optimal. The first provides a traditional measure of solution quality; the remainder are specifically targeted to the sensitivity or stability of the solutions produced by a particular algorithm.

4.3 Regions of Invariance

Frequently in animated sensitivity analysis, difficult work only occurs at discrete points in time. For example, in animating the linear programming constraints, when a simplex pivot operation is required, additional work is necessary. In the production scheduling example, when a new operation must start moving, additional work is required. In the travelling salesman case, it is the subsets of X in which no important events can occur that are of interest. In between these "important" events, the amount of work required to sustain the animation is relatively small. We now characterize these notions more precisely. In particular, f(·) can be written in the form f(x) = h(x, p(x)), where p: X → P will be more difficult to compute than h; once p(x) has been computed, evaluating h is relatively easy. For example, for a travelling salesman problem, w = p(x) is the optimal sequence of cities, and h(x,w) calculates the distance travelled using sequence w.

We define R_p to be a region of invariance for f(x) = h(x, p(x)) if for all x ∈ R_p, p(x) = k for some constant k ∈ P. Furthermore, R_p(x) is defined to be the region of invariance containing x ∈ X. As an example, for a Euclidean traveling salesman problem where a city is being moved, a region of invariance R_p consists of points where the sequence of cities remains unchanged. We make no assumptions about the shape of the region of invariance. Rather, the exact properties of R_p(x) will often be of particular interest. For example, is R_p(x) convex or connected? The concept of a region of invariance is important because, if p is difficult to compute compared to h and R_p(x) is known, then when the user changes the input from x to x′, if determining that x′ ∈ R_p(x) is relatively easy, then determining f(x′) will be relatively easy. One need merely calculate f(x′) = h(x′, p(x)). Of course, R_p(x) must be computed for this to be effective. This characterization, however, now presents opportunities for at least three different forms of approximation:

1. When f(x) can be computed as f(x) = h(x, p(x)), then f′(x) can often be computed as f′(x) = h(x, p′(x)), where p′(x) provides an approximation to p(x). Note that we only use an approximation to p, not h. Using p′ instead of p for a given x produces a region of invariance R_{p′}(x) which will usually not equal R_p(x) and may have other interesting properties (e.g., stability), as was shown with the Euclidean TSP.

2. It may not be an easy matter to calculate R_p(x) for particular functions p. In such a case, it might be useful to be able to compute an approximate region of invariance, denoted R̃_p(x). Of course, one would like the difference between R̃_p(x) and R_p(x) to be as small as possible. This represents an uncharted area of research. One simple way to generate such an approximate region would use an approximation algorithm, e.g., R̃_p(x) = R_{p′}(x). Clearly, however, given R̃_p(x) and an algorithm to determine whether or not a given x′ ∈ R̃_p(x), it would be possible to develop a corresponding approximation algorithm p′(x).

corresponding approximation algorithm p (x). 3.

Another way to generate R (x) would consider only a finite subset P' of P . For p example, in the Euclidean TSP, one could maintain a list of all permutations generated by an approximation algorithm as a city is moved. For any given location, one would examine all such permutations and choose the best. This approach can be shown to produce the same type of well-behaved regions of invariance that are produced by OPTIMAL for the TSP.

The above framework has attempted to make more precise some of the variations possible when one begins to explore animated sensitivity analysis. These questions go beyond the traditional notions of solution quality based on the nearness of the objective function to the optimal.

5. CONCLUSION AND RESEARCH DIRECTIONS

We have developed the notion of animated sensitivity analysis, which provides animations of solutions in response to parameter changes. Animated sensitivity analysis provides useful, practical insight on the behavior of solutions to complicated problems as well as insight on the nature of algorithms used to solve problems. The research questions are not restricted to the development of faster algorithms with better lower bounds, but extend to notions of solution stability and solution adjustment as the data changes. These include:

• Fast algorithms for computing regions of invariance, R_p(x). This seems rather challenging since R_p(x) depends on f(x) and computing f(x) is often NP-complete. For a one-dimensional, linear parameter change to a linear objective function, the algorithm of Eisner and Severance (1976) works well. Similarly, for a two-dimensional linear parameter change to a linear objective function, the algorithm of Fernandez-Baca and Srinavasan (1989) works well. Note, however, that for the Euclidean TSP the parameter change is non-linear.



• Fast algorithms for computing R_{p′}(x), as well as analysis comparing R_{p′}(x) to R_p(x). For the Euclidean travelling salesman problem, the regions for approximation algorithms are far messier than for an optimal algorithm.



• Determining approximate regions of invariance, R̃_p(x).



• Environments for developing animated sensitivity analysis similar to those available for performing algorithm animation. Perhaps algorithm animation systems suffice for animated sensitivity analysis; however, the faster computational speeds required for animated sensitivity analysis may require specialized environments.



• Empirical studies of the effectiveness of animated sensitivity analysis and its appropriateness for particular problems and individuals.

We believe that animated sensitivity analysis not only provides a useful mechanism for helping to understand the behavior of problems, it also promises to be a rich research area for traditional researchers in OR/MS.

REFERENCES

Bell, P. C., A. A. Taseen, and P. F. Kirkpatrick (1990). Visual Interactive Simulation Modeling in a Decision Support Role. Computers and Operations Research, 17:5, 447-456.
Bell, P. C. (1989). Stochastic Visual Interactive Simulation Models. J. Opl Res. Soc., 40:7, 615-624.
Bell, P. C. (1986). Visual interactive modeling in 1986. In: Recent Developments in Operational Research (V. Belton and R. O'Keefe, eds.), pp. 1-12. Pergamon Press, Oxford.
Bell, P. C., D. C. Parker and P. Kirkpatrick (1984). Visual Interactive Problem Solving—A New Look at Management Problems. Business Quarterly, 49, 15-18.
Bentley, J. L. and B. W. Kernighan (1987). A System for Algorithm Animation: Tutorial and User Manual. Computer Science Technical Report No. 132. AT&T Bell Laboratories, Murray Hill, NJ.
Boyd, S. C., Pulleyblank, W. R. and G. Cornuejols (1987). Travel—An Interactive Travelling Salesman Problem Package for the IBM Personal Computer. OR Letters, 6:3, 141-143.
Brooks, F. P., M. Ouh-Young, J. J. Batter and P. J. Kilpatrick (1990). Project GROPE—Haptic Displays for Scientific Visualization. Computer Graphics, 24:4, 177-185.
Brown, M. H. and R. Sedgwick (1984). A system for algorithm animation. Computer Graphics, 18:3, 177-186.

Buchanan, J. T. and K. I. M. McKinnon (1987). An Animated Interactive Modeling System for Decision Support. Operational Research '87, 111-118. Elsevier Science Publishers, Amsterdam.
Card, S., T. P. Moran and A. Newell (1983). The Psychology of Human-Computer Interaction. Lawrence Erlbaum Associates.
Clark, W. (1957). The Gantt Chart (Third ed.). Pitman and Sons, London.
Conway, R., W. L. Maxwell and R. Miller (1967). Theory of Scheduling. Addison-Wesley, Reading, Massachusetts.
Conway, R. and W. L. Maxwell (1986). Low-level interactive scheduling. Symposium on Real-Time Optimization for Automated Manufacturing Facilities. National Bureau of Standards, Washington, D. C.
Conway, R., W. L. Maxwell and S. L. Worona (1986). User's Guide to XCELL Factory Modeling System. Scientific Press, Palo Alto, CA.
Eisner, M. and D. Severance (1976). Mathematical Techniques for Efficient Record Segmentation in Large Shared Databases. J. of the ACM, 23, 619-635.
Fernandez-Baca, D. and S. Srinavasan (1989). Constructing the Minimization Diagram of a Two-Parameter Problem. Technical Report, Department of Computer Science, Iowa State University, Ames, Iowa.
Fisher, M. L. (1986). Interactive Optimization. Annals of Operations Research, 5, 541-556.
Gantt, H. L. (1919). Organizing for Work. Harcourt, Brace and Howe, New York.
Garman, M. (1970). Solving Combinatorial Decision Problems via Interactive Computer Graphics, with Applications to Job-Shop Scheduling. Unpublished Ph.D dissertation. Carnegie-Mellon University, Pittsburgh, Pennsylvania.
Gay, D. M. (1987). Pictures of Karmarkar's Linear Programming Algorithm. Computer Science Technical Report No. 136. AT&T Bell Laboratories, Murray Hill, New Jersey.
Geoffrion, A. M. (1976). The Purpose of Mathematical Programming is Insight, Not Numbers. Interfaces, 7:1, 81-92.
Godin, V. (1978). Interactive scheduling—historical survey and state of the art. AIIE Transactions, 10, 331-337.
Grant, J. W. and S. A. Weiner (1986). Factors to consider in choosing a graphically animated simulation system. Industrial Engineering, 18, 37ff.
Graves, S. C. (1981). A review of production scheduling. Operations Research, 29, 646-675.
Hurrion, R. D. and R. J. R. Seeker (1978). Visual Interactive Simulation: An Aid to Decision Making. Omega, 6, 419-426.
Hurrion, R. D. (1978). An Investigation of Visual Interactive Simulation Methods Using the Job-Shop Scheduling Problem. J. Opl Res. Soc., 29, 1085-1093.
Hurrion, R. D. (1980). Visual Interactive (Computer) Solutions for the Travelling Salesman Problem. J. Opl Res. Soc., 31, 537-539.
Hurrion, R. D. (1986). Visual Interactive Modelling. European Journal of Operations Research, 23, 281-287.
Jackson, P. L., J. A. Muckstadt and C. V. Jones (1989). COSMOS: A Framework for a Computer-Aided Logistics System. J. Mfg. Oper. Mgt., 2, 222-248.
Jones, C. V. and W. L. Maxwell (1986). A system for scheduling with interactive computer graphics. IIE Transactions, 18, 298-303.
Jones, C. V. (1988a). The three-dimensional Gantt chart. Operations Research, 36:6, 891-903.
Jones, C. V. (1988b). Animated Sensitivity Analysis for Production Planning. Proceedings of the 1988 International Conference on Computer Integrated Manufacturing, pp. 171-180. IEEE, Los Angeles, CA.
Jones, C. V. (1989). The Stability of Algorithms for the Euclidean Traveling Salesman Problem. Working paper, Faculty of Business Administration, Simon Fraser University.
Jones, C. V. User interfaces. In: Handbook of Operations Research (J. K. Lenstra, A. Rinnooy Kan, eds.), forthcoming.

Kirkpatrick, P. and P. C. Bell (1989). Visual Interactive Modelling in Industry: Results from a Survey of Visual Interactive Model Builders. Interfaces, 19:5, 71-79.
Korhonen, P. and J. Laakso (1986a). Solving Generalized Goal Programming Problems Using a Visual Interactive Approach. European Journal of Operational Research, 26, 355-363.
Korhonen, P. and J. Laakso (1986b). A Visual Interactive Method for Solving the Multiple Criteria Problem. European Journal of Operational Research, 24, 227-287.
Lawler, E. L., J. K. Lenstra and A. H. G. Rinnooy Kan (1982). Recent developments in deterministic sequencing and scheduling: a survey. In: Deterministic and Stochastic Scheduling (M. A. H. Dempster, J. K. Lenstra, and A. H. G. Rinnooy Kan, eds.), pp. 35-74. Kluwer Boston Inc., Hingham, Massachusetts.
Lawler, E. L., J. K. Lenstra, A. H. G. Rinnooy Kan and D. B. Shmoys (1985). The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, New York.
Lemberski, M. R., and U. H. Chi (1984). 'Decision Simulators' Speed Implementation and Improve Operations. Interfaces, 14:4, 1-15.
Libura, M. (1988). Sensitivity Analysis for Shortest Hamiltonian Path and Traveling Salesman Problems. Management Science Research Report No. MSRR 540, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA.
Lustig, I. (1989). Applications of Interactive Computer Graphics to Linear Programming. Proceedings of the Conference on Impact of Recent Computer Advances in Operations Research, January 4-6, Williamsburg, Virginia.
McCormick, B. M., T. A. DeFanti and M. D. Brown (1987). Visualization in Scientific Computing. Computer Graphics, 21:6, entire issue.
MIMI/S User's Manual (1991). Chesapeake Decision Sciences Inc., New Providence, NJ.
Myers, B. A. (1986). Visual Programming, Programming by Example and Program Visualization; A Taxonomy. Proceedings SIGCHI '86: Human Factors in Computing Systems, April 13-17, Boston, MA.
Palmiter, S., J. Elkerton and P. Baggett (1989). Animated Demonstrations versus Written Instructions for Learning Procedural Tasks. Technical Report C4E-ONR-2, Center for Ergonomics, Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor.
Schedule X. Numetrix Ltd, Toronto, Ontario.
Suri, R. (1983). Infinitesimal perturbation analysis of discrete event dynamic systems: a general theory. Proceedings of the 22nd IEEE Conference on Decision and Control, pp. 1030-1039. San Antonio, Texas.
Wagelmans, A. (1990). Sensitivity Analysis in Combinatorial Optimization. Unpublished Ph.D dissertation, Erasmus University, Rotterdam, The Netherlands.
Woerlee, A. P. (1991). Decision Support Systems for Production Scheduling. Unpublished Ph.D dissertation, Erasmus University, Rotterdam, The Netherlands.
Zemel, E. (1989). LP Model Manual. J. L. Kellogg Graduate School of Management, Northwestern University.


EDINET - A NETWORK EDITOR FOR TRANSSHIPMENT PROBLEMS WITH FACILITY LOCATION

W. OGRYCZAK(1,2), K. STUDZINSKI(2) and K. ZORYCHTA(2)

(1) Marshall University, Computer & Information Sciences, Huntington, WV 25755, USA
(2) Warsaw University, Faculty of Mathematics & Computer Science, 00-913 Warsaw, Poland

ABSTRACT

This paper presents a network editor designed and implemented to provide an interactive optimization system for multiple criteria transshipment problems with a friendly tool for problem modification and solution examination. The essential data for a network problem can be divided into two groups: logical data defining the structure of the network (nodes and arcs), and numerical data describing characteristics of the nodes and arcs of the network (e.g., capacities, coefficients of the objective functions). The general concept of the presented network editor is to edit the numerical data with a window mechanism while examining or defining the logical structure of the network, i.e., while moving along the network.

KEY WORDS

User Interface; Networks; Transportation; Location; Linear Programming.

INTRODUCTION

Real-life decision problems are usually so complex that they cannot be modeled and solved as a one-shot mathematical programming problem with a single set of parameters. Thus there is increasing interest in interactive systems where optimization techniques are applied repetitively to a varying mathematical model while a decision maker (DM) is learning the decision problem. The progress in computer technology during the past decade allows us to consider as implementable even very complex interactive procedures requiring solutions to sequences of mathematical programming problems. Powerful microcomputers have become standard productivity tools for businessmen and other DMs. This has permitted the development of interactive decision support systems where mathematical programming models and methods play only a supporting role in the decision analysis performed by a human DM (Lewandowski and Wierzbicki, 1989).

A user interface has always been an important part of mathematical programming implementations. However, for many years it was directed primarily towards an analyst as a user. The classic mathematical programming packages, like MPSX (IBM, 1971a) and similar, assumed a linear program to be an algebraic problem, i.e., a problem defined by a matrix of numerical coefficients. They took into account only one modeling problem: how to input a large-scale matrix into the package. Admittedly, this problem was solved in an excellent way by introducing the commonly accepted industrial standard of the MPS-file. It was later supported by so-called matrix generators (e.g. IBM, 1971) which helped an analyst input the entire matrix by defining its block structure. Some packages opened gates to their internal data structures for highly qualified analysts by accepting general-purpose programming languages as their control languages (IBM, 1978). Nevertheless, in the mid 80's, when the microcomputer revolution exploded with enormous numbers of software tools addressed directly to decision makers, LP systems still focused on the algebraic model (compare Sharda and Somarajan, 1986). The lack of corresponding progress in user interfaces became a real bottleneck for the wide spread of mathematical programming software in the wave of the microcomputer revolution. The professional user interfaces were, obviously, unacceptable for implementations aspiring to interact directly with a DM. The two most popular and most friendly ways offered at that time to define a problem in PC LP systems could be summarized as follows: "write all the equations in a natural algebraic form like in your OR textbook" or "form the matrix of coefficients and insert it into a spreadsheet". Finally, in the past several years the concept of modeling languages (Bisschop and Meeraus, 1982; Fourer, 1983; Fourer et al., 1987) has materialized, with portable PC systems like GAMS (Brooke et al., 1988) or MPL (Maximal Software, 1990) becoming available. However, they are still mainly text-based modeling languages which require the model to be manually transformed into algebraic form. The system then transforms the algebraic form into internal data structures. Recently, experimental software has been developed which employs artificial intelligence and database concepts to transform a model (specified with a combination of text and graphics) into the solver data structures (Murphy and Stohr, 1986). Among mathematical programming problems there is a large family of problems for which any algebraic formulation is against their nature: the so-called network problems. The matrix formulation is perhaps the most inefficient way to represent a network (graph) structure. Moreover, solution procedures like the simplex method applied to network problems use a graph (tree) representation to implement even such a typical algebraic object as the basis inverse (Grigoriadis, 1986). It therefore seems unreasonable to force a user to transform a network into algebraic form only to satisfy an assumed user interface standard. Instead, a specialized network interface based on graph methodology would be preferable, like, for instance, the extensions made to AMPL (Fourer et al., 1987) to model nodes (balance equations) and edges (flow variables) explicitly.
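For readers who have never met it, the MPS-file standard mentioned above encodes the coefficient matrix section by section: row declarations, then the nonzero coefficients column by column, then right-hand sides. The fragment below is a minimal illustrative sketch with invented row and column names, not an example taken from this chapter:

    NAME          TINY
    ROWS
     N  COST
     L  LIM1
     G  LIM2
    COLUMNS
        X1        COST       1.0   LIM1       1.0
        X1        LIM2       1.0
        X2        COST       2.0   LIM1       1.0
    RHS
        RHS       LIM1       4.0   LIM2       1.0
    ENDATA

Nothing in such a file hints at any structure beyond the matrix itself, which is precisely why, as argued below, the format fits network problems so poorly.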
The essential data for a network problem can be divided into two groups: logical data defining the structure of the network (nodes and arcs), and numerical data describing characteristics of the nodes and arcs of the network (e.g., capacities, coefficients of the objective functions). In other words, the network problem can be readily defined in terms of attributed graphs where a graph itself represents the problem structure and several groups of numerical data are considered as attributes to the graph objects (nodes and arcs). The

power of attributed graphs as a mathematical representation (as well as communication) medium for many MS/OR models was recognized by Jones (1990), among others. While designing an attributed graph editor there is usually a temptation to provide a graphic interface for graph structure definition. Such a method is commonly used in so-called CASE (Computer-Aided Systems Engineering) software (e.g. Crow, 1990) to define data flow diagrams, entity-relationship diagrams and other graphs for systems analysis and design. A user is then free to move across a large sheet (scrolled and zoomed on the screen) and locate several objects using a predefined set of icons. A functional description of such a full graphic interface for OR network modeling has been given by Steiger et al. (1990). On the other hand, many network models actually deal with abstract networks expressing certain relations. For such a model there does not exist a natural layout of the network, whereas usage of the full graphic interface forces the user to design such a layout. So, in some cases this form of user interface can cause additional work for the analyst. An approach which seems more convenient for defining abstract networks depends on the definition of the network structure as a relation between its nodes. For simple network models one can create a node set and then define arcs as crisscross products on the node sets. Such an approach, with usage of a friendly window mechanism, was implemented by McBride (1988). This paper presents the network editor EDINET designed and implemented as a part of the DINAS system (Ogryczak et al., 1988, 1991, 1992), an interactive PC optimization system for multiple criteria transshipment problems with facility location. Due to the multiple objective functions DINAS cannot simply solve the problem. It is rather a decision support system that supports the DM in selecting the best solution while learning the problem during interactive work with the system. According to the DM's requirements DINAS generates several efficient (Pareto-optimal) solutions which can be examined and compared with respect to various achievements. The DM works with the system in an interactive way, so that the requirements as well as some model parameters can be changed during the session. EDINET has been implemented as an independent module of the DINAS system to provide the DM with a friendly tool for problem modification and solution examination. In order to fulfill these different functions without destroying consistency of the model and solution, two independent working modes of the editor have been implemented: the edit mode and the view mode. The edit mode allows the user to edit the network model. In the view mode the data and a solution to the problem are presented, but the model can only be examined, with no opportunity to change any piece of data. Our network editor is, in fact, an attributed graph editor. As we are interested in modeling mostly abstract networks we have decided not to use a full graphic interface for network definition. Our editor uses windows extensively as interfaces to define attributes and relations for network objects. However, due to the relatively complex structure of the network model they are built into more complex structures of objects (than simply nodes and arcs). Moreover, EDINET provides a partial graphic (schematic) presentation of the network structure.
Namely, for each node a schema of all incoming and outgoing arcs is provided, and this graphic window can be dynamically moved across the network along the arcs. So, it can be considered similar to the so-called fish-eye view idea discussed by Furnas (1986). The paper is organized as follows. In the next section the class of network models processed by EDINET is described. Moreover, as the network editor is not only an interface but also a database, the data model used to store the network model is discussed in this section.

The two subsequent sections deal with the user interface functions of the editor. In the first one, LOCAL WINDOWS, we describe in detail the primary windows and the main working screen used to edit a network model. As the working screen covers only a fragment of the entire network, i.e., only a set of neighbors of one selected node, it may be regarded as a local mechanism. EDINET also provides global techniques to process the network model. These techniques are discussed in the section GLOBAL WINDOWS.

THE PROBLEM AND DATA MODEL

EDINET is designed to deal with multiple criteria transshipment problems with facility location. A network model of the transshipment problem with facility location consists of nodes connected by a set of direct flow arcs. The set of nodes is partitioned into two subsets: the set of fixed nodes and the set of potential nodes. The fixed nodes represent "fixed points" of the transportation network, i.e., points which cannot be changed, whereas the potential nodes are introduced to represent possible locations of new points in the network. Some subsets of the potential nodes may represent different versions of the same facility to be located (e.g., different sizes of a warehouse, etc.). For this reason, potential nodes are organized in so-called selections, i.e., sets of nodes with multiple choice requirements. Each selection is defined by a list of included potential nodes as well as by a lower and upper limit on the number of nodes which have to be selected (located). A homogeneous good (or service) is considered to be distributed among the nodes. Each fixed node is characterized by two quantities: supply and demand for the good, but for the mathematical statement of the problem only the balance (supply minus demand) is important. Each potential node is characterized by a capacity which bounds the maximal flow of the good through the node. Capacities are also given for all the arcs, but not for the fixed nodes. A few linear objective functions are considered in the problem. The objective functions are introduced into the model by specifying coefficients associated with several arcs and potential nodes. They are called cost coefficients independently of their real character. The cost coefficients for potential nodes are, however, used differently than for arcs. The cost coefficient connected to an arc is treated as the unit cost of the flow along the arc, whereas the cost coefficient connected to a potential node is considered as the fixed cost associated with the existence (location) of the node rather than as a unit cost. Summarizing, the following groups of input data define the transshipment model under consideration:
- objectives,
- fixed nodes with their balances,
- potential nodes with their capacities and (fixed) cost coefficients,
- selections with their lower and upper limits on the number of active potential nodes,
- arcs with their capacities and cost coefficients.
The problem is to determine the number and locations of potential nodes needed and to find the flows (along arcs) so as to satisfy the balance and capacity restrictions and, simultaneously, to optimize the given objective functions. A mathematical model of the problem is described in detail by Ogryczak et al. (1989). The DINAS interactive procedure works with a special binary file, in fact, a relational

database containing the entire information defining the problem. The editor creates and handles this database. A simple network model can be defined by creating a node set and then taking cross products to create an arc set. However, due to dealing with location decisions and multiple objective functions we have a more complex logical data model. The full entity-relationship diagram (e.g. Yourdon, 1989) of our database is presented in Fig. 1.


Fig. 1. Entity-relationship diagram.

In the data model we distinguish four object types:
- NODE, with two subtypes: FIXED and POTENTIAL,
- ARC,
- SELECTION,
- OBJECTIVE.
The objects are characterized by the following attributes:
NODE = @NODE-NAME + { FIXED | POTENTIAL }
FIXED = FNODE-BALANCE
POTENTIAL = PNODE-CAPACITY + PNODE-FLOW + PNODE-EXISTENCE
ARC = @ARC-NAME + ARC-CAPACITY + ARC-FLOW
SELECTION = @SEL-NAME + SEL-LOWER-LIMIT + SEL-UPPER-LIMIT
OBJECTIVE = @OBJ-NAME + OBJ-TYPE + OBJ-ACTIVITY + OBJ-VALUE
where @ points out an object identifier. The relations between the entities can be characterized as follows:
FROM = @NODE-NAME + 0{ @ARC-NAME }M
TO = @NODE-NAME + 0{ @ARC-NAME }M
FCOST = 1{ @NODE-NAME }M + 1{ @OBJ-NAME + FCOST-COEFFICIENT }N
UCOST = 1{ @ARC-NAME }M + 1{ @OBJ-NAME + UCOST-COEFFICIENT }N
INCLUDES = 1{ @NODE-NAME }M + 1{ @SEL-NAME }N
As one may observe, the attributes describe the network problem as well as its solution. For instance, OBJECTIVE is characterized by the data OBJ-NAME (name of the objective), OBJ-TYPE (MIN or MAX) and OBJ-ACTIVITY (taken into account for optimization or not), and by the solution parameter OBJ-VALUE (value of the objective function). Similarly, PNODE-FLOW (flow through a potential node), PNODE-EXISTENCE (location decision expressed by existence or not of a potential node) and ARC-FLOW (flow along an arc) represent elements of a solution. Simply, as mentioned, EDINET has been designed to support the DM as a primary user in problem modification as well as solution examination. Therefore, it saves space for one solution to allow the interactive optimization module to load a solution from a special solution base.

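The data dictionary above translates almost mechanically into record structures. The following Python sketch is our illustration only (DINAS itself predates such tools); attribute names mirror the dictionary, and the solution fields are optional so that the edit mode can leave them unset:

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class Objective:
        name: str                          # @OBJ-NAME
        type: str                          # OBJ-TYPE: "MIN" or "MAX"
        active: bool = True                # OBJ-ACTIVITY
        value: Optional[float] = None      # OBJ-VALUE (solution, view mode only)

    @dataclass
    class FixedNode:
        name: str                          # @NODE-NAME
        balance: float = 0.0               # FNODE-BALANCE = supply - demand

    @dataclass
    class PotentialNode:
        name: str                          # @NODE-NAME
        capacity: float = 0.0              # PNODE-CAPACITY
        fixed_cost: Dict[str, float] = field(default_factory=dict)  # FCOST per objective
        flow: Optional[float] = None       # PNODE-FLOW (solution)
        exists: Optional[bool] = None      # PNODE-EXISTENCE (solution)

    @dataclass
    class Arc:
        name: str                          # @ARC-NAME
        tail: str                          # relationship FROM (node name)
        head: str                          # relationship TO (node name)
        capacity: float = 0.0              # ARC-CAPACITY
        unit_cost: Dict[str, float] = field(default_factory=dict)   # UCOST per objective
        flow: Optional[float] = None       # ARC-FLOW (solution)

    @dataclass
    class Selection:
        name: str                          # @SEL-NAME
        lower: int = 0                     # SEL-LOWER-LIMIT
        upper: int = 0                     # SEL-UPPER-LIMIT
        members: List[str] = field(default_factory=list)  # INCLUDES (potential node names)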
LOCAL WINDOWS

The essential data of the network model can be divided into two groups: logical data defining the structure of the transportation network (e.g., nodes, arcs, selections), and numerical data describing the nodes and arcs of the network (e.g., balances, capacities, and coefficients of the objective functions). The general concept of EDINET is that it edits numerical data while examining or defining the logical structure of the network. More precisely, the editor allows the user to move from a node to its neighbors along arcs, and the numerical data are edited with special windows while visiting several nodes and arcs. There are two primary windows supporting editing for the two most numerous objects in our data model: the NODE window and the ARC window. In fact, there are two types of NODE window, depending on the subtype of the node (FIXED or POTENTIAL). These primary windows allow the user to edit most of the numerical data and some logical relations defining the network model, as they support all the attributes of the corresponding entities and some relationships. Both types of NODE window (see Figs. 2 and 3) include editing fields for the name of the node (NODE-NAME) and the type of the node (FIXED or POTENTIAL). The FIXED-NODE window also includes a field for the supply-demand balance (FNODE-BALANCE). The POTENTIAL-NODE window includes an editing field for capacity (PNODE-CAPACITY) as well as editing fields for names of selections (relationship INCLUDES) and fixed cost coefficients (FCOST-COEFFICIENT). The attributes PNODE-FLOW and PNODE-EXISTENCE represent a solution and are presented in a POTENTIAL-NODE window only in the view mode of the editor (solution examination). The ARC window includes editing fields for the name of the arc (ARC-NAME), capacity (ARC-CAPACITY) and unit cost coefficients (UCOST-COEFFICIENT). The ARC window also presents the nodes joined by the arc (relationships FROM and TO) and, in the view mode, the flow along the arc (ARC-FLOW). At any time only one primary window is active, and only then can its attributes be edited. However, one can activate different windows while moving across the network. The nodes stay, obviously, in their relation with arcs and other nodes, so we cannot narrow our field of vision to a single NODE or ARC window. On the other hand, too wide a field of vision can be inconvenient for detailed work on editing data for a specific node. Therefore we have decided on a main field of vision covering a selected node and all its (direct) neighbors. This concept of the field of vision has led us to the following design of the main working screen of the editor (see Fig. 2). A selected node (hereafter referred to as the current node) is presented in the center of the screen, i.e., the NODE window corresponding to the current node (the CURRENT-NODE window) takes the center of the screen. Two groups of neighbor nodes are distinguished: predecessors and successors of the current node; these are usually exclusive groups, but in the case of a two-node loop a node can belong to both. Predecessors and successors of the current node are presented on the screen in the NODE-FROM and NODE-TO window, respectively.

Fig. 2. Main working screen.

Fig. 3. Working screen with a neighbor node.

The NODE-FROM window (placed on the left-hand side of the screen) contains the names of all the nodes that directly precede the current node in the network structure, and the names of the corresponding arcs (if they were named). Similarly, the NODE-TO window (placed on the right-hand side of the screen) contains the names of all the nodes and arcs which directly succeed the current node in the network structure. These windows will be empty if no predecessor or no successor of the current node has been defined. Names of nodes and corresponding arcs are presented inside the windows as vertical lists (scrolled if necessary). The NODE-FROM and NODE-TO windows provide the user with information about the network environment of the current node. However, their function is not limited to passive information. They can be activated like any other window, and they then become important control tools by allowing the user to perform the following operations:
- open the NODE window to edit or examine attributes of a selected neighbor node;
- open the ARC window to edit or examine attributes of a selected neighbor arc;
- change the network structure by adding or deleting some nodes or arcs;
- move the field of vision by selecting a neighbor node as a new current node.
In order to open a NODE or ARC window one needs to activate the proper NODE-FROM or NODE-TO window, indicate a node or arc, and execute the appropriate key command. The NODE or ARC window then temporarily covers the corresponding NODE-FROM/TO window and one can examine or edit attributes. Thus there is no need to move the field of vision in order to edit any attribute of the neighbor nodes or arcs. For instance, having the working screen as in Fig. 2 with the W-Salem node indicated in the active NODE-TO window, one opens the NODE window for this node as in Fig. 3 by pressing <ENTER>. Next, one can open the corresponding ARC window as in Fig. 4.
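The content of the two side windows is fully determined by the arc relation, so populating them amounts to two scans of the arc list. A small sketch (a hypothetical helper, with node and arc names taken loosely from the figures):

    def neighbors(arcs, current):
        """Return (predecessors, successors) of `current` as lists of
        (node name, arc name) pairs, i.e. the contents of the NODE-FROM
        and NODE-TO windows.  `arcs` holds (arc_name, tail, head) triples."""
        preds = [(tail, a) for (a, tail, head) in arcs if head == current]
        succs = [(head, a) for (a, tail, head) in arcs if tail == current]
        return preds, succs

    arcs = [("Be-Br", "Beckley", "Bristol"),
            ("MainPas1", "Roanoke", "Bristol"),
            ("MainPas2", "Bristol", "W-Salem")]
    print(neighbors(arcs, "Bristol"))
    # ([('Beckley', 'Be-Br'), ('Roanoke', 'MainPas1')], [('W-Salem', 'MainPas2')])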

Fig. 4. Working screen with a neighbor arc.

If one wants to move the field of vision to a neighbor node, one can indicate the node and execute the shortcut command forcing the node to become the new current one. The selected node is then automatically moved into the CURRENT-NODE window, and the NODE-FROM and NODE-TO windows are advanced to its predecessors and successors, respectively. This mechanism allows the user to move the field of vision neighbor by neighbor while staying with the same working screen. For instance, having the working screen as in Fig. 2 with the W-Salem node indicated in the active NODE-TO window, one moves this node to the CURRENT-NODE window as in Fig. 5 by pressing <ALT/C>. For further jumps across the network one can use the global mechanisms described in the next section. The working screen also allows the user to change (or define) the structure of the network. It allows the user to delete an indicated arc or node (in this case all the connecting arcs are also deleted) as well as to add other nodes and arcs. In order to define a new successor or predecessor of the current node one needs to open the proper side window (NODE-FROM/TO) and type the name of the new node. The NODE window is then automatically opened to allow the user to complete the node definition and open the ARC window to define the corresponding arc. The same procedure can be followed to define a new arc to an existing node that is not connected with the current one. The editor simply compares each inserted node name against the entire list of existing nodes; in the case of a new node an empty NODE window is opened. Otherwise the opened NODE window is filled out with the attributes of the existing node and the user can open the ARC window to define attributes of the corresponding arc.
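The name comparison described above is the whole insertion logic; a sketch of it follows (again our illustration, with a made-up tail-head arc-naming convention echoing names like Be-Br in the figures):

    def add_successor(nodes, arcs, current, name):
        """Typing `name` in the NODE-TO window: create the node if it is
        new (an empty NODE window would open), otherwise reuse the
        existing one; in both cases create the connecting arc."""
        if name not in nodes:
            nodes[name] = {"type": "fixed", "balance": 0.0}  # to be edited by the user
        arcs.append((current + "-" + name, current, name))   # then edit the ARC window
        return nodes[name]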

Fig. 5. Working screen with node W-Salem.

GLOBAL WINDOWS

In the previous section the local windows organized in the working screen have been discussed. The editor also needs global operations. There are two types of global operations: input/output operations (like LOAD, SAVE, PRINT, etc.) and global network editing operations. In this section we concentrate on the latter. All the global operations are available in EDINET via a menu, but the global network operations can also be executed directly from the working screen with shortcut keys (ALT commands). The EDIT-NETWORK branch of the editor menu provides four commands: OBJECTIVES, SELECTIONS, LIST-NODES, and NETWORK. The commands, if executed, activate the corresponding global windows. The first two windows, OBJECTIVES and SELECTIONS,

could be considered formally as local windows since they represent entities from our network data model (OBJECTIVE and SELECTION, respectively). There are, however, two reasons to regard them as global ones. First, they are connected with the rather global structure of the model, as one instance of these entities remains in relation with many other instances of the NODE and/or ARC entities. Each objective function engages, via the relationship UCOST, all the arcs and, via the relationship FCOST, all the potential nodes. Similarly, a single selection covers, through the relationship INCLUDES, many potential nodes. The second reason is connected with the window implementation. Namely, some attributes presented in these windows (like objective values and selected potential nodes) play a crucial role in the DM's perception and evaluation of the solution. Moreover, they are most informative to the DM when all the objective values and all the selected potential nodes within the several selections are presented simultaneously. Therefore we have taken advantage of the remarkably low number of attributes describing the entities OBJECTIVE and SELECTION, as well as the usually low number of instances of these entities, which allows us to present entire lists of all the instances with their attributes, rather than single instances, in the corresponding windows. That is, the OBJECTIVES window provides one line per objective and includes the editing fields for the name of the objective (OBJ-NAME), the type of optimization (OBJ-TYPE) and the activity switch (OBJ-ACTIVITY), as well as space for the objective value (OBJ-VALUE) to be presented in the view mode. The SELECTIONS window presents for each selection its name (SEL-NAME) and the list of potential nodes included in the selection (relationship INCLUDES). Moreover, for the indicated selection one may activate an auxiliary window to edit the lower and/or upper limits for the selection (SEL-LOWER-LIMIT and SEL-UPPER-LIMIT).
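The multiple choice requirement carried by a selection is simply a cardinality constraint on the location decisions; checking a solution against it can be sketched in one function (hypothetical, with invented node names, for illustration only):

    def selection_satisfied(selection, exists):
        """`selection` carries lower, upper and members; `exists` maps a
        potential node name to its PNODE-EXISTENCE value (True/False)."""
        located = sum(1 for m in selection["members"] if exists.get(m))
        return selection["lower"] <= located <= selection["upper"]

    sel = {"lower": 1, "upper": 2, "members": ["Bristol", "Mason", "Hugh"]}
    print(selection_satisfied(sel, {"Bristol": True, "Mason": False, "Hugh": False}))  # True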

Fig. 6. LIST-NODES window.

The LIST-NODES and NETWORK windows provide the DM with a global overview of the network and can be used to control movement of the working screen across the network as well. They are designed to support the DM's most typical ways of solution examination and

problem modification. According to our experience, in analyzing a transshipment problem with facility location the DM is usually most interested in examining the situation at certain crucial points of the network (usually nodes, sometimes arcs) and in analyzing specific paths across the network. This observation has directed our approach to the global mechanisms in EDINET. In the LIST-NODES window an alphabetic list of all the defined nodes is presented (see Fig. 6). The types of the nodes (FIXED or POTENTIAL) are distinguished with different colors. At any time a certain node from the list is pointed out, and one can move the pointer. In the upper right corner of the screen the NODE window corresponding to the pointed-out node is presented. So, one can examine the attributes of all the nodes while moving across the list. Moreover, in the latest version of EDINET (Ogryczak et al., 1991) we allow the user to activate this NODE window, which facilitates examination of the attributes and also their editing. There are two ways to leave the LIST-NODES window. One can simply close the window and return control to the editor menu or any other place from which the LIST-NODES command was executed (e.g. to the last working screen if the shortcut key was used to look into the LIST-NODES window). One can also move the working screen to a selected area of the network. Namely, there is the opportunity to go from the LIST-NODES window to the working screen, specifying the pointed-out node as the current node. Assume for instance that the DM wants to examine node Bristol. Having indicated this node as in Fig. 6, by pressing <ENTER> one moves to the working screen as in Fig. 2. The node can be indicated on the list with the pointer, or its name can be typed. The latter allows one to activate the working screen for a new node.

Fig. 7. NETWORK window with node Roanoke.

The NETWORK window presents a graphic scheme of the network (see Fig. 7). The basic concept of the window's function is somewhat similar to that of LIST-NODES. Here also an alphabetic list of all the nodes is available (scrolled vertically). However, the purpose of this list is to show the network structure. Therefore on each (central) node are hung its predecessors (on the left) and successors (on the right). So, for each node one can see the local structure of the network illustrated by incoming and outgoing arcs. As in the LIST-NODES window, in the upper right corner of the NETWORK window the NODE window corresponding to the indicated (central) node is presented. When some predecessor or successor is indicated, the NODE window is covered by the corresponding ARC window. Thus, one can examine the attributes of all the nodes and arcs while moving across the network. Moreover, in the latest version of EDINET (Ogryczak et al., 1991) the user is allowed to activate this NODE or ARC window, which enables one not only to examine the attributes but also to edit them. With this extension the NETWORK window can be used as an alternative working screen. One can examine the entire network, piece by piece, moving along the list or along the network structure (along the arcs). The latter is implemented as follows. By default, the pointer indicates the central node at the top of the window. However, one can move the pointer to one of its predecessors or successors (in fact an auxiliary pointer arrives then) and force, by pressing <ENTER>, this predecessor or successor to become the top central node. Assume for instance that the DM wants to examine data along the crucial path that would be created by activation of potential node Bristol: from node Roanoke through arc MainPas1 to node Bristol and further through arc MainPas2 to node W-Salem.

Fig. 8. NETWORK window with arc MainPas1.

Fig. 9. NETWORK window with node Bristol.

Fig. 10. NETWORK window with arc MainPas2.

Fig. 11. NETWORK window with node W-Salem.

In the NETWORK window as in Fig. 7 the DM can examine and/or edit data for node Roanoke. Indicating node Bristol in the right stub retrieves the ARC window for arc MainPas1 as in Fig. 8. Next, pressing <ENTER>, the DM moves to node Bristol, receiving the screen as in Fig. 9. Continuing this process, the DM can examine and/or edit data for arc MainPas2 (Fig. 10) and node W-Salem (Fig. 11). As in the LIST-NODES window, there are two ways to leave the NETWORK window. One can simply close the window and return control to the editor menu or to any other place from which the NETWORK command was executed (e.g. to the last working screen if the shortcut key was used to look into the NETWORK window). One can also move the working screen to a selected area of the network, as it is possible to leave the NETWORK window for the working screen specifying the central node as the current node. For instance, having indicated the central node W-Salem as in Fig. 11, pressing <ENTER> activates the working screen as in Fig. 5.

CONCLUDING REMARKS

This paper has provided a description of an implemented network editor based on the attributed graphs concept. Our experience with the editor shows that, despite using only limited graphic tools, the editor has turned out to be very friendly and efficient. The main working screen, built as a dynamic system of primary windows for a node and all its neighbor nodes and arcs, is usually easily learned and accepted by users. The lack of a graphic map of the entire network is compensated by a dynamic presentation of local network schemes.

EDINET has been especially designed to support the DM during interactive sessions with the DINAS system. One can, certainly, use the described techniques to create a new model, but for a large problem (like the real-life application discussed by Malczewski and Ogryczak, 1990) this could be very tedious and slow. However, the task of creating and initially testing a model is usually performed by an analyst, who prefers a more professional model generator based on some modeling language. In EDINET we therefore decided to provide the user (in this case the analyst rather than the DM) with an import option allowing input of a data file containing lists of entities and their attributes. Moreover, EDINET can also generate the standard MPS-file for a current model to permit the use of mainframe LP software if the problem size exceeds the capability of DINAS's built-in optimizer. It would be better if the editor could also present a graph of the entire network on one screen (sheet). This function would be of advantage, however, only if the user were not forced to prepare a detailed design of the network layout. That means the editor should be able to create, automatically and algorithmically, a visual graphic network directly from the database of nodes and arcs. Much research has been directed to the problem of so-called graph beautifiers that try to determine positions at which to draw nodes for an appropriate representation (e.g. Messinger et al., 1989; Rowe et al., 1987; Tamassia et al., 1988). Unfortunately, general rules cannot always generate the most appropriate presentation of specific OR network models. Thus this is still an area of future research which could dramatically improve the friendliness of user interfaces for network software.
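To give a flavor of what such a beautifier does, here is a minimal spring-embedder sketch in the Fruchterman-Reingold style; it is a generic textbook scheme, not one of the algorithms cited above:

    import math, random

    def spring_layout(nodes, edges, iters=200, k=1.0, step=0.05):
        """Tiny spring embedder: all node pairs repel, edge endpoints
        attract; returns a dict mapping each node to an (x, y) position."""
        pos = {v: (random.random(), random.random()) for v in nodes}
        for _ in range(iters):
            disp = {v: [0.0, 0.0] for v in nodes}
            for u in nodes:                          # pairwise repulsion
                for v in nodes:
                    if u == v:
                        continue
                    dx = pos[u][0] - pos[v][0]
                    dy = pos[u][1] - pos[v][1]
                    d = math.hypot(dx, dy) or 1e-9
                    f = k * k / d
                    disp[u][0] += f * dx / d
                    disp[u][1] += f * dy / d
            for u, v in edges:                       # attraction along edges
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = d * d / k
                disp[u][0] -= f * dx / d; disp[u][1] -= f * dy / d
                disp[v][0] += f * dx / d; disp[v][1] += f * dy / d
            for v in nodes:                          # damped move
                pos[v] = (pos[v][0] + step * disp[v][0],
                          pos[v][1] + step * disp[v][1])
        return pos

Schemes of this kind produce readable layouts for generic graphs but, as noted above, they cannot exploit the conventions (for instance, sources on the left, sinks on the right) that make a drawing of a specific OR network model look natural.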

REFERENCES

Bisschop, J. and A. Meeraus (1982). On the Development of a General Algebraic Modeling System in a Strategic Planning Environment. Mathematical Programming Study, 20, 1-29.
Brooke, A., D. Kendrick and A. Meeraus (1988). GAMS - A User's Guide. The Scientific Press, Redwood City.
Crow, G.B. (1990). BriefCASE - The Collegiate Systems Development Tool. South-Western, Cincinnati.
Fourer, R. (1983). Modeling Languages Versus Matrix Generators. ACM Transactions on Mathematical Software, 9, 143-183.
Fourer, R., D.M. Gay and B.W. Kernighan (1987). AMPL: A Mathematical Programming Language. CS Tech. Report, 133. AT&T Bell Labs.
Furnas, G.W. (1986). Generalized Fish Eye Views. In: Human Factors in Computing Systems (M. Mantei and P. Orbeton, eds.), ACM CHI'86 Conference Proceedings, pp. 16-23. ACM, New York.
Grigoriadis, M.D. (1986). An Efficient Implementation of the Network Simplex Method. Mathematical Programming Study, 26, 83-111.
IBM (1971). Mathematical Programming System Extended (MPSX): Matrix Generator and Report Writer (MGRW). IBM SH19-5014, New York.
IBM (1971a). Mathematical Programming System Extended (MPSX): Program Description. IBM SH20-0968, New York.
IBM (1978). Mathematical Programming System Extended/370 (MPSX/370): Introduction to Extended Control Language. IBM SH19-1147, New York.
Jones, C.V. (1990). An Introduction to Graph-Based Modeling Systems. Part I: Overview. ORSA Journal on Computing, 2, 136-151.


Lewandowski, A. and A.P. Wierzbicki (eds.) (1989). Aspiration Based Decision Support Systems - Theory, Software and Applications. Lecture Notes in Economics and Mathematical Systems, 331. Springer Verlag, New York.
Malczewski, J. and W. Ogryczak (1990). An Interactive Approach to the Central Facility Location Problem: Locating Pediatric Hospitals in Warsaw. Geographical Analysis, 22, 244-258.
Maximal Software (1990). MPL Modelling System. Reykjavik.
McBride, R.D. (1988). NETSYS - A Generalized Network Modeling System. Tech. Report, Univ. of Southern California, Los Angeles.
Messinger, E.B., L.A. Rowe and R.H. Henry (1989). A Divide-and-Conquer Algorithm for the Automatic Layout of Large Directed Graphs. IBM Research Report 6709. IBM Research, Yorktown Heights.
Murphy, F.H. and E.A. Stohr (1986). An Intelligent System for Formulating Linear Programs. Decision Support Systems, 2, 39-47.
Ogryczak, W., K. Studzinski and K. Zorychta (1988). Dynamic Interactive Network Analysis System DINAS Version 2.1 (1988): User's Manual. WP-88-114. IIASA, Laxenburg.
Ogryczak, W., K. Studzinski and K. Zorychta (1989). A Solver for the Multiobjective Transshipment Problem with Facility Location. European Journal of Operational Research, 43, 53-64.
Ogryczak, W., K. Studzinski and K. Zorychta (1991). DINAS Dynamic Interactive Network Analysis System, v. 3.0. Collaborative Paper CP-91-012. IIASA, Laxenburg.
Ogryczak, W., K. Studzinski and K. Zorychta (1992). DINAS - A Computer Assisted System for Multiobjective Transshipment Problems with Facility Location. Computers & Operations Research. In print.
Rowe, L.A., M. Davis, E. Messinger, C. Meyer, C. Spirakis and A. Tuan (1987). A Browser for Directed Graphs. Software - Practice and Experience, 17, 61-71.
Sharda, R. and C. Somarajan (1986). Comparative Performance of Advanced LP Systems. Computers & Operations Research, 13, 131-141.
Steiger, D.M., R. Sharda and C. Nanga (1990). Functional Description of a Graph-based Interface for Network Modeling (GIN). Technical Report 90-22, College of Business Administration, Oklahoma State University.
Tamassia, R., G. Di Battista and C. Batini (1988). Automatic Graph Drawing and Readability of Diagrams. IEEE Trans. on Systems, Man and Cybernetics, 18, 61-79.
Yourdon, E. (1989). Modern Structured Analysis. Yourdon Press, Englewood Cliffs.

FUNCTIONAL DESCRIPTION OF A GRAPH-BASED INTERFACE FOR NETWORK MODELING (GIN)

DAVID STEIGER and RAMESH SHARDA
College of Business Administration, Oklahoma State University, Stillwater, OK 74078
and
BRIAN LeCLAIRE
School of Business Administration, University of Wisconsin-Milwaukee, P.O. Box 742, Milwaukee, WI 53201

ABSTRACT

In recent years the MS/OR profession has made important advances in the solution techniques of network optimization models. However, significantly less progress has been made in 1) the interfaces between these models and the model builders and users, and 2) the documentation and validation of the model logic. Furthermore, while text-based model development systems such as GAMS do help in reducing the drudgery associated with model development and documentation, recent advances in microcomputer graphics offer an even more versatile tool for this process. The purpose of this paper is to describe a partially implemented graph-based interface for network modeling, GIN, which is designed for formulating, solving and analyzing minimum cost flow network models using the pictorial representations of NETFORMS. This system is being implemented in an interactive, graphics-based, microcomputer environment using object-oriented programming tools.

KEYWORDS

Networks, graphical interfaces, decision support systems, modeling, software.

INTRODUCTION

In recent years the MS/OR profession has made important advances in the solution techniques of mathematical programming models. However, significantly less progress has been made in the interfaces between these models and the model users. For many OR projects, "the user interface has been primarily an afterthought, a necessary evil, but not given a great deal of attention" (Jones, 1988). This lack of progress in user interfaces is even more important if, as Geoffrion suggests, the principal benefit of an OR project is "insight, not numbers" (Geoffrion, 1976). Further, it has been argued that insight is

actually a product of the right half of the brain, i.e., the pictorial side (Ornstein, 1973). Thus insightful analysis of mathematically modeled problems is enhanced via the use of a pictorial representation of those problems. One such pictorial representation is NETFORMS (or NETwork FORMulationS), a modeling technique which presents mathematical programming models in the form of symbolic, pictorial networks and augmented network structures (Glover et al., 1978; Glover et al., 1977). The purpose of this paper is to provide a functional description of a partially implemented graph-based interface for network modeling, GIN, which is designed for formulating, solving and analyzing minimum cost flow network models using the pictorial representations of NETFORMS. This system is being implemented in an interactive, graphics-based, microcomputer environment using object-oriented programming tools. An objective of presenting a partially implemented system is to stimulate discussion on the design goals of such systems and to encourage parallel developments of similar systems elsewhere. The general problem domain for which GIN is targeted is one in which the model user is a manager/decision maker in the corporate business world who knows only a little about MS/OR techniques. His primary use of models is to generate insights into the relative importance of key business factors, given various scenarios under specific sets of conditions. This problem domain is also applicable to the academic community, i.e., in introducing business majors and MS/OR students to effective network modeling concepts.

CURRENT MODELING ENVIRONMENTS

Modeling Languages

Historically, mathematical modeling has required multiple transformations of model forms to generate an optimal solution and communicate that solution to the decision maker. That is, the decision maker and modeler often create a pictorial or graphical form of the model (the modeler's form), convert this graphical form into a set of mathematical statements (the algebraic form), convert the algebraic form and the associated model data into a form suitable for an optimization algorithm (the algorithm's form), invoke the appropriate solver, convert the solver output into human-readable numerical listings and/or tables, and preferably convert the solution into an annotated, graphical (modeler's) form (Zenios, 1988). In the last few years, modeling languages have been developed to simplify this complex and involved model transformation task (Fourer, 1983). From the early implementations of matrix generators and report writers, such as MAGEN (Haverly Systems, 1977) and UIMP (Ellison, 1982), the trend has been to integrate and automate model execution outward toward the modeler, transferring more of the burden of problem transformation from the modeler to the computer (Zenios, 1988). For example, LINGO/PC (Paul, 1989) and GAMS (Bisschop and Meeraus, 1982) are both commercially available, text-based modeling languages which require that the model be manually transformed from the modeler's form to an algebraic form. These languages then

transform the algebraic form into the optimizer form and call the solver. Both languages provide output in the form of tabular and summary reports. Other languages, currently in various phases of research and development, continue this trend of integration by using interactive interfaces, interactive graphics and/or expert systems to aid the user in specifying the model and interpreting the results. Such languages include LOGS (Brown et al., 1986), NETSYS (McBride, 1988), LPFORM/LPSPEC (Ma et al., 1989; Murphy and Stohr, 1986; Murphy, 1988; Stohr, 1988) and GBMS (Jones, 1990, 1991). GIN is distinguished from these current modeling languages in several aspects. First, GIN introduces and accepts, as input, the pictorial (NETFORM) equivalent of the canonical form of a network model. This capability explicitly addresses the modeling difficulties introduced by dimensionality, both in model specification and case instantiation. Second, GIN provides independence of structure and data during model building, as recommended by Geoffrion (Geoffrion, 1987), as do many modeling languages, but explicitly provides pictorial integration of model structure, instantiating data and optimal flows during validation and analysis. Third, GIN provides simultaneous display of model overview and network detail, with highlighting of the appropriate overview icons used to depict the correspondence between the two. Finally, GIN is being developed using object-oriented programming to handle interactive graphics and to provide easy program modifications. It is being developed on an IBM-PC compatible platform, which should enable a large majority of industry and academic users to employ this tool.
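For reference, the canonical algebraic form in question, written here in our own notation, is the classical minimum cost flow model over a node set N and arc set A:

    \begin{aligned}
    \min\; & \sum_{(i,j) \in A} c_{ij}\, x_{ij} \\
    \text{s.t.}\; & \sum_{j:(i,j) \in A} x_{ij} \;-\; \sum_{j:(j,i) \in A} x_{ji} \;=\; b_i, \qquad i \in N, \\
    & l_{ij} \;\le\; x_{ij} \;\le\; u_{ij}, \qquad (i,j) \in A,
    \end{aligned}

with b_i > 0 at supply nodes and b_i < 0 at demand nodes; in the generalized network case each arc also carries a multiplier g_{ij} that scales the flow arriving at the head node, so x_{ji} is replaced by g_{ji} x_{ji} in the balance equations. A NETFORM picture carries exactly the data of this model: one node per balance equation, one arc per variable, with cost, bounds and multiplier drawn as arc attributes.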

Visual Interactive Modeling

GIN is not only a modeling language as defined by Fourer (1983); it is also a visual interactive model (VIM) as defined by Bell; specifically, it is an iconic graphic VIM (Bell et al., 1984; Bell and Parker, 1985; Bell, 1985; Hurrion, 1986). This class of models "attempt to represent the problem or process being modeled as a graphic display." Examples of optimization-based (as opposed to simulation-based) VIMs that have been implemented include the traveling salesman problem (Hurrion, 1980), the facilities location problem (Bhatnager, 1983; Patel, 1979), a resource allocation problem (Lembersky and Chi, 1984), and GBMS (Jones, 1988). Each of these applications utilizes interactive iconic displays as an integral part of the problem solving model. GIN is distinguished from these applications of VIM in that it uses minimum cost flow networks to model and display the problem. In addition, GIN is unique in its use of the NETFORM notation and in its implicit and explicit application of model building rules (for construct validation). Since GIN is an iconic graphic VIM and, when fully implemented, will separate the design of the visual model from the development of the underlying mathematical model, it shares the advantages which VIM methodology offers to the manager. Specifically, the manager/decision maker can directly apply the latest MS/OR optimization technologies to solve his complicated business problems even though he is not a specialist in these technologies. In addition, the manager can understand the visual model more easily than the mathematical model and can thus more effectively contribute to its development and

monitor its validity. This approach also lets the manager assess the value of the model very early in its development (Bell and Parker, 1985).

OVERALL SYSTEM DESIGN AND ARCHITECTURE

Design Goals and Requirements

There are three primary design goals which form the development philosophy behind GIN. First, as stated earlier, we want to provide the manager/decision maker, a nonspecialist in MS/OR, with a visual, interactive, easy-to-use model which captures his pictorial representation of a given business situation. Second, we want to provide the model builder, a knowledgeable practitioner of MS/OR, with an environment in which he can learn both effective modeling techniques and potential applications of network models. And finally, we want to provide the student, a novice experimenter in MS/OR, with an environment in which he can learn both effective modeling techniques and potential applications of network models. These design goals translate into several requirements for GIN. Specifically, the GIN interface must provide and support the following characteristics:

— an enhanced pictorial equivalent of the canonical or algebraic model form.
— visual integration of model structure, instantiating data and optimal flows during model validation, solution and analysis.
— linkages between model parameters and values stored in external databases (Geoffrion, 1987; Ma et al., 1989).
— a simplified, pictorial equivalent of a graph grammar (Jones, 1990, 1991) to maintain structural validity during model building and modification.
— push-button initiation of model optimization, displaying the results of multiple cases in the original pictorial form to enhance the development of insights from case comparisons.
— a flexible, visual, interactive model which closely approximates the decision maker's personalized pictorial representation of the real world problem domain.

As for this last characteristic, the personalization of model representations is provided in GIN by using interactive microcomputer graphics to create, on the CRT, a visual NETFORM model representation which is as close as possible to the decision maker's pictorial model. This NETFORM representation is automatically translated, via GIN software routines, into a computer-readable form appropriate for solution with a network optimization package. In addition, Cartesian coordinates are saved for each NETFORM component displayed on the screen. After invoking the solver, the optimal model flows are numerically displayed directly on the NETFORM model. GIN is an executable modeling language which can, for specific classes of network models, recreate the exact pictorial NETFORM model (complete with instantiating data and optimal flows) at any time.

System Architecture

The GIN environment is divided into three stages, following Jones (Jones, 1990). The Schema Editor, stage one, allows the model builder, an MS/OR practitioner, to specify the type and characteristics of the network model (i.e., assignment, transportation, transshipment, or generalized network), as well as name the model instance. By specifying the model type, the model builder provides: 1) a specific set of allowable building block icons (arc, node and attribute combinations) from which the modeler can build a specific model, and 2) a set of rules for using the building block icons (for example, see Fig. 1).

Building Block Set Rules:
1. Only supply nodes, S, can overlay other supply nodes.
2. Only demand nodes, D, can overlay other demand nodes.
3. At least one entering and one exiting arc must be specified for each node; i.e., isolated nodes are not allowed.

Fig. 1. Instance editor: building blocks and rules for a transportation problem (bipartite and capacitated).

The Instance Editor, stage two, allows the modeler and/or decision maker to: 1) build and display graphically an overview of the model structure, and 2) build and display graphically a detailed NETFORM model representation using the appropriate building block icons and the associated rule set. The Case Editor, stage three, provides the decision maker with the capability to specify and run various what-if cases by allowing him or her to: 1) specify model parameters (e.g., costs, bounds, etc.) using either interactive graphics to change individual parameters or external databases to change groups of parameters, 2) add or delete arcs and/or nodes, 3) invoke the model solver, 4) display, on the CRT, the fully annotated network structure, including all parameter values and optimal flows, as specified via the Instance Editor and modified via the Case Editor, 5) review the model via zoom and scroll capabilities, and 6) name and save a specific case.
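In GIN these rules are enforced interactively during icon overlay; batch-checking a finished drawing against them is nevertheless a useful way to state them precisely. A sketch (our illustration, not GIN code):

    def check_transportation(nodes, arcs):
        """Check a drawing against the transportation rules of Fig. 1.
        nodes: dict name -> 'S' (supply) or 'D' (demand);
        arcs:  list of (tail, head) pairs."""
        errors = []
        for tail, head in arcs:            # bipartite: arcs run supply -> demand
            if nodes.get(tail) != 'S' or nodes.get(head) != 'D':
                errors.append("arc %s->%s is not supply-to-demand" % (tail, head))
        touched = {v for arc in arcs for v in arc}
        for name in nodes:                 # rule 3: no isolated nodes
            if name not in touched:
                errors.append("node %s is isolated" % name)
        return errors

    print(check_transportation({"P1": "S", "W1": "D", "W2": "D"},
                               [("P1", "W1")]))   # ['node W2 is isolated']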

Model Development Flexibility

Models can be developed with GIN in either a top-down or bottom-up mode, depending on the desires of the model builder. That is, a model can be specified in an overview block format, with each block being subsequently specified in increasing detail; this is the graphical equivalent of hierarchical decomposition. Alternatively, a model can be built at


the basic arc-node level of detail, and separated and stored in overview blocks as specified by the user afterward. In addition, the model builder has the option of importing (via an option in the Instance Editor command) a previously specified NETFORM model from an external database of GIN models and incorporating it into the current model.

MODEL BUILDING WITH GIN

To describe the GIN model building environment, we first describe the screen layout and the associated man-machine interactions. Then, in Section 4, we will illustrate the use of this environment in building a simple transportation model. The GIN screen layout (Fig. 2) can be functionally divided into six sections: 1) pull-down menus (PDMs) for user command control, 2) the Model Overview Section, 3) the Toolkit containing the building block icons, 4) the WINDOWS icons, 5) the Message Area, and 6) the Working Area. Each of these sections is discussed briefly below.

Pull Down Menus

There are four pull-down menus (PDMs), one for each of the four steps of model building (Fig. 2). That is, the Schema Editor PDM contains command options for specifying the model ID and model class (assignment, transportation, transshipment, generalized network, etc.). The Overview Editor PDM contains the control commands for building and storing the overview model representation. The Instance Editor PDM contains the commands for building, storing and editing the detailed model. Finally, the Case Editor PDM contains model instantiation commands, case management commands, and the solve and display commands.

Model Overview

The Overview Section contains a pictorial overview of the model. This overview is a macro view of the model in iconic representation, with arc interconnections as appropriate. For example, a multiple period transshipment model might be represented in the Overview Section by a manufacturing icon and a warehouse icon connected horizontally by arcs and replicated vertically for each time period, with vertical inventory arcs connecting the periods (Fig. 2). The Overview Section provides, via highlighting of the appropriate overview icon(s), a continuous reference to that part of the detailed network model which is currently displayed in the Working Area.

Toolkit

The Toolkit provides the Basic Building Block (BBB) icons, each of which is a connected and indivisible combination of arcs, nodes and attributes used by the modeler to build the network at the arc-node level of detail. In general, the model is built under mouse


Fig. 2. Pictorial Equivalent of the Canonical Form for a Multiple Period Transshipment Model.

interaction by clicking on the appropriate BBB icon in the Toolkit and dragging the icon onto the Working Area, overlaying nodes as appropriate. Arcs in the BBB icons can be "stretched and bent" by clicking and dragging either end of the arc (the point to the left remains anchored). In addition, the arc attributes (cost box, optimal flow semicircle, multiplier triangle and parenthesized bounds) can be repositioned anywhere on the arc, either individually (by clicking and dragging the top right corner of the individual attribute) or as a group (by clicking and dragging the dot between the cost box and optimal flow semicircle). Each class of network problem (e.g., assignment, transportation, etc.) has a different set of building block icons. These class-specific icons, together with the corresponding, hard-coded rule set for their interconnections, provide a simplified, pictorial graph grammar (Jones, 1991) for the specification of valid NETFORM structures. For example, the transportation icons, plus the rules that 1) tail nodes of additional icons must be overlaid on the head nodes of existing icons and 2) all nodes must be connected, directly or indirectly, to at least one source and one sink, enforce the bipartite structure and connectedness requirements of transportation networks. The repeat icon, denoted by a vertical ellipsis, provides the modeler with a concise way to specify repeated patterns that appear in the network model; e.g., multiple period models which have the same structure repeated over time. Execution of the repeat construct results in the display of a dialogue box prompting the modeler for the number of repetitions or the data set name over whose components the model is to be repeated. It also prompts the modeler for the node naming convention for each repetition (e.g., T(n)* indicates concatenation of the time period indicator at the front end of each repeated node name). Network parameter values (i.e., arc costs and flow bounds) can be specified by one of three methods. First, each parameter can be specified individually by clicking the mouse cursor within the appropriate parameter icon (e.g., in the cost box) and keying in the desired value. Alternatively, arc costs can be inflated or deflated over multiple time periods by selecting a specific arc cost and executing a COST MOD command in the Case Editor PDM. This will prompt the decision maker for the appropriate time periods, inflation/deflation factor and starting cost. And finally, data can be specified from external relational databases via the DATABASE LINK command in the Instance Editor PDM.
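The paper gives no formula for COST MOD, but the natural reading of an inflation/deflation factor applied from a starting cost is geometric compounding per period, as in this sketch:

    def cost_mod(start_cost, factor, periods):
        """Per-period arc costs: cost_t = start_cost * (1 + factor)**t."""
        return [start_cost * (1 + factor) ** t for t in range(periods)]

    print(cost_mod(100.0, 0.05, 3))   # [100.0, 105.0, 110.25...]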

WINDOWS Icons

This area of the screen contains a useful subset of WINDOWS standard tools, including two selection tools (the lasso and selection box), the pencil and eraser tools for drawing overview icons, and the scroll and zoom tools for viewing different sections of large models. It also includes the Delete Arc and UNDO icons.

Message Area

The Message Area provides a space to display summary information concerning optimal model solutions; e.g., the total cost information for an optimal model run. It also provides a space for displaying amplifying information concerning prompts, HELP output, etc.


Working Area

The Working Area provides the space for building, viewing, and modifying the network model. Zoom and scroll capabilities are provided for working with large scale models.

EXAMPLE PROBLEM

In this section we illustrate the use of the concepts and tools described above to define a classical multiple period transshipment problem.

Model Schema

The first step in building the model is to use the Schema Editor PDM to specify the general network type (Transshipment) and specify a model ID (TRANSSHIP).

Model Overview

The next step, assuming we want to develop the model in a top-down mode, is to build the overview model. We start this process by pulling down the Overview Editor PDM and selecting the EDIT command; this displays the existing overview model in the Working Area or, in the case of a new model, clears the Working Area. EDIT also displays a set of overview icons in the Toolkit Section of the screen. If the Toolkit does not include the appropriate overview icon(s), we can use the WINDOWS Pencil and Eraser to create different icons and store them in the Overview Toolkit via the STORE ICON command. Once the appropriate icons are in the Toolkit, we build the kernel of the overview model with the icons, clicking and dragging them from the Toolkit to the Working Area. We then replicate the kernel for three time periods. The results are stored in the Overview Section via the STORE command.

Model Specification

The next step involves building the kernel of the model at the detailed (arc and node) level. To do this, we first select the EDIT command in the Instance Editor PDM; this results in the display of the transshipment Basic Building Block (BBB) icons in the Toolkit. We then click and drag the appropriate icons from the Toolkit to the Working Area, overlaying nodes as appropriate and according to the node overlay rules. Attempts to violate these rules result in the display of a pop-up screen with a short but helpful error message. After each icon or group of icons is connected to the existing network, the node(s) are identified by positioning the cursor inside the node ID box, clicking it and keying in the appropriate node name. If external databases are to be used to provide arc costs and bounds, the node name should correspond to the name in the associated relational database.

Interconnections of supply and demand nodes (i.e., transshipment arcs) can be specified in either of two ways. First, we can click and drag the transshipment icon from the Toolkit to the Working Area for each connection. This is the most straightforward, but tedious, method. Second, we can use the Connect All icon, which allows us to fix one end of the transshipment icon on the appropriate supply or demand node and then click and drag the other end to all nodes connected to the fixed node.

Once the kernel of the model is completed, the Repeat icon is used to replicate the model over the three time periods. This icon is employed in either of two ways. First, the user can click and drag it onto the Working Area, positioning it where the first copy of the kernel should begin; i.e., the positioning of the Repeat icon dictates the visual spacing between copies on the screen. After positioning the icon and releasing the mouse button, a series of dialogue boxes appears, prompting the modeler to select the appropriate set of nodes to repeat and then for either the external data set name containing the set of parameters for which copies are to be made, or the explicit number of copies to be made and an appropriate node naming convention. Alternatively, the modeler can click and drag the Repeat icon between two nodes or groups of nodes in the kernel and select (again via the WINDOWS select box) both nodes or sets of nodes, with the Repeat icon between the two select boxes; in this case, the visual spacing between copies of the nodes will be dictated by the relative spacing between corresponding nodes in the two select boxes. Of course, the two sets of selected nodes must be "compatible" in a replication sense; i.e., both sets must contain the same number of each type of node with relative spacings approximately the same. The prompts will be similar. The resulting displays, before and after executing the Repeat construct, are shown in Figs. 2 and 3, respectively. Note that Fig. 2 is, in effect, the graphical equivalent of the canonical or algebraic form of the model.

The final step in model building is storing the model in the Overview Section. This step not only saves the model for future editing and running, but also determines and saves the correspondence between model nodes and overview icons. To accomplish this we first use the Zoom icon to display at least that section of the detailed network model which corresponds to a specific overview icon, then select (via the Select Box or Lasso) the appropriate set of nodes to be stored and, finally, specify the STORE IN OVERVIEW command in the Instance Editor PDM; this collapses the selected nodes into a small box in the Working Area of the screen. We then click and drag this box onto the appropriate icon in the Overview Section.

Case Editing

The final step of the modeling process involves model instantiation, validation, case modification and solution analysis. Model instantiation involves specifying values for all arc costs, flow bounds and, in the case of generalized networks, arc multipliers. Any or all data can be specified via keyboard entry by positioning the mouse arrow within the appropriate attribute icon, clicking, and typing in the required numerical data. Alternatively, if the node names have been specified to correspond to actual variable names in relational databases, the Case Editor's INSTANTIATION command can be used. Execution of this command results in the display of a dialogue box prompting the decision maker for the external data set names of the appropriate relational databases.


Fig. 3. Instantiated, Optimized NETFORM Output


Once these are specified, the GIN software searches the databases for the appropriate combinations of "from nodes" and "to nodes" in the relational databases, loads the corresponding value(s), stores these value(s) in the model database and, upon completion, displays the model with instantiating data on the CRT. A list of model parameters not found in the database(s) is made available for review; their model values are set to the default values of cost = infinity, lower bound = upper bound = 0, and the arcs suppressed.

Model structure and data validation benefit greatly from the general pictorial nature of networks; i.e., the decision maker can view the Overview to ensure that the model conforms to his/her overall expectations, then display any part of the detailed model by selecting the appropriate Overview icon and executing the EDIT FROM OVERVIEW command in the Case Editor PDM. This displays the detailed NETFORM corresponding to the selected Overview icon. He/she can then use the zoom and scroll capabilities to focus on any given set of arcs and nodes; specifically, he/she can verify the conservation of flows around any node, check the bounds (e.g., pipeline capacity) on any arc, validate the interconnecting links between two sets of nodes (e.g., customers serviced from specific warehouses), etc. Further, he/she can use the SOURCE command to display any selected parameter value within the context of the external relational database from which it was loaded.

Case specification and modification involves changing parameters (costs, bounds, and possibly multipliers) as well as model structure (e.g., eliminating or adding arcs) to specify "what-if" cases. The decision maker can change parameters manually by clicking on a parameter icon and typing in the desired value. Alternatively, he/she can select a specific arc cost and use the COST MOD command in the Case Editor PDM to specify inflated or deflated costs. Executing this command will cause the display of a dialogue box prompting him/her for the time periods, the inflation/deflation multiplication factor to be applied to each successive time period and the starting cost.

Finally, solution analysis involves executing one of the SOLVE commands in the Case Editor PDM. The resulting optimal flows are displayed as part of the model structure and instantiating data. Two cases can be displayed simultaneously in an over-under arrangement for comparison by executing the SOLVE & COMPARE command (Fig. 3). Further, optimal costs and flows are summed by Overview icon and displayed in the Overview Section. This helps the decision maker determine where the largest changes in cost and flows have occurred in comparing one case to another. In addition, the decision maker can save a case via the SAVE command, and can end the session with the END SESSION command.
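As a sketch of the INSTANTIATION lookup described above, assuming hypothetical type and function names (GIN's actual database link code is not reproduced here), each arc's "from node"/"to node" pair is matched against the external relation; unmatched arcs keep the documented defaults and are reported for review.

#include <iostream>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical arc record: defaults mirror the documented behavior for
// parameters missing from the external database.
struct Arc {
    std::string from, to;
    double cost  = std::numeric_limits<double>::infinity();
    double lower = 0.0, upper = 0.0;
    bool suppressed = true;            // arc stays suppressed until data is found
};

using Key = std::pair<std::string, std::string>;            // ("from node", "to node")
using Relation = std::map<Key, std::pair<double, double>>;  // -> (cost, upper bound)

// Load matching rows; return the parameters not found, for later review.
std::vector<std::string> instantiate(std::vector<Arc>& arcs, const Relation& db) {
    std::vector<std::string> missing;
    for (auto& a : arcs) {
        auto it = db.find({a.from, a.to});
        if (it == db.end()) { missing.push_back(a.from + "->" + a.to); continue; }
        a.cost = it->second.first;
        a.upper = it->second.second;
        a.suppressed = false;
    }
    return missing;
}

int main() {
    Relation db = {{{"P1", "W1"}, {4.0, 100.0}}};          // one invented data row
    std::vector<Arc> arcs = {{"P1", "W1"}, {"P1", "W2"}};
    for (const auto& m : instantiate(arcs, db))
        std::cout << "not in database: " << m << '\n';     // P1->W2 keeps defaults
}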

IMPLEMENTATION AND CURRENT STATUS

This project is a joint effort between industry (Occidental Petroleum Corporation) and academia (Oklahoma State University and the University of Wisconsin at Milwaukee) aimed at improving the interfaces of mathematical programming models. This paper contains the current state of the design phase of the project. In general, the selection of implementation software and data models was dictated by three underlying requirements of GIN: 1) the menu driven command structure, 2) the mouse-controlled, event-driven interactive graphics, and 3) the flexibility of an iterative development environment.


These underlying requirements favored an object-oriented data model since such models possess the following characteristics (Elmasri and Navathe, 1989):

1) Uniform interface - all objects are treated uniformly by accessing them through methods (i.e., user-defined operations). Thus, icons for assignment models and those for generalized transshipment models, while quite different in number of arcs, nodes and instantiating data, are accessed through the same methods. This is referred to as polymorphism in object oriented models.
2) Support of complex objects - object oriented models allow creation of arbitrarily complex objects involving hierarchies and lattices.
3) Information hiding and support of abstractions - the abstraction mechanism can provide the essential external features of objects while hiding the internal representation or implementation details. Generalization and aggregation are easily supported.
4) Modularity, flexibility, extensibility and tailorability - object oriented models support schema evolution more easily than conventional models; new objects and/or new operations can be easily added and old ones modified or deleted.

GIN is being implemented using several microcomputer technologies. Specifically, it is running as an application under Microsoft WINDOWS (Microsoft, 1989) on an IBM PC/AT with an EGA color monitor. The first version of GIN was implemented in a pure object oriented programming language, ACTOR (The Whitewater Group, 1989), which controls the mouse-driven graphics, the various windows and the pull-down menus. The model is solved with a network optimization package, NETFLO (Kennington and Helgason, 1980). (Eventually, this solution package will be augmented with packages capable of handling generalized networks and other network problem types.) We are also rewriting the system now in C++.

Object-oriented programming deals primarily with classes of objects which are arranged hierarchically to provide parent-child inheritance of class specific methods. These methods are invoked by issuing messages (e.g., plot, print, etc.) to the specific objects (e.g., icons, real numbers, n-tuples, etc.). In our implementation of GIN, we have eight classes of objects. These classes, along with several of their methods, include the following (a hypothetical sketch of the hierarchy follows the list):

1) NetWindow class - The NetWindow class maintains a picture of a network in a window. The NetWindow is responsible for the drawing of the network. NetWindow descends from the class Window and inherits all of its instance variables and methods. In addition, more instance variables and methods are added to this class to suit network creation and manipulation. This class contains the main command method which matches the command messages sent by Actor in response to menu events and edit control changes. This method in turn sends additional messages to other objects to get things done. For example, this method sends an updateName message to a SupplyNode object whenever the name of a supply node changes. Example message protocols include: init, setMenus, drag, updateArcs, initNodeLists, findNodeType, dist, fileSave, savePicFile, filOpen, checkError, readResults, arcColor, etc.

2) TboxWindow class - The TboxWindow class is the class associated with the "TOOLBOX" window. When the application starts up, a toolbox window is created as a child window of the main NetWindow. Also, this window contains three child windows - one each for the supply, demand, and arc tools. Example message protocols include: init, createChildren, paint, etc.
3) Arc, Supply and DemandTool classes - These classes are responsible for keeping track of which one of them has been currently chosen by the user. For example, if the user presses the mouse button in the supplyTool window, a message is sent to this object indicating that a mouse click event has occurred. This object then sends messages to itself to invert its client rectangle, indicating that it has been chosen. Similarly, the other two objects are responsible for keeping track of whether they have been chosen or not. These two objects also invert their client areas in response to messages they receive if any mouse activity takes place in them. Example message protocols include: create, endDrag, paint and invertArc.
4) SupplyNode and DemandNode classes - These classes descend directly from the NetNode class. The SupplyNode and DemandNode objects keep track of their name, type and relative display position on the screen. The relative positions of these nodes have to be saved to avoid recomputing these coordinate values each time the NetWindow has to redraw itself. In fact, the SupplyNodeList and DemandNodeList maintained by the NetNode class are lists of instances of the objects of the respective SupplyNode or DemandNode classes. Example message protocols include: setUpSup(Dem)List, addToSup(Dem)List, paintSup(Dem)Nodes, etc.
5) SupEditNodeWin and DemEditNodeWin classes - Whenever a supply or a demand node is created, a name must be provided. Appropriate editing facilities are provided for this. The classes SupEditNodeWin and DemEditNodeWin, which descend from the Edit class, serve this purpose. Whenever the name of a supply or demand node is changed, messages are sent by these controls to appropriate objects indicating the change. Instance message protocols include initDialog and command.
6) ArcDialog class - The ArcDialog class descends from the formal Dialogue class. Instances of this class are used to initialize and edit the costs associated with each existing arc. Instance message protocols include initDialog and command.
7) NetDataBase class - The NetDataBase class descends directly from the Object class. This database object is responsible for keeping track of node interconnections and arc attributes. Messages are sent to this object by supply and demand nodes if a name changes, and/or by an arc if its attribute(s) change. The Database object sends messages to itself to update any changes that have occurred. Instance message protocols include init, updateDbNmFlds, printDbase.
8) CostFlowDialog class - This class descends from the Dialogue class. Instances of this class are used to read and alter the color settings of cost and flow when they are displayed. The default colors for the display of costs and flows are red and green, respectively. Instance message protocols include initDialog, command, getColor and flipformat.
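The following minimal C++ sketch suggests the parent-child inheritance and message dispatch described above. Class and method names follow the text, but all signatures are invented for illustration; they do not reproduce the ACTOR or C++ source.

#include <iostream>
#include <string>
#include <vector>

// NetNode is the common parent; it keeps the name and the saved relative
// display position described in item 4.
class NetNode {
public:
    std::string name;
    int x = 0, y = 0;
    virtual void updateName(const std::string& n) { name = n; }
    virtual ~NetNode() = default;
};

class SupplyNode : public NetNode {};
class DemandNode : public NetNode {};

// NetWindow descends from Window in the real system; here only the central
// command dispatch of item 1 is suggested.
class NetWindow {
    std::vector<SupplyNode> supplyNodeList;
public:
    void add(const SupplyNode& s) { supplyNodeList.push_back(s); }
    // Match a command message raised by a menu event or edit-control change
    // and forward messages to the objects that do the work.
    void command(const std::string& msg, const std::string& arg) {
        if (msg == "updateName" && !supplyNodeList.empty())
            supplyNodeList.front().updateName(arg);   // a supply node was renamed
        else if (msg == "fileSave")
            std::cout << "saving network...\n";
    }
};

int main() {
    NetWindow w;
    w.add(SupplyNode{});
    w.command("updateName", "DENVER");   // edit-control change renames the node
    w.command("fileSave", "");
}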


With these classes and methods, the following capabilities are currently implemented in GIN:

1) the click-and-drag capability to position the arc/node/parameter icons anywhere on the screen and specify parameter values via clicking on an icon and keying the associated values into the resulting pop-up menus.
2) color coding of arcs and parameters.
3) automatic creation and continual update of the n-tuples in the network database, based on the network displayed and manipulated on the screen. This set of n-tuples forms the input for the solution algorithm.
4) invoking of the solver via a command in a pull-down menu. This passes the n-tuple database to the NETFLO solver (Kennington and Helgason, 1980) and, after solving, displays the resulting optimal flows on the graphical network display.

The next phase of GIN software development will include 1) optionally, creating a network model via the Repeat icon, with the display of the resulting model in either canonical or expanded form, 2) the coding of the rule set for connecting new icons to an existing network, thus enforcing and ensuring a viable network structure for the model type being developed, 3) the implementation of the overview section, 4) the expansion of model types supported from the current state (transshipment networks only) to include assignment networks, transportation networks and generalized transshipment models, along with the associated coding of the schema editor, and 5) the coding of instantiation routines which link GIN models to external databases.

CONCLUSIONS

This paper has provided a functional description of a graph-based interface for network modeling, GIN, and its implementation via an object-oriented programming system. However, much work and investigation remains to be done. First, we need to extend GIN via both interface and algorithms to embrace additional network models, especially generalized networks, networks with single side constraints and LPs with embedded networks. This should include the expansion of the iconic and rule based graph grammar (currently hard coded) and/or the inclusion of a more flexible, user defined graph production capability for iconic interconnection during model development (Jones, 1991). Second, we need to investigate hierarchical decomposition and its effect on the comprehension of the model by the decision maker; i.e., when and how to aggregate groups of arcs and nodes into a representation with reduced detail and yet retain as much comprehension as possible. Third, we need to investigate ways to incorporate sensitivity analysis factors visually into the NETFORM and Overview models. Such factors could help the decision maker determine which what-if case should be specified next and which model parameter should be investigated further.

Fourth, we need to investigate ways (perhaps using AI/ES) to create, automatically and algorithmically, a visual graphic network directly from the n-tuple representation accepted by most network solution packages currently available. This opens the Pandora's Box associated with automatic layout of general graphical structures, but would significantly increase GIN's audience. Fifth, we need to investigate alternative ways of displaying the network and optimized results non-graphically; e.g., as an adjacency matrix, vertex-edge incidence matrix, etc. Finally, we need to investigate the application and incorporation of AI/ES technology to the analysis and case specifications of visual, interactive network models, especially large scale models.

ACKNOWLEDGEMENTS

The authors would like to extend special thanks to Christopher V. Jones for his exceptionally insightful, helpful and detailed comments on earlier drafts of this paper, to Chakradhar Nanga for his initial programming support, to Harvey Greenberg and Fred Murphy for their encouragement and support, and to OXY USA, Inc. for initial financial support.

BIBLIOGRAPHY

Bell, P. C., D. C. Parker and P. Kirkpatrick (1984). Visual interactive problem solving - a new look at management problems. Business Quarterly, Spring 1984, 24-18.
Bell, P. C. and D. C. Parker (1985). Developing a visual interactive model for corporate cash management. Journal of the Operational Research Society, 36:9, 779-786.
Bell, P. C. (1985). Visual interactive modeling in operational research: Successes and opportunities. Journal of the Operational Research Society, 36:11, 975-982.
Bhatnager, S. C. (1983). Locating social service centers using interactive graphics. OMEGA, 11:2, 201-205.
Bisschop, J. and A. Meeraus (1982). On the development of a general algebraic modeling system in a strategic planning environment. Mathematical Programming Study, 20, North-Holland Publishing Company, New York, NY, 1-29.
Brown, R. W., W. D. Northup and J. F. Shapiro (1986). LOGS: A modeling and optimization system for business planning. In: Computer Assisted Decision Making (G. Mitra, Ed.), North-Holland Publishing Company, New York, NY, 227-258.
Ellison, E. (1982). UIMP: User interface for mathematical programming. ACM Transactions on Mathematical Software, 8:3, 229-255.
Elmasri, R. and S. B. Navathe (1989). Fundamentals of Database Systems. Benjamin/Cummings Publishing Company, New York, NY.
Fourer, R. (1983). Modeling languages versus matrix generators. ACM Transactions on Mathematical Software, 9:2, 143-183.
Geoffrion, A. M. (1976). The purpose of mathematical programming is insight, not numbers. Interfaces, 7:1, 81-92.
Geoffrion, A. M. (1987). An introduction to structured modeling. Management Science, 33:5, 547-588.
Glover, F., J. Hultz and D. Klingman (1978). Improved computer-based planning techniques, Part I. Interfaces, 8:4, 16-25.
Glover, F., D. Klingman and C. McMillan (1977). The NETFORM concept: A more effective model form and solution procedure for large scale nonlinear problems. Annual Proceedings of the ACM, October 16-19, 1977, 283-89.
Greenberg, H. J. (1983). A functional description of ANALYZE: A computer-assisted analysis system for linear programming models. ACM Transactions on Mathematical Software, 9:1, 18-56.
Haverly Systems, Inc. (1977). MaGen: Reference Manual. Denville, NJ.
Hurrion, R. D. (1986). Visual interactive modeling. European Journal of Operational Research, 23:2, 281-287.
Hurrion, R. D. (1980). Visual interactive (computer) solutions for the traveling salesman problem. Journal of the Operational Research Society, 31, 537-539.
Jones, C. (1988). User interfaces. Technical Reports, University of Pennsylvania, Philadelphia, PA.
Jones, C. (1990). An introduction to graph-based modeling systems, Part I: Overview. ORSA Journal on Computing, 2:2, 136-151.
Jones, C. (1991). An introduction to graph-based modeling systems, Part II: Graph grammars and the implementation. ORSA Journal on Computing, 3:3, 180-206.
Kennington, J. L. and R. V. Helgason (1980). Algorithms for Network Programming. John Wiley and Sons, New York, NY.
Lembersky, M. R. and U. H. Chi (1984). Decision simulators speed implementation and improve operations. Interfaces, 27, 1-15.
McBride, R. D. (1988). NETSYS - A generalized network modeling system. Technical Report, University of California, Los Angeles, CA.
Ma, P. C., F. H. Murphy and E. A. Stohr (1989). A graphics interface for linear programming. Communications of the ACM, 32:8, 996-1012.
Microsoft Corporation (1989). Microsoft WINDOWS. Redmond, WA.
Murphy, F. H. and E. A. Stohr (1986). An intelligent system for formulating linear programs. Decision Support Systems, 3.
Murphy, F. H. (1988). A knowledgebase for formulating linear programs. In: Mathematical Models for Decision Support (G. Mitra, Ed.), Springer-Verlag, New York, NY, 451-470.
Ornstein, R. E. (1973). The Nature of Human Consciousness. Freeman Press, New York, NY.
Patel, N. R. (1979). Locating rural social service centers in India. Management Science, 25:1, 22-30.
Paul, J. P. (1989). LINGO/PC: Modeling language for linear and integer programming. OR/MS Today, 16:2, 19-22.
Stohr, E. A. (1988). Automated support for formulating linear programs. In: Mathematical Models for Decision Support (G. Mitra, Ed.), Springer-Verlag, New York, NY, 519-538.
The Whitewater Group (1989). ACTOR: User's Guide. Evanston, IL.
Zenios, S. A. (1988). Integrating network optimization capabilities into a high level modeling language. Technical Report, University of Pennsylvania, Philadelphia, PA.


NETPAD: AN INTERACTIVE GRAPHICS SYSTEM FOR NETWORK MODELING AND OPTIMIZATION

Nathaniel Dean, Monika Mevenkamp and Clyde L. Monma
Bellcore, 445 South Street, Morristown, NJ 07962-1910

ABSTRACT

The practical and theoretical importance of network models and algorithms is clearly documented in the literature. This has resulted in several recent efforts to develop systems for network modeling, algorithms and/or visualization. The main goal of this paper is to describe NETPAD, an interactive graphics system for network modeling and optimization. There were several factors motivating us while developing this system. First, networks are inherently visual; so an interactive graphics interface was considered to be a vital component of the overall design. Second, data is a very important part of network modeling; therefore, we have integrated network data attributes into the system. Third, algorithmic needs change over time to meet users' needs for additional functionality or performance, and to meet the needs of specific applications; so we have designed the system to be customizable and expandable. Fourth, widespread use of sophisticated methods requires ease-of-use in addition to functionality; so the system includes a menu-driven user interface, a standard file format and algorithm animation. NETPAD is a portable system written in the C programming language for workstations with the UNIX operating system and the X Window System.

KEYWORDS

User interface; graph algorithms; windows; graphics; networks; crossing number.

OVERVIEW

Networks are useful for modeling many practical situations, including physical networks such as ones representing communication or transportation networks, as well as abstract networks such as ones representing the scheduling of events or the allocation of resources. At the same time, research in graph theory and network algorithms has provided a wealth of tools for network analysis and optimization. The practical and theoretical importance of network models and algorithms is clearly documented in the literature.

There have been several recent efforts to develop systems for network modeling, algorithms and/or visualization. These efforts represent attempts to harness the considerable power of the available network technology into a system which is easy-to-use and meets the needs of certain communities of users. This has resulted in a proliferation of special-purpose systems and individual libraries of network algorithms. (See the section "Some Existing Systems" for a sampling of such efforts.)

The main goal of this paper is to describe NETPAD, an interactive graphics system for network modeling and optimization. There were several factors motivating us while developing this system. First, networks are inherently visual; so an interactive graphics interface was considered to be a vital component of the overall design. Second, data is a very important part of network modeling; therefore, we have integrated network data attributes into the system. Third, algorithmic needs change over time to meet users' needs for additional functionality or performance, and to meet the needs of specific applications; so we have designed the system to be customizable and expandable. Fourth, widespread use of sophisticated methods requires ease-of-use in addition to functionality; so the system includes a menu-driven user interface, a standard file format and algorithm animation.

NETPAD is an interactive graphics software system which provides an integrated environment for network modeling and optimization with features including:

• graphics for creating, displaying and manipulating networks
• capabilities for entering, displaying and manipulating network data
• standard format for network files
• expandable library of network algorithms
• facilities for algorithm animation

At one level, NETPAD functions like an "electronic pencil and notepad" using a mouse, menus and multiple windows to create, manipulate and save networks; it can also be used to obtain Postscript printer output. On another level, it functions like an elaborate "network calculator" for applying available functions and algorithms to process networks. It also functions like a "network workbench and toolkit" using a library of network algorithms which can be customized and expanded to provide for rapid prototyping for specific applications. This functionality results in an easy-to-use vehicle for harnessing the power of available network modeling and algorithmic tools.

We use the term "graph" to represent a set of nodes together with a set of links, where each link consists of a pair of nodes. We use the term "attribute" to represent data values associated with the graph itself, its nodes or its links. Each (graph, node or link) attribute has a name, a data type (e.g., character, integer, float) associated with it, and data values (for each graph, node or link, respectively). We use the term "network" to represent a graph together with any number (possibly none) and any types of attributes.
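A minimal C++ sketch, assuming hypothetical type names, of the graph/attribute/network terminology just defined (NETPAD's actual C data structures are hidden behind its programmer interface):

#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

using Value = std::variant<char, int, float>;   // the three attribute data types

struct Attribute {                  // one named attribute with a value per element
    std::string name;               // e.g. "distance"
    std::map<int, Value> values;    // keyed by node or link index
};

struct Graph {                      // nodes plus links; each link is a pair of nodes
    int nodeCount = 0;
    std::vector<std::pair<int, int>> links;
};

struct Network {                    // a graph with any number of attributes
    Graph graph;
    std::vector<Attribute> graphAttrs, nodeAttrs, linkAttrs;
};

int main() {
    Network net;
    net.graph.nodeCount = 2;
    net.graph.links.push_back({0, 1});
    net.linkAttrs.push_back({"distance", {{0, 1250}}});   // link 0 labeled 1250
}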

NETPAD ENVIRONMENT

NETPAD is an interactive graphics software system which provides an integrated environment for network modeling and optimization. It is a portable system written in the C programming language for workstations with the UNIX operating system and the X Window System; it is currently being used at Bellcore and by researchers at over 30 universities on SUN and DEC workstations, on VAX computers and on 386-based IBM-compatible personal computers. It consists of an interactive graphics, menu-driven user interface which can be easily customized to fit specific users or applications. It also includes a library of network algorithms which can be easily expanded to include new or existing algorithms.

NETPAD utilizes a mouse-oriented, menu-driven user interface to provide maximum ease-of-use for most users. Experienced users can also make use of keyboard equivalents for all menu items in order to operate more quickly once familiarity with the system is gained. The interface is further enhanced by allowing the user to easily customize most aspects of the system, including the menus themselves, items within a menu, and keyboard equivalents, to provide a "look-and-feel" to fit a particular user or application.

NETPAD allows multiple windows to be present at any time. Operations selected from a given window are applied to the network currently associated with that window. A window consists of four main components: the status area, menu area, button area and display area. Examples of typical windows are shown in Fig. 1.

Fig. 1. Examples of typical windows

The status area is for displaying text information about the current algorithm, including the name of the algorithm being executed, its type (e.g., internal or external) and its status (e.g., done). The menu area displays the heading titles for the menus. The user uses the mouse to pop up a menu in order to select an item to execute. The button area is used to execute or kill a selected menu item, and to decide whether to show or hide parameters associated with an algorithm. The display area is for graphically displaying the network.

The number and names of the menu headings and the particular items within a menu are governed by a configuration file. NETPAD reads this file initially to set up the user's environment. The format of the configuration file consists of an entry for each menu, starting with the keyword "MENU:" followed by the name of the menu (in quotes) and a list of menu items in brackets. Each menu item includes the name of the item (in quotes), the associated executable file in the NETPAD library, and the keyboard equivalent (in quotes). A sample configuration file is shown in Table 1. This particular configuration file would generate the menu areas for the windows shown in Fig. 1.

Several of the menus are for functions normally associated with the NETPAD interface. For example, the Win menu entries are for window management, like opening a new window or quitting the current window; and the I/O menu is for input/output functions like loading a network from a file, saving the current network to a file or printing the current network. The Graf, Node and Link menus are for performing operations on the entire graph, on its nodes or on its links, respectively. Typical operations include cut-and-paste; selection of nodes/links; operations on selected elements (e.g., deletion, change color/shape/style); changing defaults for node/link colors, node shapes (e.g., point, box, etc.), and link styles (e.g., solid, dashed, directed). The Attr menu is used for handling attributes, including defining new attributes and setting attribute values. The Anim menu contains animation functions for initializing and playing back an animation file in several modes (e.g., run, step, next). The other menu items (we show here the Make, Algo, Geom and Nets menus) would normally be customized menus which would contain functions from the NETPAD library (called internal algorithms) or from a separate user program library (called external algorithms); these would be grouped into categories according to the user's preference as specified in the configuration file definition. There are many more internal and external algorithms which are omitted from discussion here for the sake of brevity.

USING NETPAD

To further clarify how the interface is used, we consider some specific examples. Selecting "Load" in the "I/O" menu has the effect shown in the first entry in Fig. 2. Note that a parameter box appears to request the name of the file to load (in this case, planar.grf) and the network is now drawn in the display area with the information updated in the status area (namely, that the internal algorithm graf.load has completed). We could now select the "Planar Draw" item from the Algo menu to obtain a planar layout of the graph as shown in the second entry of Fig. 2.

To further illustrate the notion of attributes, consider the first entry in Fig. 3, which shows a network with nodes labeled by city names and links labeled by distances. A network can have any number of attributes associated with the graph, its nodes or its links. Attributes can be defined and displayed as node or as link labels. The position of a node label can be chosen by selecting a clock position around a node. The second entry in Fig. 3 shows the result of executing the "Path Finder" algorithm from the Algo menu on this network; note that the shortest path between the selected cities of Phoenix and St. Paul is highlighted and the status area shows the path length.


Fig. 3. Examples of an algorithm with attributes

236

Table 1: Example Configuration File MENU: "Win" { "Open" "Quit"

wm.open win.quit

"Wo" "Wk"

graf.load graf.save graf.print

"GL" "GS" "GP"

graf.select graf.cut graf.paste

"Gs" "Gy" "Gp"

nodes.select sel.node.delete nodes.color nodes.shape

"Ns" "Nd" "Nc" "Nt"

links.select sel.link.delete links.color links.style

"Ls" "Ld" "Lc" "Lt"

attr.define attr. set

"Ad" "As"

planar.alg path.alg

"Pf"

} MENU: "I/O" { "Load" "Save" "Print"

} MENU: "Make " { } MENU: "Graf" { "Select" "Cut" "Paste"

} MENU: "Node" { "Select" "Delete Sel" "Color" "Shape " } MENU: "Link" { "Select" "Delete Sel" "Color" "Style" } MENU: "Attr " { "Define Attr" "Set Attr Val" } MENU: "Algo" { "Planar Draw" "Path Finder" } MENU: "Geom " { } MENU: "Nets" { } MENU: "Anim" { }

"PI"

We also note that algorithms which provide more detailed outputs can make use of a text window. Attribute values can be modified for individual nodes or links by using the Node or Link menus, respectively, or for all nodes or links by using the Attr menu.

CUSTOMIZING NETPAD

We have already explained how to customize the user interface using the configuration file. We now describe how to further customize the system by adding user-defined programs, called external algorithms. External algorithms are written as main programs which are linked to the NETPAD library and are separate programs residing in their own executable files. These algorithms access and manipulate networks by using functions provided by the NETPAD library. Each external algorithm has an algorithm specification file to identify its input requirements, or parameters, including their names, types and default values. This file is used by the NETPAD kernel so that the interface can automatically execute the algorithm and provide a parameter box (if one is required).

Each source code file for an external algorithm must begin by including the NETPAD file containing all definitions of the NETPAD data types, constants and subroutines. The internal structure of the NETPAD data objects is hidden from the user for several reasons: (1) the user is not burdened with mastering them, (2) the user is protected from accidental modification, and (3) the system is protected from malicious modification. Pointers to these objects are used to communicate between programs and the system. There are NETPAD data types for three types of objects: graphs, nodes and links. An external algorithm accesses the data via function calls. These functions comprise the programmer interface and are the only means by which an external algorithm may interact with the system. To execute a function, the programmer generally passes a pointer to one of the three basic NETPAD data objects as input. The following paradigm is typical for external algorithms (a hypothetical skeleton is sketched below):

1. Accept the current network and parameters as input.
2. Use function calls to obtain the attribute values.
3. Place the data into private data structures.
4. Compute results using these private data structures.
5. Prepare the output graph, attributes and results.
6. Return the outputs to the system.
7. Exit.

Of course, it is not essential that the algorithm perform all of these steps. A programmer need not write, modify or re-compile any existing files in order to add a new program to the system. The user simply compiles the new program and links it with the system. Assuming that the user has a C program called pgm.c, for example, one simply needs to create a file called pgm.alg describing the parameters used as input to the program and add a line to the customization file so that the program will appear as a selection on the desired menu.
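The skeleton below is a hypothetical rendering of the seven-step paradigm in the style of the pgm.c example; the np_* calls are invented stand-ins for the NETPAD programmer interface (the real names appear in the Programmer's Guide) and are stubbed here so that the sketch compiles on its own.

#include <cstdio>
#include <utility>
#include <vector>

// Invented stand-ins for the NETPAD programmer interface, stubbed so that
// the skeleton compiles on its own.
struct NPGraph { std::vector<std::pair<int, int>> links; };

NPGraph* np_current_graph() { static NPGraph g{{{0, 1}, {1, 2}}}; return &g; }
int np_link_weight(const NPGraph*, int) { return 1; }     // pretend attribute value
void np_return_result(const NPGraph* g, double value) {   // hand results back
    std::printf("result %.2f over %zu links\n", value, g->links.size());
}

int main() {
    NPGraph* g = np_current_graph();            // 1. accept the current network
    std::vector<int> w;                         // 3. private data structures
    for (int i = 0; i < (int)g->links.size(); ++i)
        w.push_back(np_link_weight(g, i));      // 2. attribute values via calls
    double total = 0;                           // 4. compute with private data
    for (int x : w) total += x;
    np_return_result(g, total);                 // 5.-6. prepare and return outputs
    return 0;                                   // 7. exit
}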


NETPAD automatically passes the current network and (if necessary) further parameters to algorithms. It allows the algorithms to return a modified network or to create and return a completely new network. Animation facilities are provided for user-defined programs, thus supporting the visualization and analysis of network algorithms. These animations are possible by either creating a text file of special commands which can be played back later like a movie or by accessing an available graphics library directly from the program. These features combine to make it easy for users to generate and study networks with specified properties, visualize relationships among networks and algorithms, and perform analysis and optimization for network-based problems. NETPAD can also be used to teach students about network properties and network algorithms, and the user interface is flexible enough to allow a user to customize NETPAD for specific applications.

NETPAD ARCHITECTURE

The basic components of NETPAD are the kernel, the internal and external algorithms, the animation and graphics functions and the customization file. The kernel is the only component that we have not yet discussed in detail, and so we describe it now together with a brief look at the overall structure of NETPAD. The NETPAD kernel unites the other components into one coherent system by providing an interactive environment for executing internal and external algorithms, animating algorithms and editing networks. In order to provide display and interactive editing capabilities, the kernel uses the X library, X toolkit intrinsics and the Athena Widget set, and it is linked with the Xaw, Xtk and X libraries. Using X, the kernel manages the screen in order to visually represent the user's current networks and to update this representation in order to show network changes which result directly from user actions and from algorithm executions initiated by the user. Following the design philosophy of the X Window System, the kernel uses the event model for handling user-computer interactions. (Other approaches involve the use of transition networks or context-free grammars and are inferior; see Green (1986).)

In order to manage the display, the kernel must maintain its own data structures, which must agree with the network and any other information being displayed. The primary data structure type of the kernel is called grafdsp, which stands for graph display. There is one grafdsp structure per network, and that structure contains pointers to the various display objects associated with that network, including to its window and menus, as well as to the network itself and to the algorithms which are accessible via the window's menus. For example, when a node is moved by the user, the new position is recorded in grafdsp.

To execute an external algorithm, the kernel temporarily blocks the user's access to the network window, puts it into reverse video and forks a process for the algorithm. After the process is complete and exits, this is detected by a signal handler installed by the kernel which generates an event containing information regarding the exit status of the algorithm and which network is associated with the algorithm. This then triggers another event to process the algorithm's results. More specifically, this last event will unblock the associated network window, put it back in normal video, display the result network in the same window, and (if appropriate) display an error message or any text or numerical information returned by the algorithm.
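A minimal POSIX C++ sketch of the fork-and-signal pattern just described; all names are illustrative, not NETPAD source, and the kernel's real version runs inside the X event loop rather than the polling loop used here.

#include <csignal>
#include <cstdio>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t algo_done = 0;
static void on_child_exit(int) { algo_done = 1; }   // installed by the kernel

int main() {
    std::signal(SIGCHLD, on_child_exit);
    std::puts("block window input; switch to reverse video");
    pid_t pid = fork();                              // run the algorithm in a child
    if (pid == 0) {
        execlp("planar_draw", "planar_draw", (char*)nullptr);  // invented name
        _exit(127);                                  // exec failed
    }
    while (!algo_done)                               // stand-in for the X event loop
        sleep(1);
    int status = 0;
    waitpid(pid, &status, 0);                        // exit status goes into the event
    std::puts("unblock window; normal video; display result network");
    return 0;
}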

POTENTIAL USES OF NETPAD

The potential uses of NETPAD fall roughly into three categories:

• research in network modeling and algorithms
• rapid custom prototyping for specific applications
• educational aspects of network modeling

We have used NETPAD ourselves in a number of research areas including graph theory, combinatorial optimization and network design, and have used the animation feature for studying network algorithms. For example, we were able to use NETPAD to develop a heuristic algorithm for embedding a graph in the plane to approximately minimize the number of crossings when edges are drawn as straight lines. This is known as the rectilinear crossing number problem and is NP-hard (Garey and Johnson, 1983). For complete graphs, the exact crossing number is known for n ≤ 9 (Guy, 1971), but for n ≥ 10 the classical construction of Jensen (1971) produces a layout which, in several cases, is not as good as the layouts generated by NETPAD (see Table 2). We are hopeful that by observing enough instances we will be able to derive a better general construction procedure.

Table 2: Upper bounds for crossing number of complete graph on n nodes

  n    Previous best    NETPAD
  4          0              0
  5          1              1
  6          3              3
  7          9              9
  8         19             19
  9         36             36
 10         62             62
 11        102            102
 12        156            154
 13        231            229
 14        328            327
 15        453            449
 16        612            609
 17        808            806
 18       1047           1019
 19       1338           1322
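For readers who wish to reproduce such counts, the self-contained C++ helper below computes the number of crossings in a given straight-line drawing by testing all pairs of independent edges for proper intersection (O(m^2)); it is an illustrative sketch, not the NETPAD heuristic itself.

#include <cstdio>
#include <vector>

struct Pt { double x, y; };
struct Edge { int a, b; };

// Signed area test: which side of segment op does q lie on?
static double cross(Pt o, Pt p, Pt q) {
    return (p.x - o.x) * (q.y - o.y) - (p.y - o.y) * (q.x - o.x);
}

// Open segments ab and cd cross properly iff each straddles the other.
static bool properlyCross(Pt a, Pt b, Pt c, Pt d) {
    return cross(a, b, c) * cross(a, b, d) < 0 &&
           cross(c, d, a) * cross(c, d, b) < 0;
}

int countCrossings(const std::vector<Pt>& pos, const std::vector<Edge>& edges) {
    int k = 0;
    for (size_t i = 0; i < edges.size(); ++i)
        for (size_t j = i + 1; j < edges.size(); ++j) {
            const Edge &e = edges[i], &f = edges[j];
            if (e.a == f.a || e.a == f.b || e.b == f.a || e.b == f.b)
                continue;   // edges sharing an endpoint never cross properly
            if (properlyCross(pos[e.a], pos[e.b], pos[f.a], pos[f.b])) ++k;
        }
    return k;
}

int main() {   // K4 drawn on a unit square: the two diagonals cross once
    std::vector<Pt> p = {{0, 0}, {1, 0}, {1, 1}, {0, 1}};
    std::vector<Edge> e = {{0, 1}, {1, 2}, {2, 3}, {3, 0}, {0, 2}, {1, 3}};
    std::printf("%d crossing(s)\n", countCrossings(p, e));   // prints 1
}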

NETPAD provides a great deal of functionality which makes it possible to rapidly build a working prototype for specific applications. This has been done in several Bellcore projects ranging from obtaining automatic "nice" drawings of wiring diagrams, to developing tools for managing computer networks, as well as on-going work for a network design and analysis tool for packet networks.

NETPAD is also a useful educational tool for studying network modeling algorithms and applications. For professors teaching courses on algorithms, NETPAD could be used to explain concepts which might otherwise be difficult to comprehend. It could be used as part of a lab for experimentation or in conjunction with projects for students to gain hands-on experience with algorithms. The visual and interactive nature of NETPAD might also stir some enthusiasm in students who would otherwise have little or no interest in the subject. NETPAD is being used as part of the educational program at Rutgers University in conjunction with the DIMACS NSF Center for Discrete Mathematics and Theoretical Computer Science. This effort is aimed at high school level students and teachers to motivate students to pursue careers in mathematics and computer science.

SOME EXISTING SYSTEMS

This section contains a sampling of existing network modeling and analysis systems to illustrate a range of considerations that may be of interest to potential users of such systems. None of these systems contain the full range of features available in NETPAD.

IBM PC-based Programs

TRAVEL is a software package developed by Boyd, Pulleyblank and Cornuejols (1987) for the traveling salesman problem. It is an interactive, menu-driven system which runs on an IBM PC. The system allows the user to choose among various heuristic algorithms and lower-bounding procedures to obtain solutions with provable performance. Color graphics is used to display an animation of the algorithms as they are running.

INDS (Interactive Network Design System) is a software package developed by Monma and Shallcross (1989a) for the two-connected network design problem, which arises in the design of survivable networks. It runs on an IBM PC, uses the TRAVEL user interface, and incorporates several heuristic algorithms (Monma and Shallcross, 1989b).

FIBER OPTIONS is a software package developed by Cardwell, Monma and Wu (1989) specifically for designing survivable fiber optic networks. It uses the methods of INDS to design the network topology and uses other methods for handling aspects of the problems, like placing equipment, bundling demands and multiplexing traffic. These algorithms have been shown to produce near-optimal solutions to real-world problems. The user interface was written and designed so that it could run on several computing and graphics platforms. FIBER OPTIONS is used within Bellcore and the Bell Client Companies to plan cost-effective, survivable interoffice fiber optic communication networks. It is available to other organizations for outside licensing (Bellcore, 1989).

CARDD (Computer-Aided Representative graph Determiner and Drawer) is an expert system that constructs a graph with properties defined by the user. It was developed by Haynes, Lawson and Powell (personal communication) and uses a forward chaining inference algorithm; i.e., once an invariant is resolved, it is never eliminated. The properties are specified by setting values for any subset of the available set of eight invariants: number of nodes, number of edges, maximum degree, minimum degree, independence number, maximum clique size, chromatic number and domination number. It does not have a graphics interface.

NETSOLVE is an interactive software package developed by Jarvis and Shier (1990) for network manipulation and optimization. It utilizes a command language rather than a menu-driven interface and has a library of optimization algorithms. It runs on an IBM PC and does not use graphics.

GRAPPLE is a Pascal program developed by Marc Lipman (personal communication) and students at Indiana University and Purdue University at Fort Wayne. It is a relatively small system which is mainly used for educational purposes and experimenting with graphs.

CATbox (Combinatorial Algorithm Toolbox) is a software package that was developed at the Mathematical Institute of the University of Cologne (Bachem, 1989). It provides an interactive, comfortable environment for solving combinatorial optimization problems on an IBM Personal Computer or compatible.

Programs

The GMP software system was developed by Esfahanian (personal communication with some of its users). It uses SUN's Sunwindows window system, which makes it inherently less portable than systems using X Windows.

The GraphPack software system was developed by Goldberg et al. (1991). It runs under X Windows and includes a language called LiLa (which is based on the C programming language with additional primitives like sets, graphs, trees, etc.) to simplify the coding of new algorithms. It also includes templates, i.e., unfinished programs for different algorithmic paradigms. It does not have a graphics interface.

The Graph Lab project, directed by Shannon (1989) at Indiana University, is an integrated visual and textual system for displaying graphs and designing graph algorithms on the NeXT computer. The drawings are rendered in Postscript, and the textual interface is based on an object-oriented implementation of graphs in a LISP-like programming language called Chez Scheme.

The Combinatorica system was developed by Skiena (1990) and is actually a collection of network algorithms written in Mathematica (which must be purchased separately); it runs on a variety of UNIX-based computers. Although it does not have a graphics interface, it provides a wide range of combinatorial functions which are not supported by other systems.

Devitt and Colbourn (1991) have developed a system for investigating network reliability problems. It is an interactive, algebraic environment which provides a package of routines coded in the MAPLE language. It does not have a graphics interface.

Network ASSISTANT has been developed by Bradley and Oliveira (personal communication) as a system of portable C program modules to support the construction of efficient graph and network algorithms. It does not have a graphics interface.

The SetPlayer program developed by Berque et al. (1990) at Rensselaer Polytechnic Institute is an interactive, command-driven system for computing with sets. It uses special data structures to enable efficient set manipulations and computations independent of whether the sets are represented explicitly or symbolically. The system is written in C, and two versions are available: a textual version and a graphical version that runs under X Windows.


Programs

Three versions of a program called CABRI are mentioned in Dao et al. (1991), one running on a Macintosh, another on a PC-compatible, and a third version for workstations that uses the BWE window management toolset from Brown University. (Only the Macintosh version was available to us.) It contains several network editing and analysis functions.

Groups & Graphs (Kocay, 1988) is a program for manipulating graphs and groups. It contains various group theoretic algorithms, such as computing the automorphism group of a graph and determining whether two graphs are isomorphic. It does not have a graphics interface.

NETPAD AVAILABILITY

A version of the NETPAD software is currently being used within Bellcore. This software is a research prototype system which is under constant development. In addition, several documents are available which provide much more detailed information about the NETPAD system, including a User's Guide (Dean et al., 1991), a Programmer's Guide (Mevenkamp, 1991a) and a Reference Guide (Mevenkamp, 1991b).

NETPAD was designed to run in a workstation environment with the UNIX operating system and under the X Window System. It is currently running on SUN and DEC workstations, VAX computers and on 386-based IBM-compatible personal computers. To take full advantage of NETPAD, it is necessary to have adequate processing power (e.g., comparable to the machines cited above), memory (e.g., 8MB RAM), disk space (e.g., 100MB hard disk) and display technology (e.g., a large screen and color are useful).

The NETPAD software and documentation are available on a royalty-free "as is" basis to universities for research, educational or academic purposes under a Software License Agreement for a small service fee to cover the costs of supplying the software and documents.

ACKNOWLEDGMENT

We wish to thank Jed Schwartz who contributed much to the early development of NETPAD.

REFERENCES

D. Berque, R. Cecchini, M. Goldberg and R. Rivenburgh (1990). The SetPlayer System: An overview and user manual. Technical Report 90-25, Computer Science Dept., Rensselaer Polytechnic Institute.
S. C. Boyd, W. R. Pulleyblank and G. Cornuejols (1987). TRAVEL - An interactive traveling salesman package for the IBM personal computer. Operations Research Letters 6, 141-143.
Bellcore (1989). FIBER OPTIONS: Software for designing survivable optical fiber networks. Bellcore.
B. Birgisson and G. Shannon (December 1989). Graphview: An extensible interactive platform for manipulating and displaying graphs. Technical Report 29S, Computer Science Dept., Indiana Univ.
R. H. Cardwell, C. L. Monma and T. H. Wu (1989). Computer-aided design procedures for survivable fiber optic networks. IEEE Journal on Selected Areas in Communications 7, 1188-1197.
J. S. Devitt and C. J. Colbourn (1991). On implementing an environment for investigating network reliability. Technical Report, University of Waterloo.
M. Dao, M. Habib, J. P. Richard and D. Tallot (1991). CABRI, an interactive system for graph manipulation. Technical Report, Centre National de la Recherche Scientifique.
J. P. Jarvis and D. R. Shier (1990). NETSOLVE: Interactive software for network optimization. Operations Research Letters 9, 275-282.
N. Dean, C. L. Monma and M. Mevenkamp (1991). NETPAD User's Guide. Technical Report, Bellcore.
M. R. Garey and D. S. Johnson (1983). Crossing number is NP-complete. SIAM Journal on Algebraic and Discrete Methods 4, 312-316.
M. Goldberg, E. Kaltofen, S. Kim, M. Krishnamoorthy and T. Spencer (1991). GraphPack: A software system for computations on graphs and sets. Technical Report, Computer Science Dept., Rensselaer Polytechnic Institute.
M. Green (1986). A survey of three dialog models. ACM Transactions on Graphics 5, 244-275.
R. K. Guy (1971). Latest results on crossing numbers. Recent Trends in Graph Theory, Springer, New York, 143-156.
F. Harary and A. Hill (1962). On the number of crossings in a complete graph. Proc. Edinburgh Math. Soc. 13, 333-338.
H. F. Jensen (1971). An upper bound for the rectilinear crossing number of the complete graph. Journal of Combinatorial Theory 11, 212-216.
W. Kocay (1988). Groups & Graphs, a Macintosh application for Graph Theory. JCMCC 3, 195-206.
M. Mevenkamp (1991a). NETPAD Programmer's Guide. Technical Report, Bellcore.
M. Mevenkamp (1991b). NETPAD Reference Guide. Technical Report, Bellcore.
C. L. Monma and D. F. Shallcross (1989a). A PC-based Interactive Network Design System for fiber optic communication networks. In: Impacts of Recent Computer Advances on Operations Research (Sharda, Golden, Wasil, Balci and Stewart, Eds.), Elsevier, New York.
C. L. Monma and D. F. Shallcross (1989b). Methods for designing communications networks with certain two-connected survivability constraints. Operations Research 37, 531-541.
S. S. Skiena (1990). Implementing Discrete Mathematics: Combinatorics and Graph Theory with Mathematica. Addison-Wesley.




A CONCURRENT COMPUTING ALGORITHM FOR REAL-TIME DECISION MAKING


WAYNE J. DAVIS
Department of General Engineering, University of Illinois at Urbana-Champaign, 104 S. Mathews Ave., Urbana, Illinois 61801
e-mail: [email protected]

ABSTRACT

This paper addresses the real-time decision making associated with, and the control of, discrete-event systems. The primary characteristics of real-time decision making are first described, and the real-time production scheduling problem for a flexible manufacturing system is cited as a specific example. A real-time decision-making and control framework is then introduced and is shown to consist of the four concurrent functions: problem assessment, performance improvement, control execution, and system monitoring. It is then demonstrated that each of these four basic functions is also comprised of concurrent computing processes. One potential conceptualization of the framework is then discussed, defining the essential information flows among the four primary functions as well as their component computing processes. The essential role of real-time, discrete-event simulation within the proposed framework is illustrated. Additional issues, including system stability and hierarchical coordination, are discussed. The paper concludes with the steps that have been taken to implement the proposed decision-making/control architecture.

KEYWORDS

Real-time decision making, real-time discrete-event simulation, discrete-event control systems, production scheduling, stochastic decision making, concurrent processing, flexible manufacturing

INTRODUCTION

A new class of real-time, decision-making problems has emerged which significantly differs from the classical decisions previously addressed in operations research. Classical decision-making has focused upon the selection of an optimum course of action with little consideration of the decision's eventual implementation. That is, the eventual controlling actions required to implement the decision were never formulated within the decision-making process. Under the real-time decision-making scenario, the current decision is dependent upon the current state of the system, and that system state is in turn dependent upon the controlling actions employed to govern the system dynamics. The inevitable consequence is that both decision making and control must be considered concurrently in real time.

1 This research was supported in part by the National Science Foundation Grant DMC 87-06201 and University of Illinois Manufacturing Research Center Grants from Caterpillar, Inc. and Motorola, Inc.


the system, and that system state is in turn dependent upon the controlling actions employed to govern the system dynamics. The inevitable consequence is that both decision making and control must be considered concurrently in real time. The dependence upon the system state is only one contributing factor to the dynamic nature of the real-time decision. The considered decision typically represents a subproblem within a much larger organizational problem. As the coordinating inputs from the decision maker's supervisor are modified, so is the decision. Since we are addressing the evolution of a dynamic system over time, the planning horizon to be considered within the decision must also be dynamic. Given the resulting dynamic nature of the decision and the subordinate relationship of the considered decision to the global organizational problem for the hierarchical system, the continued verification of the optimality of the selected course of action is impossible to assert. Moreover, the considered systems are typically stochastic and require the consideration of more than one performance criterion. These features further complicate the assertion of optimality. Without an ability to assert optimality, the question arises as to the role that the decision maker must fulfill. As described above, there is also a constant need to specify the controlling actions to be executed by the considered controller, and the control law that generates these controlling actions must be concurrently specified by the decision maker. The current control law necessarily provides a base-line system performance from which an improved control law is continuously sought. Therefore, it is perhaps more appropriate to view the decision-making role as one of continuous performance improvement rather than optimization. Several potential applications for real-time decision-making analysis can be cited. Davis and Jones (1988) defined the real-time production scheduling problem (PSP) as it applied to the scheduling of a flexible manufacturing system (FMS). Similar applications also arise in the scheduling of air traffic at an airport or the control of vehicular traffic on a congested highway. The evolution of these cited systems can generally be characterized as being discrete-event in nature. Cao and Ho (1990) have presented a survey of a number of models that can be adopted for Discrete-Event Dynamic Systems (DEDS), while Ho (1989) reviews current research efforts with several specific modeling techniques, including Markov processes, Petri networks, queuing models, finite automata, and discrete-event simulation. In general, these modeling methods are directed toward the analysis of DEDS and are neither decision-making nor control algorithms per se. Though some of these modeling approaches do permit the consideration of stochastic phenomena, such analyses primarily address statistical consequences as long-term trends. They do not address the transient response of the DEDS, which is the true concern of real-time decision making. The literature addressing real-time decision making is limited. Recently, Davis, Jones, and Saleh (1991) generalized the real-time production scheduling algorithm presented by Davis and Jones (1988) to define a generic controller for a more general class of DEDS.
Their conceptualization of the generic controller relied heavily upon the technology of real-time, discrete-event simulation, as discussed in Davis, Wang, and Hsieh (1991), to provide a statistical estimate of the future transient response of stochastic DEDS with respect to a prespecified set of multiple performance criteria. The generic controller proposed by Davis, Jones and Saleh (1991) and discussed here is but one possible framework for the concurrent integration of decision making and control in real time. Undoubtedly, the future will provide other alternatives. Since there is little theoretical guidance for this development, experimentation is essential. This paper will discuss a conceptualization and partial implementation of the generic controller as it is applied to the real-time PSP for a simple FMS. This conceptualization will first characterize the types of constraints considered within the real-time decision as being either endogenous or exogenous constraints, and assign their consideration to the appropriate control function. It will assume that multiple performance criteria are to be considered and that the FMS is a stochastic entity. The presentation also addresses several control concerns, including hierarchical coordination and system stability. This paper begins by defining the real-time PSP and then introduces the real-time, discrete-event simulation technology that is responsible for predicting the transient response of the controlled DEDS operating under a specified control policy. Next, a concise overview of the generic

controller as defined by Davis, Jones, and Saleh (1991) is provided. It will be shown that the generic controller addresses four distinct functions concurrently. For each individual function, a concurrent computing algorithm is then defined. As the four concurrent control functions are assembled to form the generic controller, a description of the essential communication required to coordinate the functions is provided. Finally, the conclusion briefly discusses the development issues that must be addressed to fully implement the generic controller in a manufacturing setting.

THE PRODUCTION SCHEDULING PROBLEM

In this section, two primary tasks will be addressed. First, the real-time PSP for the FMS is defined. This definition will discuss both the discrete events and the state transitions required to characterize the FMS's response. A discussion of real-time, discrete-event simulation, which represents the core technology for implementing the proposed real-time decision-making/control framework, then follows naturally. The PSP is a dynamic entity. First, it is dependent upon the current ensemble of jobs that have been assigned by the scheduler's supervisor for production within the FMS. Secondly, the scheduling problem is dependent upon the current state of the system, including the current status of the manufacturing processes and the other supporting resources. An additional complicating feature of the real-time PSP is that the evolution of the FMS is itself nondeterministic. This stochastic behavior has several sources. First, there is always the potential for an inaccurate system model. The modeling of the detailed interactions among various equipment controllers is difficult. And in some cases, the modeling is impossible when the vendor neglects to provide documentation for its controlling software. In other cases, details are purposely neglected in the modeling. For example, the information exchanges between the process controllers and the cell controller, as well as the extraction of information from remote databases, are seldom considered in the current modeling of FMSs. These transactions can contribute to nondeterministic processing times. Second, the outcomes of the manufacturing processes themselves are not deterministic, both in their duration and final outcome. Quality control remains a concern in nearly every manufacturing setting. In addition, operations performed by humans, such as fixturing, are not deterministic in duration. Moreover, the PSP typically requires the simultaneous consideration of multiple performance criteria. These criteria may include lateness with respect to the assigned due dates, productivity considerations such as process utilization, and goals such as minimization of work-in-progress inventory. It is generally impossible to simultaneously improve all performance criteria.

Characterization of the System Dynamics for the Modelled FMS

The first step in characterizing the behavior of the FMS is the definition of its system state. Using the state definition, a subsequent specification of the state transition mechanisms can be made. The state transition mechanisms, in turn, define the most fundamental constraints for the PSP. The evolution of the FMS's response is characterized by discrete events. When a discrete event occurs, the state of the system will be changed via its state transition functions, and subsequent events will be scheduled. This observation permits the construction of discrete-event simulation models for FMSs, as discussed in Ho (1989). The structure of the FMS to be considered here is depicted in Figure 1. The FMS consists of N primary processing resources, denoted by R_n in Figure 1. Each processing resource has an input and an output queue. From a controller's point of view, only the input queue belongs to R_n, which regulates the inflow of JOBs by controlling its inhibit flag. The output queue belongs to the FMS controller, which has been termed the generic scheduler (GS). The GS can regulate the outflow of jobs by controlling the inhibit flag for each output queue at each processing resource. JOBs are introduced into and removed from the FMS through a top-level process, process 0, which again possesses an input and an output queue.

Figure 1 - Schematic Diagram for the Considered Flexible Manufacturing System. (The schematic shows the processing resources R_1,...,R_N, each with a process input port, inhibit flag, and sub-unit output queue; the generic scheduler with its assessment, optimization, and execution functions; the material handling system; and process 0 with its inhibit output-port flag. The labeled information flows are process state information, unit state information, sub-unit directives with priorities, transport commands, and assigned tasks and due dates.)

Only JOBs residing in the input queue at process 0 are under the direct control of the GS. JOBs that reside in the output queue at process 0 are under the control of the supervisor to the FMS (e.g., the Master Production Scheduler in the computer-integrated manufacturing hierarchy), since their processing within the FMS has been completed. Finally, a material handling system (MHS) has been included to transport JOBs among the processing resources. The control of the MHS is coordinated through the GS. The first step in defining the state of the FMS is to assume that the FMS can be employed to perform processing tasks upon M distinct product types, denoted by Z_m (m = 1,...,M). For each product type Z_m, it will be further assumed that there are z_m distinct process plans, denoted by C_mp (p = 1,...,z_m). These plans dictate how the processing resources contained within the FMS can be employed to produce the given product. Each processing plan must define the order in which the processing resources will be employed (i.e., the precedence relationships) and a specification of a probability density function f_mn(T_mn | C_mp), where T_mn is the time required to implement the selected processing task upon product Z_m at resource R_n given that process plan C_mp has been selected. If a given processing time is deterministic, then f_mn(T_mn | C_mp) will be a Dirac delta function at the appropriate deterministic processing time T_mn. For the current scheduling problem, it is assumed that jobs JOBj (j = J_min,...,J_max) have currently been assigned to the FMS for processing. Each JOBj will further request the production of

#(JOBj) parts of a single product type from the product set Z_m (m = 1,...,M). An upper limit upon #(JOBj) is specified as the maximum number of the requested part type Z_m that can either fit on the material handler or a fixture. If a given order requests more parts than this imposed upper bound, then multiple JOBs will be issued until the quantity of parts requested by the order is satisfied. Each job will enter and exit the FMS through process 0. The GS first must select the process plan C_mp to be employed in the production of JOBj, which necessarily determines the sequence of subsequent process visitations for JOBj. That is, JOBj will depart from process 0 and move to the first processing resource R_n specified in the processing plan C_mp. It will then continue to move to the next processing resource, and so forth, until JOBj eventually finishes processing and returns to process 0. Note that a given process may be visited several times under the processing plan C_mp. For example, several fixturings may be required to implement the processing plan C_mp.
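To make these definitions concrete, the following minimal sketch (in Python; all names are hypothetical, and the triangular distributions merely stand in for arbitrary densities f_mn(T_mn | C_mp)) shows one way the product types, process plans, and JOBs just described might be represented:

import random
from dataclasses import dataclass, field

@dataclass
class ProcessStep:
    resource: int            # index n of the processing resource R_n to visit
    sample_time: callable    # draws a processing time T_mn from f_mn(. | C_mp)

@dataclass
class ProcessPlan:           # one plan C_mp for product type Z_m
    product: int             # m
    steps: list              # visitation order encodes the precedence relationships

@dataclass
class Job:                   # one JOBj
    plan: ProcessPlan
    quantity: int            # #(JOBj), bounded by fixture/handler capacity
    events: list = field(default_factory=list)   # appended event subsets, as in eqs. (1)-(2)

# A two-step plan for product Z_1: visit R_2, then R_1 (a resource may repeat).
plan = ProcessPlan(product=1, steps=[
    ProcessStep(resource=2, sample_time=lambda: random.triangular(2.0, 6.0, 3.0)),
    ProcessStep(resource=1, sample_time=lambda: random.triangular(1.0, 4.0, 2.0)),
])
job = Job(plan=plan, quantity=4)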

Associated with the movement of JOBj through the FMS is a sequence of events that will occur at each visitation of a processing resource R_n. At a primary processing resource R_n (n = 1,...,N), the following events can be defined:

A_jn   The time JOBj arrived at the processing resource R_n,

S_jn   The time JOBj initiated processing at the processing resource R_n,

F_jn   The time JOBj finished processing at the processing resource R_n,

P_jn   The time JOBj departed from the processing resource R_n, and

D_jn   The assigned due date for completing the specified processing step upon JOBj at the processing resource R_n.

For process 0 (the input/output port to the FMS), the following definitions hold:

A_j0   The time JOBj initially arrives at process 0,

S_j0   The time JOBj is first dispatched from process 0 to be transferred to a subordinate processing resource R_n,

F_j0   The time JOBj returns to process 0 from the subordinate processing resource R_n (n = 1,...,N) as a completed job,

P_j0   The time JOBj is picked up from the FMS, and

D_j0   The assigned due date for JOBj to complete processing within the FMS.

Whenever JOBj enters the system at process 0, the state of JOBj, X_j, is initially defined as

X_j = \{(A_{j0}, S_{j0}, F_{j0}, P_{j0}, D_{j0})\}     (1)

The event A_j0 is set to the time that JOBj arrived at process 0, while D_j0 will be established by the supervisor to the FMS. Please note that the due date D_j0 should not be confused with the due date requested by the customer for the delivery of the final product. The due date D_j0 is specified by the supervisor to the FMS as the point in time at which the supervisor desires to regain control of JOBj. In a similar manner, the GS specifies a due date D_jn for each JOBj at each processing resource R_n that will be visited along the selected path C_mp. This due date D_jn similarly represents the time the GS desires to regain control of JOBj from the subordinate processing resource R_n. As JOBj moves through the FMS, the control of JOBj is exchanged among the various components of the FMS as specified by a predefined set of rules or chain of command. This chain of command is even more essential when one considers a multi-level production scheduling hierarchy as suggested in Davis and Jones (1992) and Tirpak et al. (1991).


The other events are also critical to governing this exchange of control. The event A_jn represents the point when JOBj arrives at the processing resource R_n and, coincidentally, the time when the control of JOBj is assigned to the processing resource R_n. The event F_jn not only represents the finish time for JOBj at resource R_n, but also the time that the control of JOBj is returned to the GS. Recall that the output queue of each processing resource R_n is owned by the GS. The fact that the control as well as the physical JOBj can reside with a subordinate process imposes crucial design constraints that must be considered in the specification of the framework for the GS. At each arrival of JOBj at a subordinate processing resource R_n, the state of JOBj is updated by appending the new event subset as illustrated below

X_j^{new} = X_j^{old} \cup \{(A_{jn}, S_{jn}, F_{jn}, P_{jn}, D_{jn})\}     (2)

A_jn is then immediately equated to the time that the arrival event occurred, and D_jn is established by the GS. The remaining events are left unspecified until they actually occur. The physical transfer of JOBj from one process to another is controlled by the MHS under the coordination of the GS. Therefore, the MHS controller is responsible for the occurrence of the arrival and pickup events at processing resources R_n (n = 1,...,N) as well as the dispatch event S_j0 and the job completion event F_j0. The GS implicitly controls the MHS controller by specifying the control rule to be employed in deciding the order in which pending material handling requests will be processed. At the subordinate processing resources R_n (n = 1,...,N), the order for the processing of pending JOBs is based upon the assigned due date D_jn, which is again specified by the GS. Therefore, the GS's production scheduling requirements can be summarized as follows:

1) Specifying the processing plan C_mp that will be employed in the production of each JOBj,

2) Specifying the rule that the MHS will employ to order the processing of pending material handling requests,

3) Specifying the due dates for the completion of the assigned processing tasks at the subordinate processing resources R_n (n = 1,...,N), and

4) Specifying the order in which JOBs pending dispatch in the input queue at process 0 will be dispatched.

The state of any processing resource, X_n (n = 0,...,N), can now be defined as the composite vector consisting of the state vectors X_j for the JOBs that reside at the given process R_n. For process 0, this list will include all the JOBs that currently reside within the FMS. For the other processing resources R_n (n = 1,...,N), the sublist of JOBs residing at the process in question will be the set of JOBs such that A_jn has occurred and P_jn is still undefined. Due to space limitations, a detailed description of the state of the processes cannot be provided; the reader is referred to Davis and Jones (1992), where a detailed definition of the state of the processing resources as well as the MHS is provided. Additionally, that paper provides a detailed description of the state transition mechanisms associated with the dynamics of the FMS operating under the proposed state definition. For example, given that the start event S_jn has occurred, the finish event F_jn can be scheduled by sampling a processing time T_mn from the probability distribution f_mn(T_mn | C_mp), which is conditioned upon the fact that the processing plan C_mp has been selected for the production of JOBj in the FMS. Although space limitations will not permit a complete definition of the state transition mechanisms to be provided, it is an accepted fact that we can develop a discrete-event simulation to predict the evolution of the system operating under the defined set of events.
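As an illustration of such a state transition, a small sketch (Python; illustrative only, not the authors' implementation) of scheduling the finish event F_jn by sampling T_mn once the start event S_jn has occurred:

import heapq, random

event_queue = []                # future event list, ordered by event time

def schedule(time, kind, job, resource):
    heapq.heappush(event_queue, (time, kind, job, resource))

def on_start(now, job, resource, sample_time):
    # Start event S_jn has occurred: sample T_mn ~ f_mn(. | C_mp) and
    # schedule the finish event F_jn at time S_jn + T_mn.
    t_mn = sample_time()
    schedule(now + t_mn, "finish", job, resource)

# One simulated occurrence: JOB 7 starts on resource R_2 at time 12.5.
on_start(12.5, job=7, resource=2,
         sample_time=lambda: random.triangular(2.0, 6.0, 3.0))
t, kind, j, n = heapq.heappop(event_queue)
print(f"F_jn scheduled: job {j} finishes on R_{n} at time {t:.2f}")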

Real-Time, Discrete-Event Simulation

The technology of real-time, discrete-event simulation is a vital element of the proposed framework for the GS. In Davis, Wang and Hsieh (1991), a detailed description of the computing processes as well as the statistical concerns associated with real-time, discrete-event simulation is provided. Discrete-event simulation will be employed here to predict the transient performance of the FMS given its current state and the scheduling decisions that have been made by the GS. Since the system dynamics for the considered FMS are stochastic, a statistical estimation of the transient performance is essential. The consideration of transient performance distinguishes the real-time simulation from most of the previous applications of discrete-event simulation. That is, previous analyses of DEDS using discrete-event simulation have primarily assessed the steady-state performance of the DEDS. In many (if not most) cases, these analyses have taken explicit precautions to remove the dependence upon the initial state from the prediction of the steady-state performance. Therefore, the objectives for the real-time simulation analyses are truly distinct from those of a steady-state performance analysis. In assessing the transient performance of the FMS using real-time, discrete-event simulation, multiple simulation trials will be performed, with each trial k being initialized to the most current state of the FMS. The output of simulation trial k will be the final state description X_j^k (j = J_min,...,J_max) for each JOBj considered during the simulation trial k. The length of each simulation trial is specified by the user to cover either the set of JOBs currently residing within the FMS or an expanded set of JOBs with random arrivals occurring beyond the current set of JOBs residing in the FMS. Given the final state description from each simulation trial k, a multitude of potential performance indices can be evaluated as described below:

Average Time a Job Resides in Cell on the k-th Simulation Trial, AvgTIS^k:

AvgTIS^k = \sum_{j=J_{min}}^{J_{max}} TIS_j^k / (J_{max} - J_{min}) = \sum_{j=J_{min}}^{J_{max}} (F_{j0}^k - A_{j0}^k) / (J_{max} - J_{min})     (3)

where TIS_j^k is the recorded time that JOBj resided in the FMS on trial k. Please note that the events in this and the following equations carry the superscript k to indicate that they were predicted using simulation on trial k rather than being recorded at the actual time that they occurred.

The Average Productivity for Jobs Processed on the k-th Simulation Trial, AvgPr^k [equation (4)], is computed from the recorded processing times (F_{jn}^k - S_{jn}^k) for JOBj at processing resource R_n on trial k. The Average Process Utilization on the k-th Simulation Trial, AvgPU^k, follows from the per-resource utilizations:

AvgPU^k = \sum_{n=1}^{N} PU_n^k / N     (5)

where PU_n^k is the utilization of processing resource R_n on trial k.

P\{ C^i dominates C^j with respect to all L performance criteria \}     (8)

This is an extremely complex computation, which is discussed in Davis, Wang and Hsieh (1991). Currently, the probability is being computed using yet another set of Monte Carlo sampling techniques. Given that there are I+1 scheduling alternatives (including C*) and L performance criteria to be considered, there are a total of I·(I+1)·L/2 pair-wise dominance probabilities to be computed in real time. Finally, the most startling observation made to date is that the manner in which a given performance index is optimized can change with time. In Figure 4, the graphic display for the performance of the criteria AvgTIS [equation (3)] and AvgPU [equation (5)] from another example FMS is provided. Looking at the ecdfs for AvgTIS, there is apparently little differentiation in operating under control alternative C^1 versus C^2. Looking at the ecdf for AvgPU, however, there is considerable differentiation between C^1 and C^2. AvgPU is a criterion that we would typically maximize. In this case, C^2 clearly dominates C^1. However, the graphic is telling us that we are generating similar throughput in terms of AvgTIS with less utilization of the processes under C^1. Such a situation could arise if C^2 was making less than ideal choices for the processing plans to be implemented. In this situation, it becomes apparent that we should actually minimize AvgPU to save the utilization of our processes and adopt C^1. Each of these three observations, which have been derived from actual experiments, clearly demonstrates the complexity of the real-time compromise analysis. Intelligent interfaces must be derived to assist the decision-maker in this process. Furthermore, both the statistical analysis module and the compromise analysis module truly present complex concurrent computing processes which again must be addressed in real time.
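To convey the flavor of this computation, a toy sketch (Python; a deliberate simplification of the procedure in Davis, Wang and Hsieh (1991), with hypothetical data) estimates one pair-wise dominance probability by resampling the per-trial criterion vectors recorded under two alternatives:

import random

def dominance_probability(trials_i, trials_j, num_samples=10000):
    """Estimate P{alternative C^i dominates C^j on every criterion}.

    trials_i, trials_j: lists of per-trial criterion vectors, where every
    criterion has been oriented so that smaller is better.
    """
    wins = 0
    for _ in range(num_samples):
        a = random.choice(trials_i)       # one sampled outcome under C^i
        b = random.choice(trials_j)       # one sampled outcome under C^j
        if all(x <= y for x, y in zip(a, b)):
            wins += 1
    return wins / num_samples

# Two criteria (e.g., AvgTIS and lateness) over a handful of trials:
c1 = [(9.8, 1.2), (10.4, 0.9), (9.5, 1.4)]
c2 = [(11.0, 1.1), (10.9, 1.6), (11.8, 1.3)]
print(dominance_probability(c1, c2))      # about 0.67: C^1 usually, but not always, dominates C^2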

The Execution Function (EF)

The EF is responsible for implementing the currently selected control law C*. The heart of the EF, as illustrated in Figure 2, is again another real-time simulation. However, this real-time simulation differs from that of the PIF in that it directly incorporates the statistical estimates of completion times for the currently assigned tasks, provided as feedback information from the controllers of the subordinate processing resources R_n (n = 1,...,N). Using this statistical feedback information in conjunction with C*, the due date computation module provides an updated estimate of the completion times for all future processing tasks to be assigned to each processing resource. In performing this function, it operates in a manner similar to the statistical analysis module for the PIF. That is, the output for each simulation trial is passed to the due date computation module, which generates an empirical cumulative probability density function for the completion times of each future task. The actual assignment of a due date for the completion of the task, D_jn, is made when JOBj arrives at process R_n. At this moment, A_jn is equated to the arrival time and D_jn is established by the EF of the GS. In truth, the EF can modify D_jn for any pending JOBj at any R_n. That is, for any task not yet initiated, there is the flexibility of reassignment of due dates to modify the order in which JOBs will be processed. The controller for the processing resource R_n, on the other hand, continuously updates its statistical estimates of the completion times for the tasks awaiting processing in its input queue. These modifications are returned to the real-time simulator at the EF, which can subsequently modify the D_jn's associated with other pending tasks at the processing resources. The value assigned for the due date D_jn depends both upon the time that JOBj will have to wait at the processing resource n and upon the subsequent processing duration, which is governed by the probability density function f_mn(T_mn | C_mp), where T_mn is the time required to implement the selected processing task upon product type Z_m at resource R_n given that process plan C_mp has been selected. In Figure 5, the criticality of a given processing resource n is assessed by providing the cumulative density function for the anticipated wait times at various potential future arrival times. These real-time empirical distributions are derived by the process criticality computation module using the simulation trials generated by the EF's simulation engine. Using Figure 5, if a new JOB arrived at t_0, the probability that it could secure the resource n within x time units is p_0. Similarly, if the arrival occurred at t_1, the probability of securing the resource n within x time units would be p_1, and so forth. Since, for a given wait time x and the series of arrival times t_0, t_1, t_2, and t_3, the associated series of probabilities p_0, p_1, p_2, and p_3 is monotonically decreasing, the resource n being considered in Figure 5 is becoming more critical with time. That is, the later the JOB arrives, the longer it is likely to wait to secure the resource. Typically, D_jn is assigned such that processing resource n has a prespecified probability of completing the task by the prescribed due date D_jn. The probability of completing the desired task by D_jn can be computed by first estimating the arrival time of JOBj at resource n, which then defines the governing wait time distribution. The governing wait time distribution, in conjunction with f_mn(T_mn | C_mp), can then be used to determine the requested completion time probability. This computation is performed by the due date computation module. The desired completion time probability employed in the assignment of a due date may be a constant or may be based upon the process's current criticality.
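A minimal sketch (Python; illustrative only) of this quantile-style assignment, convolving empirical wait-time samples with a sampled processing time by Monte Carlo:

import random

def assign_due_date(arrival_time, wait_samples, sample_time, p=0.9, n=10000):
    """Set D_jn so that completion by D_jn has (estimated) probability p.

    wait_samples: empirical wait times at the resource near the arrival time,
    taken from the EF's simulation trials; sample_time draws T_mn.
    """
    completions = sorted(arrival_time + random.choice(wait_samples) + sample_time()
                         for _ in range(n))
    return completions[int(p * (n - 1))]   # empirical p-quantile of completion time

waits = [0.5, 1.0, 1.5, 2.5, 4.0]          # hypothetical wait-time samples
due = assign_due_date(20.0, waits, lambda: random.triangular(2.0, 6.0, 3.0))
print(f"D_jn = {due:.2f}")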

As stated above, this due date is essential to the interaction of the GS with the subordinate process, as the assigned due date specifies the order in which the JOB will eventually be processed at the resource n. It is also essential to note that the EF can never force an infeasible solution upon the subordinate controller of the processing resource. It is presumed that the processing resource controller will attempt to meet the imposed due dates to the greatest extent possible. Its feedback of projected completion times for pending tasks presumes that the tasks must be successfully implemented. Therefore, the EF determines the consequences of the feedback information, and the appropriate updates are made for the completion times of future tasks. The primary concern of the EF is to implement C* in such a manner that a feasible decision is maintained with respect to the endogenous constraints. The EF cannot explicitly guarantee the simultaneous satisfaction of all the exogenous constraints. Originally, when C* was generated by the PIF, the exogenous constraints were considered. However, as system state updates are realized by the processing resource controllers, the ability of the current control law C* to provide a feasible solution with respect to the exogenous constraints may no longer be possible. To prevent a systematically infeasible solution from being requested of the processing resources, the EF concentrates on the satisfaction of the endogenous constraints. The monitoring of the satisfaction of the exogenous constraints is conducted by the monitoring function.

The Monitoring Function (MF)

When the AF originally defined the exogenous constraints, it provided acceptable tolerances for their satisfaction to both the PIF and the MF. From the EF, the MF receives continuously updated projected times for the completion of all future tasks as well as the current state of the system. Using this information, the MF continuously checks each exogenous constraint to see if the tolerances for its satisfaction have been violated. If the MF detects that one or more exogenous constraints are violated, it requests the PIF to search for a new control law that will restore feasibility with respect to the exogenous constraints. This situation is similar to a decision point, which was mentioned earlier. It was also noted that the proposed implementation of the PIF would not employ decision points. Rather, it was proposed that the PIF would continue to investigate new scheduling alternatives throughout the GS's operation. Therefore, the MF's request to reinitialize the PIF's search is nonessential. In actuality, the MF is notifying the PIF that it is expecting the PIF to select a new control law which will restore feasibility. If a new feasible control law C* can be generated, then it is passed to the EF for immediate implementation, and the MF's request is fulfilled. If the PIF cannot provide a new law that will restore feasibility, the MF then triggers the AF to redefine the exogenous constraints such that a feasible solution

can be generated. This task typically requires that the AF negotiate a new final state (due dates for the assigned JOBs) with the FMS's supervisor. This updated final state is then passed to the PIF, which defines a new control law to restore feasibility. The new control law is then passed to the EF for implementation. This exercise in the restoration of feasibility clearly illustrates the interactions that must occur among the various control functions and the role that the MF plays in coordinating these interactions.

The Assessment Function Revisited

To modify the exogenous constraints, the AF typically must interact with the supervisor for the FMS. Like the EF for the GS, the supervisor to the FMS cannot force an infeasible solution upon its subordinates, including the considered GS. To determine the completion dates for restoring feasibility of the production of the JOBs currently assigned to the FMS, the AF employs the currently estimated task completion time statistics for the pending future tasks (as provided by the EF). The supervisor is expected to respond to the requests to modify completion dates, which will in turn modify the exogenous constraints and allow a feasible schedule to be generated. It should be further noted that the GS makes every attempt to restore feasibility locally before a request to modify completion dates is made to its supervisor. This is a crucial consideration in system stability: to respond to deviations at the lowest possible control level. If the supervisor must modify the completion dates, it is expected that this action will result in modification of other elements of its decision. Therefore, the disruptions will likely be propagated to other decision-making/control elements within the global manufacturing system. If the disruptions can be addressed locally by the GS, the subsequent propagation of disruptions will be minimized. One additional point about the AF to be discussed is its determination of which processing plan C_mp should be employed to produce a given product. In this determination, the AF employs the criticality data generated by the EF. Each processing plan C_mp specifies both the processes that must be visited as well as the processing duration that will be required at each process. The process criticality data (see Figure 5) provide an estimate of the time that a given JOB must wait to gain access to a given process as a function of time. Using these two sets of data, approximate simulations can be developed to statistically quantify the probability of completing a given JOBj by a given time under the processing plan C_mp. These simulations are performed in the completion date computation module of the AF. Before the supervisor to the FMS issues a JOB with a completion date, it queries the AF for statistical estimates of the potential completion date given the current state of the FMS. The AF generates these estimates, which are returned to the supervisor. The supervisor then decides whether to issue the JOBj with its assigned completion date. In this manner, the feasibility of a proposed completion date while operating under the current C* is assessed before the JOBj is issued. The proposed completion date is then incorporated into the set of exogenous constraints by the formulation of exogenous constraints module within the FMS. The introduction of the new constraint will subsequently require the PIF to readdress the current scheduling problem.

Recall that the estimates for the completion date were made using the criticality data provided by the EF while implementing the current C*. Therefore, the current C* should provide a feasible schedule with respect to the new completion date constraint. The question that the PIF must then explore is the existence of a potentially better C* for implementation.

CONCLUSIONS AND FUTURE DIRECTIONS

This paper demonstrates that the real-time decision making and control associated with a stochastic DEDS is indeed a major departure from the classic decision-making frameworks employed in operations research. Specifically, the four distinct functions of assessment, performance improvement, execution and monitoring have been defined for concurrent implementation. This paper has also highlighted the criticality of the emerging technology of real-time, discrete-event simulation. Included in this discussion have been many new decision-making concerns, including that of system stability and hierarchical coordination.

Nevertheless, this is but one potential realization of a real-time, decision-making algorithm. Certainly, others will follow. Moreover, other new decision-making concerns will arise which will force the above formulation to be modified. In particular, a greater effort must be dedicated to the development of improved interfaces for the GS with the subordinate process controllers and the FMS supervisor. As other applications are addressed, we expect that additional functional requirements for the controlling elements and their interfaces will be discovered. As stated in the introduction, there is currently little theoretical guidance pertaining to the development of real-time decision-making algorithms. Experimentation is essential. Currently, most of the functionality of the PIF has been coded in object-oriented C++. A major research effort is now being devoted to the development of the compromise analysis module. The programming of the other real-time, decision-making functions is also being addressed. As a result of this effort, many of the computing objects developed for the implementation of the PIF will be modified and applied in the implementation of the other control functions. However, the implementation of the entire real-time, decision-making algorithm represents an enormous task. Much research and coding remains. It is hoped that this paper will excite others to consider this difficult, but critical, problem.

ACKNOWLEDGEMENTS

The author would like to acknowledge Professor S. Daniel Thompson of the Department of General Engineering at the University of Illinois at Urbana-Champaign for his assistance in preparing this paper, and Dr. Sam Daniel and his associates of Motorola, Inc. in Scottsdale, AZ, who contributed to the concepts depicted in Figure 1.

REFERENCES

Cao, X.R. and Y.C. Ho (1990). Models of Discrete Event Dynamic Systems. IEEE Control Systems Magazine, June, 69-76.

Davis, W.J. and A.T. Jones (1988). A Real-Time Production Scheduler for a Stochastic Manufacturing Environment. Intl. J. of Computer Integrated Manufacturing, 1(2), 101-112.

Davis, W.J., A.T. Jones and A. Saleh (1991). A Generic Architecture for Intelligent Control Systems. To appear in Intl. J. of Computer Integrated Manufacturing Systems. Also published as National Institute of Standards and Technology Report NISTIR 4521, Gaithersburg, MD, February 1991.

Davis, W.J. and A.T. Jones (1992). The Application of a Generic Controller in a Multi-Level Production Scheduling and Control Hierarchy. Submitted to IEEE Trans. on Systems, Man, and Cybernetics. Also published as Manufacturing Systems Laboratory Report MSL-9110201, Department of General Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois.

Davis, W.J., H. Wang, and C. Hsieh (1991). Experimental Studies in Real-Time, Monte Carlo Simulation. To appear in IEEE Trans. on Systems, Man, and Cybernetics.

Erikson, C., A. Vandenberge, and T. Miles (1987). Simulation, Animation, and Shop-floor Control. In: Proc. of the 1987 Winter Simulation Conference (A. Thesen, H. Grant, and W.D. Kelton, Eds.). IEEE, Piscataway, New Jersey, 649-653.

Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, Reading, Massachusetts.

Goodwin, J.S. and J.K. Weeks (1986). Evaluating Scheduling Policies in a Multi-Level Assembly System. Intl. J. Production Research, 24(2), 247-257.

Grant, F.H., S.Y. Nof and D.G. MacFarland (1988). Adaptive/Predictive Scheduling in Real-time. In: Advances in Manufacturing Systems Integration and Processes: 15th Conference on Production Research and Technology (D.A. Dornfeld, Ed.). Society of Manufacturing Engineers, Dearborn, Michigan, 277-280.

Harmonosky, C.M. (1990). Implementation Issues Using Simulation for Real-time Scheduling, Control, and Monitoring. In: Proc. of the 1990 Winter Simulation Conference (O. Balci, R.P. Sadowski, and R.E. Nance, Eds.). IEEE, Piscataway, New Jersey, 595-598.

Ho, Y.C. (1989). Dynamics of Discrete Event Systems. Proc. of the IEEE (Special Issue on Dynamics of Discrete Event Systems), 77(1), 3-6.

Sadowski, R.P. (1985). Improving Automated Systems Scheduling. CIM Review, 2(1), 10-13.

Tirpak, T.M., S.M. Daniel, J.D. LaLonde and W.J. Davis (1991). A Note on a Fractal Architecture for Modeling and Controlling Flexible Manufacturing Systems. To appear in IEEE Trans. on Systems, Man, and Cybernetics.

Wu, D.D. and R.A. Wysk (1989). An Application of Discrete-Event Simulation to On-Line Control and Scheduling in Flexible Manufacturing Systems. Intl. J. of Production Research, 27(9), 1603-1623.

COMPUTATIONAL EXPERIENCE WITH PARALLEL ALGORITHMS FOR SOLVING THE QUADRATIC ASSIGNMENT PROBLEM

PANOS M. PARDALOS, 303 Weil Hall, Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL 32611

KOWTHA A. MURTHY and YONG LI, Computer Science Department, The Pennsylvania State University, University Park, PA 16802

ABSTRACT

The quadratic assignment problem belongs to a class of combinatorial optimization problems that have many practical applications in science and engineering but are extremely difficult to solve. In this paper, we present two parallel algorithms for the quadratic assignment problem - an exact algorithm based on a branch-and-bound technique and a local search algorithm based on the Kernighan-Lin algorithm for the graph partitioning problem. These parallel algorithms have been implemented and tested on a shared memory vector multiprocessor system, the IBM ES/3090-600S VF computer. Computational results using many classical test problems, as well as test problems generated with known optimal solutions of dimensions up to 100, are presented. These parallel algorithms, using the vector facility, achieved a balanced speed-up ratio of over 5.0 using 6 processors.

KEYWORDS

quadratic assignment problem; parallel algorithm; vectorization; local search; branch-and-bound.

INTRODUCTION

The quadratic assignment problem (QAP) belongs to a class of combinatorial optimization problems that have many practical applications, but are computationally difficult to solve. The QAP, of which the traveling salesman problem is a special case, is NP-hard. Furthermore, the problem of finding an ε-approximate solution is also known to be NP-hard. Problems of sizes greater than 15 are not practically solvable to optimality. Formally, the QAP can be stated as follows: given a positive integer n, and two n × n matrices F = (f_ij) and D = (d_kl), find a permutation p of the set {1, 2,...,n} that minimizes

C(p) = \sum_{i=1}^{n} \sum_{j=1}^{n} f_{ij} d_{p(i)p(j)}

In the framework of the facility location problem, n denotes the number of facilities (locations). The matrix F = (f_ij) is the flow matrix, where f_ij represents the flow of materials from facility i to facility j, for i, j = 1,...,n. The matrix D = (d_kl) is the distance matrix, where d_kl represents the distance from location k to location l, for k, l = 1,...,n. The objective function then represents the cost associated with the assignments of the n facilities to the n locations (Koopmans and Beckmann, 1957). In addition to its application in the facility location problem, the QAP has been found useful in scheduling, backboard wiring in electronics, and many other applications. A more comprehensive description of the applications of the QAP may be found in Burkard, 1990. The QAP may be formulated in many equivalent forms. Some of the most commonly used formulations include the quadratic 0-1 programming formulation

min \sum_{i=1}^{n} \sum_{j=1}^{n} \sum_{k=1}^{n} \sum_{l=1}^{n} f_{ij} d_{kl} x_{ik} x_{jl}

s.t. \sum_{p=1}^{n} x_{pq} = 1,  q = 1,...,n,

     \sum_{q=1}^{n} x_{pq} = 1,  p = 1,...,n,

     x_{pq} ∈ {0,1},  p, q = 1,...,n.

The QAP can be formulated also as a global concave minimization problem (Pardalos and Rodgers, 1989; Pardalos and Rosen, 1987). This approach has been used to find suboptimal solutions using cutting plane methods (Bazarra and Sherali, 1982).
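For concreteness, the objective function can be evaluated directly from this definition; a minimal sketch in Python, with a made-up size-3 instance:

def qap_cost(f, d, p):
    """C(p) = sum over i, j of f[i][j] * d[p[i]][p[j]] for a permutation p (0-indexed)."""
    n = len(p)
    return sum(f[i][j] * d[p[i]][p[j]] for i in range(n) for j in range(n))

# Flows F, distances D, and the permutation p = (2, 0, 1).
F = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
D = [[0, 4, 5], [4, 0, 6], [5, 6, 0]]
print(qap_cost(F, D, [2, 0, 1]))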

Solution Methods

Most exact solution algorithms are single assignment construction methods (i.e., they build up solutions by adding one assignment at a time) using some branch-and-bound techniques. The Gilmore-Lawler lower bound (GLB) (Gilmore, 1962; Lawler, 1963) and the Eigenvalue bound (EVB) (Finke, Burkard, and Rendl, 1987) are the most common lower bounds used in these branch-and-bound methods. Unfortunately, the usefulness of these bounds deteriorates as the problem size increases, especially for n > 15. The most successful implementation of an exact solution algorithm was developed in Burkard and Derigs, 1980. More recently, Pardalos and Crouse, 1989 and Roucairol, 1987 have developed exact parallel algorithms that were run on IBM and CRAY supercomputers, respectively. More details and references on solving optimization problems using parallel branch-and-bound techniques can be found in Pardalos and Li, 1990. Several heuristic methods have been developed to find suboptimal solutions for the large-scale QAP. A cutting plane method based on the formulation of the QAP as a quadratic 0-1 programming problem was developed in Bazarra and Sherali, 1982. Another interesting heuristic, proposed in Wilhelm and Ward, 1987, was based on the simulated annealing technique. This approach was first proposed in Kirkpatrick et al., 1983 for problems in combinatorial optimization. The important aspect of this algorithm lies in the fact that it accepts new solutions of inferior quality with a certain probability in order to move out of a local minimum. Other heuristic methods (Murthy and Pardalos, 1990a; Skorin-Kapov, 1990) use a local search algorithm that depends on the formulation of the problem and the structure of its neighborhood. Such a local search algorithm starts with an initial feasible solution, makes it the current feasible solution, and runs in a repetitive fashion. In each repetition the algorithm replaces the current solution by a suitable feasible solution in the neighborhood of the current solution. The algorithm terminates after certain stopping criteria are satisfied, e.g., no further improvement of the objective function value.

In this paper, we present two parallel algorithms for the QAP and results of computational testing. The branch-and-bound algorithm is based on the exact algorithm presented in Pardalos and Crouse, 1989. The local search algorithm is based on the Kernighan-Lin algorithm (Kernighan and Lin, 1972) for the graph partitioning problem. Both parallel algorithms are direct parallelizations of the sequential algorithms. We implemented and tested these parallel algorithms on The Pennsylvania State University's IBM ES/3090-600S VF computer using two sets of test problems, namely, the classic NUGENT set (Nugent, Vollmann, and Ruml, 1969) and a set of test problems with known optimal solutions generated by using an algorithm developed in Palubetskes, 1988, of sizes up to n = 100.

PARALLEL ALGORITHMS

An Exact Algorithm

The exact algorithm presented below uses a branch-and-bound technique. It starts with a good suboptimal solution obtained by using the heuristic described in Burkard and Derigs, 1980. This solution determines the initial best known upper bound (BKUB). Then a tree, whose nodes store partial permutations, is constructed and maintained during the whole process. Initially, the tree consists of n × (n − 1) nodes storing the partial permutations p where p(1) = i and p(2) = j, for i, j = 1, 2,...,n and i ≠ j, i.e., those permutations whose first two assignments are fixed. The tree is organized in the form of a heap keyed by the lower bounds (LWRBND) associated with the partial permutations stored in the tree. The root of the tree has the maximum key value. The GLB is used here as the lower bound of a partial permutation. In the branching phase, the algorithm takes the root off the heap and splits it into two new partial solutions. One partial solution, denoted p^i, includes an additional assignment and the other, denoted p^e, excludes that assignment. The new partial solution p^e is returned to the heap if it does not exclude all possible remaining assignments. If p^i is a complete solution, its objective function is evaluated and compared against the current BKUB, which is updated accordingly. In the bounding phase of the algorithm, a new lower bound is computed for p^i, and if this bound is higher than the BKUB, p^i is discarded and a new node is obtained from the heap. This branching and bounding process is repeated until the heap is empty. Since the partial permutation at the root is generally closer to being a complete solution, it is thus a promising candidate for reducing the BKUB. Furthermore, the partial permutation at the root has the highest lower bound among those for the partial permutations stored in the tree. The partial permutation at the root can be discarded early in the process, hence keeping the height of the heap small. This is very important, since the solution space is extremely large even if the size of the problem is moderate. The exact algorithm is stated in Figure 1. The parallel version of the algorithm was coded in PARALLEL FORTRAN to run on the IBM ES/3090-600S VF computer, which has 6 identical processors capable of processing independent tasks. In our experiment, we used all 6 processors. Given k processors, the parallel algorithm divides the initial tree of n × (n − 1) nodes into k sub-trees of n × (n − 1)/k nodes each. Hence, each processor has its own heap to process. Then, k parallel tasks are dispatched for finding optimal solutions within their respective sub-trees. This procedure not only balances the load among all processors,


Input: n, matrices F, D of size n × n.

Output: Optimal permutations for the QAP.

1. Find a good suboptimal solution using a fast heuristic method.
2. Set the initial value of the BKUB.
3. Create a heap of the n × (n − 1) initial partial solutions.
4. WHILE the heap is NOT EMPTY, take the root off the heap.
5. Choose another assignment from those still available (not already excluded).
6. Create two partial solutions:
   • p^e, which excludes the assignment, and
   • p^i, which includes the assignment.
7. Re-insert p^e unless it excludes all possible assignments.
8. IF p^i is a complete assignment THEN evaluate the objective function for the QAP corresponding to p^i, update the BKUB accordingly, and GOTO step 4. ENDIF.
9. Calculate a new lower bound (LWRBND) for p^i.
10. IF (LWRBND > BKUB) THEN discard p^i and GOTO step 4. ELSE GOTO step 5. ENDIF.
11. END WHILE.
12. Print the BKUB and all optimal solutions p.

Figure 1: Exact Algorithm

but also keeps the processors busy to the fullest extent. The shared variable BKUB is updated in a critical section (using the LOCK and UNLOCK facility). The matrices F, D are shared data. Since they are accessed only by reading, they are not locked in a critical section. The heaps of the processors are not shared among them. Thus, in the above algorithm, steps 4 through 11 are run in parallel among the given number of processors. If the given problem has multiple optimal solutions, the algorithm finds all the optimal solutions.
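The role of the critical section can be illustrated with a toy sketch (Python threads standing in for PARALLEL FORTRAN tasks; the branch-and-bound search itself is elided and the subtree results are hypothetical):

import threading

bkub = float("inf")                 # best known upper bound, shared by all tasks
bkub_lock = threading.Lock()        # plays the role of LOCK/UNLOCK around BKUB

def try_update_bkub(value):
    """Update the shared BKUB in a critical section as complete solutions
    are found; F and D stay read-only and need no lock."""
    global bkub
    with bkub_lock:
        if value < bkub:
            bkub = value

def worker(my_subtree_costs):
    # Stand-in for steps 4-11 run on one processor over its own heap.
    for c in my_subtree_costs:
        try_update_bkub(c)

threads = [threading.Thread(target=worker, args=(chunk,))
           for chunk in ([70, 64], [66, 59], [61, 73])]
for t in threads: t.start()
for t in threads: t.join()
print("BKUB =", bkub)               # 59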

A Local Search Algorithm

The local search algorithm we describe here is motivated by the Kernighan-Lin local search heuristic for the graph partitioning problem. Given an undirected graph G(V, E) with edge weights w(e), e ∈ E, the graph partitioning problem (GP) is to divide V (where |V| = 2n) into two subsets A and B of equal sizes such that the cost C(A, B), which is the sum of the weights of all edges going from A to B, is the lowest. The Kernighan-Lin heuristic for the GP starts with a random partition of the vertex set of G (into two sets of equal sizes). For a current partition, the heuristic first constructs a sequence of partitions, each of which differs from the previous one by exactly two vertices. All partitions should have costs lower than that of the current one (if there are no such partitions, the heuristic stops). Then the current partition is replaced by the partition in the sequence with the lowest cost. Here we have an analogy between the GP and the QAP with respect to the Kernighan-Lin heuristic. In the GP, we start with a random partition. In the QAP, we start with a random permutation. Instead of working with the partitions of the vertex set in the GP, we work with permutations in the QAP. The sequence of partitions in the GP becomes now a sequence of permutations in the QAP; each of the permutations differs from the previous one by two assignments. And the permutations in the sequence have costs lower than that of the current permutation. The measure of cost in the GP is now replaced by the measure of cost in the QAP. The next permutation is chosen to be the permutation with the lowest cost in the sequence of permutations corresponding to the current permutation (the heuristic stops if no such sequence of permutations can be found for the current permutation). Based on the above mentioned Kernighan-Lin heuristic, the local search algorithm for the QAP was developed in Murthy and Pardalos, 1990a, and is stated in Figure 2. Note, however, that instead of using the cost C(p_k) of a permutation p_k in the sequence of permutations corresponding to a current permutation p_0, we use the cumulative gain G(k) of the permutation p_k, where G(k) = C(p_0) − C(p_k). Hence, the larger the gain of a permutation, the lower the cost. Each permutation p_k in the sequence of permutations of a current permutation p_0 is obtained by exchanging a pair of locations in the previous permutation p_{k−1}. It has been proved that the local search problem for the QAP with the above defined neighborhood structure is PLS-complete (Murthy and Pardalos, 1990b). The proposed local search algorithm was coded in PARALLEL FORTRAN to run on the IBM ES/3090-600S VF computer. In the parallel version of the algorithm, the code for the evaluation of the gain was treated as a parallel task. To control the generation of parallel tasks, the number of (virtual) tasks was treated as a parameter to the program. This is necessary since, as the problem size increases, the number of virtual tasks created would force the program to run out of memory. The virtual tasks were controlled in the program by the use of a ring buffer, and a WAIT was issued after task scheduling if, and only if, all the tasks were running.
The sequential version of the algorithm spent about 97% of the total CPU time in the evaluation of the objective function (C(p_i)) for every potential pair exchange. In the parallel version, the code pertaining to the

Input: n, matrices F, D of size n × n, and a permutation p of size n.

Output: A locally optimal permutation p for the QAP.

1. Set p_0 = p and calculate its cost C(p_0). Set i = 0 and G(0) = 0.

2. Set i = 1. Evaluate the step gain g_1 = C(p_0) − C(p_1) obtained by exchanging the locations of a pair of facilities, and select a pair with g_1 > 0. If such a pair does not exist then go to 7; otherwise, set G(1) = g_1.

3. Set i = i + 1. For each pair of facilities not already selected, evaluate the step gain by exchanging their locations. Then, select the pair with maximum gain g_i = C(p_{i−1}) − C(p_i). If no such pair exists, set i = i − 1 and go to 5.

4. Compute the cumulative gain, G(i) = \sum_{k=1}^{i} g_k. If G(i) > 0, then go to 3.

5. Select k such that G(k) is maximum for 0 ≤ k ≤ i.

6. If k > 0 then set p_0 = p_k and go to 2.

7. We have reached a local optimum for the QAP. Set p = p_0 and output p and C(p).

Figure 2: Local Search Algorithm

evaluation was modified to compute the contribution of the pair being considered for exchange for locations before and after the potential exchange, and the difference was taken as the gain (g_i = C(p_{i−1}) − C(p_i)). This modification reduced the time spent in the corresponding subroutine to 85%. This timing was further reduced, by generating in-line code for the computation of the gain instead of calling a function, to about 76%. When this section of code was implemented using the vector facility, this percentage was further progressively reduced to about 65% as the problem size increases to 100. Thus, this section of code was considered as the most important segment for parallelization.
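The gain evaluation described above dominates the running time; a minimal sketch (Python; illustrative, not the authors' PARALLEL FORTRAN code) of computing the gain of a single pair exchange from only the terms of C(p) that the exchange affects:

def exchange_gain(f, d, p, a, b):
    """Gain g = C(p) - C(p') where p' swaps the locations of facilities a and b.

    Only the terms of C that involve a or b change, so the gain is computed
    from those contributions rather than by re-evaluating the whole sum.
    """
    q = list(p)
    q[a], q[b] = q[b], q[a]
    n = len(p)
    before = sum(f[i][j] * d[p[i]][p[j]]
                 for i in range(n) for j in range(n) if i in (a, b) or j in (a, b))
    after = sum(f[i][j] * d[q[i]][q[j]]
                for i in range(n) for j in range(n) if i in (a, b) or j in (a, b))
    return before - after                # positive gain means the swap lowers C

F = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]
D = [[0, 4, 5], [4, 0, 6], [5, 6, 0]]
print(exchange_gain(F, D, [0, 1, 2], 0, 2))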

COMPUTATIONAL RESULTS

Computational Environment

All computations were performed under the VM/CMS operating system on the IBM ES/3090-600S VF computer at the Center for Academic Computing, The Pennsylvania State University. This machine has a 15-nanosecond clock and generally supports between 600 and 800 interactive users, one MVS preferred guest, and several AIX/370 guests. It provides computational and communications support mainly for instructional and research purposes. Special users (those requiring the complete use of the system for benchmarking studies in a parallel processing environment) are generally provided with a relative share of 8000 during scheduled 5AM-7AM time slots. At Penn State, users generally receive relative shares ranging from 5 to 100. A user with a relative share of 100 has 20 times better access to the system resources than a user with a relative share of 5. Thus, a

relative share of 8,000 provides superior access to the system resources, which generally enables the user to produce results very close to those that would be obtained with a dedicated system.

Test Problems and Test Data

T h e proposed algorithm is tested with two classes of test problems. T h e first set of problems include t h e N U G E N T collection of t h e Q A P (Nugent, Vollmann, and Ruml, 1969). T h e other set of test problems are generated by t h e algorithm described in M u r t h y and Pardalos, 1990b and Palubetskes, 1988. Other sets of test problems for the Q A P , e.g., those in Skorin-Kapov, 1990 and Steinberg, 1961, are not used here. T h e N U G E N T set of test problems is one of t h e most widely used in t h e literature and can be used to test both heuristic and exact algorithms for t h e Q A P . For our study, test problems of sizes n = 5 , 6 , 7 , 8,12,15,20, and 30 are used. For problems of sizes 20 and 30, optimal solutions can not be obtained in reasonable a m o u n t of C P U t i m e (due to this difficulty, t h e exact algorithm was not run on these two cases). T h e test problems of t h e other class are generated according to t h e test problem generator, which o u t p u t s test problems with known optimal solutions, as reported in Murthy and Pardalos, 1990b and Palubetskes, 1988. T h e test problem generator contains two positive integer parameters, z and it;. A random variable with a uniform distribution in (0,1) is used also in t h e generator. For t h e exact algorithm, seven test problems are created with sizes n = 1 0 , 1 1 , 1 2 , 1 3 , 1 4 , 1 5 , and 16, with z = 9,w = 5. For t h e local search heuristic, test problems of sizes n = 1 0 , 2 0 , 3 0 , 4 0 , 5 0 , 6 0 , 7 0 , 8 0 , 9 0 , and 100 are generated. For each size n, we generate eight test problems by fixing z = 9 while taking w = 1,2, ...,8 respectively. T h e optimal objective function value for a test problem generated here is dependent on n and z and is independent of t h e value of w. T h u s , for t h e generated test problems with the same value for t h e parameter 2 , they have the same optimal objective function value for each fixed n. Computational

Computational Results and Analysis

The computational experiment was designed to test the efficiency of the parallel algorithms in terms of their speed-up ratios. For both the parallel exact and heuristic algorithms, all 6 processors of the machine are used. For the parallel exact algorithm, each test problem in both sets was executed 5 times, and the computational results reported here are the averages over the 5 executions. When there are multiple optimal permutations, all of them are printed by the algorithm. The results are presented in Tables 1 and 2. Note that in the following tables, all CPU times are in seconds. The speed-up ratio of a parallel algorithm is computed by dividing the cumulative CPU time by the wall time (elapsed time) of the parallel algorithm. As we can see from the tables below, the parallel exact algorithm has a good speed-up ratio for test problems of sizes larger than 12. For smaller problems, the parallel algorithm is not as efficient.
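As a concrete check of this definition, the speed-up entries in Table 1 below can be recomputed directly from the CPU and wall-time columns; a minimal sketch with two rows hard-coded from the table:

    # Speed-up ratio = cumulative CPU time / wall (elapsed) time.
    for n, cpu, wall in [(12, 27.58, 8.71), (15, 1587.30, 430.91)]:
        print(n, round(cpu / wall, 2))   # prints 12 3.17 and 15 3.68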

Table 1: Exact Algorithm - NUGENT Test Problems

  n   Optimal Value   Alg. Best Value   Cumulative CPU Time   Wall Time   Speed-up Ratio
  5        50               50                  0.11             0.38          0.29
  6        86               86                  0.12             0.37          0.32
  7       148              148                  0.15             0.40          0.38
  8       214              214                  0.31             0.51          0.61
 12       578              578                 27.58             8.71          3.17
 15      1150             1150               1587.30           430.91          3.68

Table 2: Exact Algorithm - Test Problems with Known Optimal Solutions

  n   Optimal Value   Alg. Best Value   Cumulative CPU Time   Wall Time   Speed-up Ratio
 10      1890             1890                  0.61             0.65          0.94
 11      3960             3960                 47.29            11.60          4.08
 12      2772             2772                247.27            51.20          4.83
 13      6552             6552                982.33           263.52          3.73
 14      4914             4914                324.47            89.31          3.63
 15      5040             5040              12872.65          2580.74          4.99
 16      5760             5760              12543.98          2340.38          5.36

Table 3: Local Search Algorithm - NUGENT Test Problems

  n   Optimal Value   Alg. Best Value   Cumulative CPU Time   Wall Time   Speed-up Ratio
  5        50               50                  0.01             0.02          0.50
  6        86               86                  0.01             0.02          0.50
  7       148              148                  0.01             0.02          0.50
  8       214              214                  0.01             0.02          0.50
 12       578              592                  0.01             0.02          0.50
 15      1150             1150                  0.02             0.03          0.67
 20      2570             2602                  0.02             0.03          0.67
 30      6124             6180                  0.08             0.06          1.33

Table 4: Local Search Algorithm - Test Problems with Known Optimal Solutions

             Without Vector Facility              With Vector Facility
   n     CPU Time  Wall Time  Speed-up Ratio   CPU Time  Wall Time  Speed-up Ratio
  10         0.02       0.03       0.67            0.03       0.05       0.60
  20         0.23       0.13       1.64            0.27       0.16       1.69
  30         0.84       0.46       1.83            0.90       0.47       1.91
  40         2.65       1.15       2.30            2.48       1.13       2.19
  50         6.22       2.64       2.36            5.43       2.26       2.40
  60        12.08       4.71       2.56            9.11       3.36       2.71
  70        22.70       8.29       2.74           16.70       5.69       2.93
  80        46.15      14.97       3.12           35.32      11.54       3.06
  90        87.88      27.08       3.25           56.52      17.32       3.26
 100       126.41      36.81       3.43           84.23      25.14       3.40

For the local search algorithm, the computed results are the average over 25 executions of the algorithm with randomly generated starting permutations, for both sets of test problems. The results are presented in Tables 3 and 4. In Table 4, we omitted the columns for the optimal objective function value and the best value found by the algorithm, since we are focusing on the speed-up ratio of the algorithm under two modes, one using the vector facility and one without it. From the above tables, it is evident that the local search algorithm runs very fast. Again, for small n (where n < 20), parallel processing is not as efficient. It should be noted that, in terms of the accuracy of the heuristic solutions, the local search algorithm performs remarkably well. In our experiment, the algorithm found the optimal solution for problems of sizes up to 40 (in the second set of test problems) in some of the 25 runs. The averages of the objective function values of the heuristic solutions found by the algorithm are about 5% away from the optimal objective function values. For the second set of test problems, the vector facility is used to evaluate its effectiveness. As shown in Table 4, the use of the vector facility significantly reduces the solution time when n > 50. For n = 100, the CPU time with the vector facility shows a significant improvement of more than 33%. The speed-up ratio also improves with the dimension of the problem.

REFERENCES

M.S. Bazaraa and H.D. Sherali (1982). On the use of exact and heuristic cutting plane methods for the quadratic assignment problem. Journal of the Operational Research Society, 33, 991-1003.

R.E. Burkard (1990). Discrete Location Theory (Chapter 9). P.B. Mirchandani and R.L. Francis, Eds., John Wiley & Sons, Inc., Berlin.

R.E. Burkard and U. Derigs (1980). Assignment and matching problems: Solution methods with Fortran programs. Lecture Notes in Economics and Mathematical Systems, 184. Springer, Berlin.

G. Finke, R.E. Burkard, and F. Rendl (1987). Quadratic assignment problems. Annals of Discrete Mathematics, 31, 61-82.

P.C. Gilmore (1962). Optimal and suboptimal algorithms for the quadratic assignment problem. J. SIAM, 10, 305-313.

B. Kernighan and S. Lin (1972). An efficient heuristic procedure for partitioning graphs. Bell Systems Technical Journal, 49, 291-307.

S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi (1983). Optimization by simulated annealing. Science, 220, 671-680.

T.C. Koopmans and M.J. Beckmann (1957). Assignment problems and the location of economic activities. Econometrica, 25, 53-76.

E.L. Lawler (1963). The quadratic assignment problem. Management Science, 9, 586-599.

K.A. Murthy and P.M. Pardalos (1990a). A local search algorithm for the quadratic assignment problem. Technical Report CS-90-44, The Pennsylvania State University.

K.A. Murthy and P.M. Pardalos (1990b). A polynomial-time approximation algorithm for the quadratic assignment problem. Technical Report CS-33-90, The Pennsylvania State University.

C.E. Nugent, T.E. Vollmann, and J. Ruml (1969). An experimental comparison of techniques for the assignment of facilities to locations. Operations Research, 16, 150-173.

G.S. Palubetskes (1988). Generation of quadratic assignment test problems with known optimal solutions (in Russian). Zh. Vychisl. Mat. Mat. Fiz., 28(11), 1740-1743.

P.M. Pardalos and J. Crouse (1989). A parallel algorithm for the quadratic assignment problem. In: Proceedings of the Supercomputing 1989 Conference, 351-360. ACM Press.

P.M. Pardalos and X. Li (1990). Parallel branch and bound algorithms for combinatorial optimization. Supercomputer, 39, 23-30.

P.M. Pardalos and G.P. Rodgers (1989). Parallel branch and bound algorithms for unconstrained quadratic 0-1 programming. In: Impact of Recent Advances on Operations Research, R. Sharda et al., Eds., 131-143. North-Holland Press.

P.M. Pardalos and J.B. Rosen (1987). Constrained Global Optimization: Algorithms and Applications. Lecture Notes in Computer Science, 268. Springer-Verlag.

C. Roucairol (1987). A parallel branch and bound algorithm for the quadratic assignment problem. Discrete Applied Mathematics, 18, 211-225.

J. Skorin-Kapov (1990). Tabu search applied to the quadratic assignment problem. ORSA Journal on Computing, 2(1), 33-45.

L. Steinberg (1961). The backboard wiring problem: A placement algorithm. SIAM Review, 3, 37-50.

M.R. Wilhelm and T.L. Ward (1987). Solving quadratic assignment problems by simulated annealing. IIE Transactions, 19(1), 107-119.


ON REPORTING THE SPEEDUP OF PARALLEL ALGORITHMS: A SURVEY OF ISSUES AND EXPERTS

RICHARD S. BARR and BETTY L. HICKMAN

Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas 75275 (Barr); Mathematics and Computer Science Department, University of Nebraska at Omaha, Omaha, Nebraska 68182 (Hickman)

ABSTRACT

Perhaps the most commonly used performance measure for implementations of parallel algorithms is speedup, a ratio of the serial to parallel solution time for a given problem. We probe the difficulties in using this statistic for computational reporting, including: definitional ambiguities, testing biases, machine influence, and the effects of tuning parameters. Some of these difficulties were explored further in a survey sent to leading computational mathematical programming researchers for their reactions and suggestions. A summary of the survey and proposals for conscientious reporting are presented.

KEYWORDS

Parallel processing; speedup; algorithm testing; performance evaluation; performance metrics; efficiency measures; mathematical programming.

INTRODUCTION

Algorithm development is at the heart of mathematical programming research, wherein more efficient algorithms are prized. As an indicator of algorithmic performance, efficiency reflects the level of resources (central processor time, iterations, bytes of primary storage) required to obtain a solution of given quality (percent optimality, accuracy) (Greenberg, 1990; Jackson et al., 1990). A variety of measures and summary statistics have been devised to reflect efficiency and compare algorithms. The efficiency of an algorithm relative to others has traditionally been determined by (1) theoretical order analysis and (2) empirical testing of algorithmic implementations, or codes. While both approaches have merit, computational testing is increasingly an imperative for publication. This is due, in part, to the occasional failure of order analysis to predict accurately the behavior of an algorithm's implementation on problems of practical interest. For example, while the simplex method has daunting worst-case behavior, its efficiency as an optimizer for a wide variety of industrial applications is well documented. This paper addresses complications that arise when reporting on implementations of parallel algorithms. Several common metrics of the efficiency of parallel implementations result in

measurement and comparison difficulties stemming from the stochastic nature of some parallel algorithms and inherent opportunities for the introduction of biases. Of particular concern is speedup, a widely used statistic for describing the efficiency achieved over serial processing by the use of multiple processors. In addition to documenting problems of traditional reporting of parallel experimentation, we also present the results of a survey of prominent mathematical programming researchers regarding their views on the topic, and close with an analysis of the issues and expert opinion.

REPORTING OF COMPUTATIONAL TESTING

Much work has been accomplished in constructing a set of guidelines for reporting on computational experimentation, particularly in the area of mathematical programming. Following a series of early articles (Gilsinn et al., 1977; Jackson and Mulvey, 1978), the classic work by Crowder, Dembo, and Mulvey (1980) provides reporting guidelines for computational experiments that have been adopted by a number of scholarly journals. Unfortunately these recommendations were written before parallel processing systems became generally available and, therefore, did not address multi-processing issues. A recent follow-up report by Jackson, Boggs, Nash, and Powell (1990) extended the topics covered in Crowder et al. Relevant to this paper are its sections on choosing performance measures and reporting of computational tests on machines with advanced architectures. In its discussion of performance measures for evaluating mathematical programming software, Jackson et al. state that an efficiency measure should reflect computational effort and solution quality: "(A)uthors [should] state clearly what is being tested, what performance criteria are being considered, and what performance measure is being used to draw inferences about these criteria. ... (R)eferees should bear in mind that performance measures are summary statistics and, as much as possible, should conform to all of the accepted rules regarding the use thereof." In the sections that follow, we explore speedup as a performance measure and attempt to determine the accepted, or at least some acceptable, rules for its use.

Reporting Computational Experiments on Parallel Computers

Parallel processing is the simultaneous manipulation of data by multiple computing elements working to complete a common body of work. The key objective of parallel processing is the reduction of the real ("wall clock") time or cost required to complete the work. Appropriate applications involve problems whose solution decreases in value over time, as with weather forecasting and speech/handwriting recognition, or when a delay in obtaining a solution results in escalating costs, as with telecommunication systems message routing, anti-missile missile targeting, and aircraft rescheduling after system shutdowns. Hence the motivation for such new machine architectures springs from the need to solve existing problems faster or to make tractable larger and more difficult problems. Of interest therefore are measures of the improvement over traditional serial computing achieved by the use of multiple processors. The measures are influenced by the type of machine studied, since the architectures of parallel systems are quite varied. We will address metrics for the most prevalent class of commercial parallel design, multiple-instruction-multiple-data (MIMD) (Flynn, 1966), which uses multiple processors to operate simultaneously and independently on different data streams and exchange results via an interconnected network. (Massively parallel instances of the single-instruction-multiple-data architecture, which requires all processors to perform identical instructions in lockstep, pose special reporting problems since individual computing units are often so rudimentary that serial processing is impractical.)


What Is "Time"?

Since the rationale for parallelism is time-based, it is reasonable that a performance or efficiency measure be temporal as well. With single-processor systems, a common performance measure is the CPU time to solve a problem; this is the time the processor spends executing instructions in the algorithmic portion of the program being tested, and typically excludes the time for input of problem data, output of results, and system overhead activities such as virtual memory paging and job swapping. CPU time for a job is maintained by the operating system software, since many jobs may be sharing the same processor; the programmer uses this to compute the portion pertaining to the algorithm under study. In the parallel case, time is not a sum of CPU times on each processor, nor the largest across all. Since the objective of parallelism is real-time reduction, time must include any processor waiting resulting from an unbalanced workload. Hence the most prudent choice for measuring a parallel code's performance is the real time to solve a particular problem. Since this conservative measure includes all system paging and job-swapping overhead, it is preferable that timings be made on a dedicated or lightly-loaded machine.

What Is "Speedup"?

The most common measure of the performance of an MIMD parallel implementation is speedup. Based on a definition of time, speedup is the ratio of serial and parallel times to solve a particular problem on a given machine. However, using different assumptions, researchers have employed several definitions for speedup in their reporting.

Definition 1: Speedup. The speedup, S(p), achieved by a parallel algorithm running on p processors is defined as:

S(p) = (Time to solve a problem with the fastest serial code on a specific parallel computer) / (Time to solve the same problem with the parallel code using p processors on the same computer)

For example, assume the fastest serial time is 100 seconds for a specific problem on a parallel machine, and a parallel algorithm solves the same problem in 20 seconds on the same machine using six processors. The speedup from this experiment would be S(6) = 100/20 = 5.0. Linear speedup, with S(p) = p, is considered an ideal application of parallelism, although superlinear results, with S(p) > p, are possible in some instances (Barr and Hickman, 1989).

Definition 2: Relative Speedup. Some researchers use relative speedup in their reporting, defined as:

RS(p) = (Time to solve a problem with the parallel code on one processor) / (Time to solve the same problem with the parallel code on p processors)

This should be used in cases where the uniprocessor version of the parallel code dominates all other serial implementations. Unfortunately, some papers interpret relative speedup as speedup when the serial case is not dominant, leading to erroneous claims of efficiency (Peters, 1990).

Definition 3: Absolute Speedup. The use of absolute speedup has also been proposed (Quinn, 1987) to compare algorithms:

AS(p) = (Fastest serial time on any serial computer) / (Time to execute the parallel code on p processors of a parallel computer)

The rationale for AS(p) reflects the primary objective of parallel processing: real-time reduction. While this definition goes to the heart of the matter, it restricts research to those individuals with access to the fastest serial machine, and cannot be determined until all relevant serial algorithms have been tested on all high-end systems, since "fastest" may depend on a particular combination of algorithm and machine for each problem tested.

Other Speedup Definitions. Incremental speedup has been used in reporting, defined as:

IS(p) = ((p - 1) · (Time for the parallel code on p - 1 processors)) / (p · (Time for the parallel code on p processors))

where p > 1. This value shows the fraction of time improvement from adding another processor, and will be 1.0 for linear speedup. This variant of relative speedup has been used where one-processor times are unavailable (Peters, 1990). Also see Gustafson et al. (1988) for a discussion of scaled speedup.

Efficiency. Another measure of the performance of a parallel implementation is efficiency, the fraction of linear speedup attained:

E(p) = S / p     (5)

where S = S(p), RS(p), or AS(p). This is speedup normalized by the number of processors, and E(p) = 1.0 for linear speedup. Note that, since its value is a function of the definitions used for speedup and time, this normalized speedup is susceptible to all of the same reporting concerns and difficulties as the other performance metrics detailed in this report.
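Collecting the metrics of this section in one place, the following small sketch is ours; the 100-second serial and 20-second, six-processor parallel times are the worked example from Definition 1:

    def S(best_serial, parallel_p):        # Definition 1: speedup
        return best_serial / parallel_p

    def RS(parallel_1, parallel_p):        # Definition 2: relative speedup
        return parallel_1 / parallel_p

    def IS(parallel_prev, parallel_p, p):  # incremental speedup, p > 1
        return ((p - 1) * parallel_prev) / (p * parallel_p)

    def E(speedup, p):                     # efficiency, equation (5)
        return speedup / p

    print(S(100, 20))                      # 5.0, as in the example above
    print(E(S(100, 20), 6))                # 0.83: 83% of linear speedup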

ISSUES IN REPORTING THE RESULTS OF PARALLEL TESTING

As with serial software, but even more so in parallel testing, performance measures abound, and an experimenter must choose a small subset to summarize succinctly the experimentation and draw conclusions about the underlying algorithm. The number of reported measures is limited by journalistic restrictions and the potentially large number of values for p. Hence researchers must (1) select appropriate measures, and (2) use the measures in an objective way to reflect accurately the behavior of the algorithm. Although easily stated, these are surprisingly complicated tasks, especially when studying the behavior of parallel codes.

Choosing a Measure for Parallel Implementations

Jackson et al. recommend that "when comparing a parallel algorithm with a scalar method, it is preferable to compare the parallel method not only with its scalar specialization, but also with the best scalar methods. In addition to reporting absolute speed-ups, times normalized by the number of processors are desirable." The authors clearly prefer S(p) over RS(p), and encourage the use of E(p). But "gray areas" in reporting may result from ambiguity in the definitions of time and speedup. What portion of the system overhead time (e.g., for process creation and termination) should be included in the serial and parallel results? If real time is used for the parallel algorithm, should this also be used for the serial, even though we have more accurate serial data available regarding the processor time spent executing algorithmic steps? A more slippery question turns out to be: what constitutes the base, serial case? Since most implementations have "tuning" parameters, such as multipricing options, reinversion frequency, and tolerances—each of which influences a problem's solution time—how should these be set?

It is widely known that, in many cases, experimentation to determine good values for the parameters can result in a significant reduction in execution times. If you wish to use the best possible serial time, how much testing with different strategies (parameter sets) should be performed? Further, does each code/strategy combination create a new algorithm to be considered separately? Determining the parallel time involves the same question, further complicated by the fact that the value of individual strategies varies with p, the number of processors. A strategy that works well for a given p does not necessarily work well for a different number of processors. Should testing of numerous strategies be performed for each instance of p to be reported, and how extensive should such testing be? Or should a fixed strategy be used for all values of p? Should prescribed strategy formulas that vary with p be used instead? Complicating matters further is the stochastic nature of some parallel algorithms. Many iterative procedures not only have multiple paths to a problem's solution, but the path chosen may be non-deterministic, due to timing-dependent decisions (race conditions) in the algorithm design. For example, multiple executions of our parallel network simplex codes (Barr and Hickman, 1989, 1990; Hickman, 1991) typically yield different solution times and numbers of pivots when applied to the same problem with the same strategy, under virtually identical operating conditions. In some cases, differences of thousands of pivots and 15% time variations were observed. This is due not only to alternate optima, but to slight differences in timings of events, resulting in different incoming variables, tie-breaking choices, and, therefore, sequence of extreme points traversed. So how is this to be tested and reported? Should multiple runs be performed for each code/problem/strategy/number-of-processors combination? Should all resultant timings be reported, or summarized in statistics? If we wish to determine speedup, do we use the best, worst, or average times (averages would require a new definition of speedup)? Some researchers always use the best times, arguing that these show the actual capability of the code; is this reasonable? Such questions are the topic of this report.

Potential Sources of Bias in Reporting Parallel Results

As is evident from the previous discussion, many choices must be made in the design and reporting of experiments with parallel codes. This being a relatively new area of research, there are few generally accepted answers to the questions posed. Analysis of data variation is central to statistically designed experimentation (Amini and Barr, 1990; Amini et al., 1990), but in dealing with speedup there is variation in the components needed to compute this statistic, which is a different issue. Also, since speedup is a ratio of serial to parallel time, we have the following observation:

Observation: The longer the serial time, the greater the parallel speedup, and vice versa.

Evidence: From inspection of the S(p), RS(p), and AS(p) definitions.

So while fast single-processor times highlight the strength of the serial code, they can produce unimpressive parallel speedups. Conversely, a slow serial time can yield seemingly spectacular parallel results. Hence it is a simple matter to influence (inadvertently or deliberately) the outcome of an experiment employing speedup as a performance measure through the choice of serial and parallel strategies. A (dis)advantageous set of strategies can, therefore, greatly skew the research findings. This has motivational implications for the level of effort expended in exploring alternate strategies. Nominal serial testing may be rewarded with strong parallel results, while a more thorough search for the best one-processor strategy could only downgrade the parallel findings.

How Should We Approach These Issues?

The sections above illustrate some of the difficulties that arose in our attempts to summarize objectively experimentation with our parallel codes. While we made what we considered reasonable decisions for our papers, we also sought the insight of others in the research community in hopes of finding clear-cut answers or, at least, a consensus on some of the issues.

A SURVEY OF EXPERTS

To get a "sense of the community," we invited a group of computationally oriented mathematical programming researchers to participate in a survey. The design objectives for the survey were: (1) to address definitional issues regarding the speedup of parallel algorithm implementations, (2) to help identify a consensus regarding the usefulness of speedup as a measure for reporting parallel performance, and (3) to elicit a high response rate. To meet these objectives, we constructed a series of simple examples, accompanied by six short-answer questions, as detailed below. Each multiple-choice question included a user-definable response, comments were welcomed on each question and the survey as a whole, and respondents could remain anonymous. Commonly used speedup definitions were included for terminological consistency. The survey was sent to 41 researchers, and 23 completed forms were returned (see acknowledgements), a strong 56% response rate. The authors' selections are not included in the totals, but in the discussion below. From the respondents' comments, most seemed to enjoy analyzing the questions posed, and one used the survey as the subject of a graduate seminar. The following sections explore each question, and summarize and comment on the participants' responses. Also included are selected, unattributed comments from consenting respondents.

Question 1: Effects of Tuning Parameters

The first survey question is shown in Fig. 1. The issue involved is: how should times from different strategies be used in computing speedup? Code B represents a competing algorithm that is, for the most part, dominated by Code A. Does a change in strategy form a different algorithm? Should the same strategy be used for both serial and parallel times, should only the best times across all tested strategies be used, should we average the speedups, or something else? Survey Results. The distribution of answers, depicted in Fig. 2, was: (a) 0%, (b) 39%, (c) 0%, (d) 4%, (e) 13%, and "Other" 43%. Of the "Other" responses, 58% wanted to include the entire table of times, 34% computed speedup for each code/strategy combination and included all values or the range, and 6% proposed a different calculation. Selected comments: (1) "The single number cannot capture all of the relevant information." (2) "The strategies perform differently enough to suggest three codes: AW, AX, and AY." (3) "There really isn't an appropriate summary of this data." (4) "[Use (e) and] report the standard deviation also." (5) "Reporting raw data as well as speedup is important." Commentary. The leading selection was "Other," with a majority of those respondents wanting to include all data in reports. S(2) was the dominant summary measure [choice (b)], comparing the best individual parallel and serial times across all codes and strategies, and the remaining responses varied widely. While we sympathize with the desire for all of the raw data, the sheer volume of such data generated by conscientious experimentation becomes overwhelming. For example, a small test bed might consist of 50 representative test problems, to be examined on 1 to 20 processors, with,

Question 1. Two optimization codes, A and B, are used to solve the same problem on the same parallel machine and identify the same optimal solution. Each code has a "tuning" parameter which is determined for each run using a "strategy" that fixes the parameter based on problem size and/or number of processors. Runs are made using four different strategies, giving the following results.

Code                                           A      A      A      B
Strategy                                       W      X      Y      Z
Serial Solution Time                         100     90    200    150
Parallel Solution Time (two processors)       60     70     50    100

The authors of code A wish to report speedups. What value for two-processor speedup should be reported? (Please mark your choice.)

(a) 100/60 = 1.67    Strategy W (Both cases "good")
(b) 90/50 = 1.80     Use the best individual times.
(c) 90/70 = 1.29     Strategy X (Fixed strategy with best serial)
(d) 200/50 = 4.0     Strategy Y (Best two-processor time)
(e) (100/60 + 90/70 + 200/50)/3 = 2.32    Average speedup across all A strategies tested.
(f) Other: ____    Rationale: ____

Fig. 1. Question 1, The Effect of Tuning Parameters
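The candidate values in Fig. 1 follow mechanically from the timing table; the sketch below (our code, with the times hard-wired from the figure) reproduces options (a) through (e):

    t = {"W": (100, 60), "X": (90, 70), "Y": (200, 50)}   # code A: (serial, 2-proc)
    a = t["W"][0] / t["W"][1]                             # 1.67, strategy W
    b = min(s for s, _ in t.values()) / min(q for _, q in t.values())  # 1.80
    c = t["X"][0] / t["X"][1]                             # 1.29, strategy X
    d = t["Y"][0] / t["Y"][1]                             # 4.0, strategy Y
    e = (a + c + d) / 3                                   # 2.32, average speedup
    print([round(v, 2) for v in (a, b, c, d, e)])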

Fig. 2. Question 1 Responses

say, 20 reasonable strategies in each case. This results in 20,000 combinations to test for a single code, ignoring the fact that multiple instances of each combination may be required due to variations in timings (see question 3) or demands of a rigorous experimental design (Amini et al., 1990). Even with only four values for p, and an exploration of ten strategies, 2,000 problems must be run. And if the problems are substantial enough to demand the application of parallel processing, the total processing time (especially the search for the best serial case) makes the testing, much less the reporting, impractical.

From our viewpoint, an algorithm definition includes the strategy; the two notions should not be separated in reported results. Hence we concur with comment (2), and believe that the times for a given code and strategy should be kept together. The strategy may be dynamic with the number of processors or problem characteristics, but must be rule-based and documented in the experimentation reports. The difficulty then becomes: how to identify good code-strategy combinations? Conscientious researchers will work diligently to find a combination that has both strong serial and robust parallel performance across a wide range of problems. For all of these reasons, we would have picked responses (a) or (c) or, in the absence of other test problems, would have devised a hybrid strategy from X and Y that would have yielded the experts' leading choice, (b).

Question 2: Definition of Speedup

Figure 3 depicts the survey's second question, which concerns the proper definition and calculation of speedup. Serial code B, which only has a single strategy Z, has the fastest serial time on the problem from question 1. Code A's parallel times vary with strategy. What is the speedup of code A?

Survey Results. The response percentages were: (a) 8%, (b) 0%, (c) 8%, (d) 38%, (e) 4%, and "Other" 42% (see Fig. 4). Most of the "Other" responses had the same answer as on question 1, and for the same reasons.

Selected comments: (1) "Present the table. The single number, speedup, cannot capture all of the relevant information. In this case it would be better to report the details. In the text of the paper, one could mention a range 90-200 of serial times and 50-70 of parallel times." (2) "More problems should be tested here."

Commentary. This question highlights the main objective of parallel processing, namely, reduction of the real time required to solve a given problem. Here, a serial code exists which runs faster than the one-processor parallel code, hence better speedups could be reported by ignoring the existence of B. However, that would certainly be misleading, since code B should be used for the serial time. The majority of respondents selecting a speedup metric chose one based on serial code B [(d) or (e)], with which we concur. We also feel that, if at all possible, results from a well-known code, such as MINOS (Murtagh and Saunders, 1987) or NETFLO (Kennington and Helgason, 1980), should be included in reports. Such reference codes give the reader an awareness of the overall efficiency of all software being tested, and this is of particular value on new, relatively unfamiliar technology.

Question 3: Effect of Timing Variations on Speedup

The third question, shown in Fig. 5, focuses on a single problem/code/strategy combination, illustrating the stochastic nature of both parallel and serial testing. Repeated executions of this combination under identical system conditions show variability in both the serial and parallel timings. Variation from the timing mechanism affects all values, and the larger parallel variability is due to time dependencies in the algorithm. How is speedup to be computed in this more realistic setting? (The italicized annotations in Fig. 5 were not on the survey itself.)

Survey Results. The response percentages were: (a) 57%, (b)-(d) 0%, (e) 9%, (f) 0%, and "Other" 35% (see Fig. 6). Of the "Other" responses, 63% wanted some measures that reflected variability, 25% suggested the ratio of means or medians, and 12% wished to report the raw data.
Selected comments: (1) "The greater variance in the two-processor times is an important part of the data." (2) "Also, the standard deviation should be used [along with (e)]." (3) "For serial time,

Question 2. Here we have the same scenario, but different results. In this instance, B is a serial code.

Code                                           A      A      B
Strategy                                       W      Y      Z
Serial Solution Time                         100     90     80
Parallel Solution Time (two processors)       60     70     N.A.

The authors of code A wish to report speedups. What value for two-processor speedup should be reported? (Please mark your choice.)

(a) 100/60 = 1.67
(b) 90/70 = 1.29
(c) 90/60 = 1.50
(d) 80/60 = 1.33
(e) 80/70 = 1.14
(f) Other: ____    Rationale: ____

Fig. 3. Question 2, Basic Definition

Fig. 4. Question 2 Responses

use the average - errors are due to timing anyway." (4) "Should provide variance measures since that is the purpose of the study."

Comments. This question yielded the clearest consensus thus far, with 57% of the respondents choosing (a), the ratio of mean times, and many indicated a desire for a supplementary indicator of variability. While we hate to differ with our esteemed colleagues, we feel that (e) is a similar, but slightly more correct, choice for the following reasons. The serial variation is due to random measurement error, hence should be averaged to form the base case, per responses (a), (c), and (e). With this base, speedup can be computed for each two-processor time and averaged to give


Question 3. The times for code A using strategy Z dominate all other code and strategy combinations. However, multiple executions of the same algorithm in the same circumstances gave different times. Differences in serial times are due to variability within the computer's timing mechanism, and the (larger) differences in parallel times are due to timing-dependent choices (race conditions) in the algorithm. The following results come from 14 runs of code A with strategy Z.

Solution times (7 observations each):

                    Serial Runs                        2-Processor Runs
Observations        98, 99, 100, 100, 100, 101, 102    50, 50, 60, 60, 60, 70, 70
Mean time           100                                60
Min, Max time       98, 102                            50, 70

What value for two-processor speedup should be reported for code A? (Please mark your choice.)

(a) 100/60 = 1.67    Ratio of means
(b) 98/60 = 1.63     Best individual serial over mean parallel
(c) 100/50 = 2.00    Mean serial over best parallel
(d) 98/50 = 1.96     Best serial over best parallel
(e) Mean of (100/50, 100/50, 100/60, 100/60, 100/60, 100/70, 100/70)    Mean of speedups with mean serial base case
(f) Mean of (98/50, 98/50, 98/60, 98/60, 98/60, 98/70, 98/70)    Mean of speedups with best serial base case
(g) Other: ____    Rationale: ____

Fig. 5. Question 3, Effect of Timing Variation
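The gap between the ratio of means and the two mean-of-speedups variants in Fig. 5 is easy to verify; a sketch (ours) over the 14 runs:

    from statistics import mean

    serial = [98, 99, 100, 100, 100, 101, 102]    # mean 100, best 98
    par2   = [50, 50, 60, 60, 60, 70, 70]         # mean 60, best 50

    a = mean(serial) / mean(par2)                 # (a) ratio of means
    e = mean(mean(serial) / t for t in par2)      # (e) mean serial base case
    f = mean(min(serial) / t for t in par2)       # (f) best serial base case
    print(round(a, 2), round(e, 2), round(f, 2))  # 1.67 1.69 1.66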

Fig. 6. Question 3 Responses

Question 4. How much effort should be spent identifying the "best" serial time for a given problem? (For example, with a simplex-based algorithm, how much testing should be performed to identify the best pivot strategy?) Please express your answer on a scale from 1 to 10, where 1 = minimal testing and 10 = as exhaustive as possible. Your answer: ____

Fig. 7. Question 4, Effort for Serial Case

Fig. 8. Distribution of Responses to Question 4

the mean speedup ratio, instead of the ratio of mean times, which does not follow any of the standard speedup definitions. This would also permit reporting speedup variability measures such as the standard deviation and range. The situation underscores the difficulty in reporting all of the raw data, and leads to the question of how many instances of each problem/code/strategy/processors combination should be run. The testing of five instances of one code on 50 problems, with 10 strategies, and four processor settings, involves 10,000 runs; practicalities will likely force compromises on such an experimental design.

Question 4: Degree of Effort on Serial Case

Because of the crucial role of the serial time in computing speedup, the level of effort expended to identify a "best" value is a significant factor in reporting. Respondents expressed the importance of such experimentation on a scale of 1 to 10 (see Fig. 7).

Survey Results. Of the 83% that gave a numerical answer, the "effort" statistics are: mean, 5.8; median and mode, 5; standard deviation, 2.9; and distribution as shown in Fig. 8. The 17% of nonrespondents said that the answer depended on the purpose of the experimentation.

Selected comments: (1) "[Expend] just as much effort as would be done for the parallel code." (2) "Use standard setting of parameters." (3) "Be reasonable. Give arguments as to why you made the choice. Realize that performance is highly problem-dependent and there may not be a 'best' serial version." (4) "In many cases it may be preferable to compare with a 'standard' algorithm (e.g., MINOS for simplex)." (5) "Difficult to answer, depends on reason for study."

Comments. With a full range of values and a slightly left-skewed distribution, the responses indicate that a reasonable amount of testing should be performed. We feel that our testing has been in the exhausting, but not exhaustive, 8 to 9 range. Comment (1) seemed appropriate, but

Question 5. The solution strategy used in a code can strongly affect execution times. When reporting results of testing a code on a given problem with different numbers of processors, the reported results should be based on a strategy which is:

(a) Fixed, invariant of the number of processors
(b) Rule-based, and can vary with the number of processors and problem parameters
(c) The fastest of all tested for each number of processors
(d) Other: ____

Fig. 9. Question 5, Setting Solution Strategy

Fig. 10. Question 5 Responses

we are reminded that the best code/strategy combination tends to vary from problem to problem and with the number of processors used (i.e., the fastest for two processors is not necessarily fastest with six).

Question 5: Setting Strategies

The question given in Fig. 9 addresses the means by which strategies should be determined for reporting purposes. Implicit is the issue of whether researchers should be able to determine a unique strategy for each problem/processor combination and report the results of the empirical best.

Survey Results. The response percentages were: (a) 4%, (b) 61%, (c) 0%, "Other" 30%, and 4% did not answer (Fig. 10). Of the "Other" respondents, 43% said "any," 29% "(a) and (b)," 14% "all," and 14% said that it depends on the research objective.

Selected comments: (1) "Report (a) and (b). The more the merrier." (2) "[Rule-based,] but part of the program." (3) "More important is to explicitly state which of the above was used."

Comments. Respondents are closer to a consensus on this issue, with a strong majority preferring a rule-based strategy rather than individually tuned ones, and we heartily agree. Note that results can be easily biased if choice (a) is employed—simply choose a good multi-processor strategy. In our experiments with network simplex, a strategy that performed well in parallel typically worked poorly serially; hence using such a fixed strategy could result in dramatic speedups. Choice (b) seems the most fair, but leaves open the question of how to devise the rule.

Question 6: Validity of Speedup as a Metric

The last question, shown in Fig. 11, allows participants to summarize their feelings about speedup's current role as the leading performance measure for reporting parallel testing.

Question 6. Should speedup be used as the primary measure of performance of a parallel algorithm?

(a) Yes
(b) No, we should use: ____

Fig. 11. Question 6, Efficacy of Speedup as Metric

Fig. 12. Question 6 Responses

Survey Results. The response percentages were: 57% yes, and 43% no (Fig. 12). Suggested alternatives or additions included measures of cost-effectiveness, efficiency, robustness, quality of solution, and chance of catastrophic error.

Selected comments: (1) "I tend to be skeptical about one number measuring the goodness of an algorithm." (2) "No, but it is attractive to boil performance down to a single number, so it will likely continue as the dominant measure." (3) "[Use] a number of performance measures, just as we learned when dealing with serial algorithms." (4) "There must be a better method. But I do not know it."

Comments. A slight majority begrudgingly accepted speedup as the primary parallel performance measure but, as revealed in the comments, would prefer a better one. Several indicated the need for variation and cost-effectiveness to be represented in reports.

Overall Comments

The following quotations were selected from the general comments of consenting respondents regarding the questionnaire or the reporting of parallel experimentation.

"Rating the effectiveness of a parallel algorithm by a single measure, speedup, seems analogous to describing a probability distribution by reporting only its mean."

"As a rule, authors of a code will present the data so as to make their code appear best. That is human nature. More important is to explicitly state how they are reporting and how testing was performed, i.e., acknowledge their biases."

"The value of parallelism is an economic issue or real time speedup issue. ... Without the cost of the parallel system the benefits of speedup are meaningless."

"Also we should report the actual times—not only speedups—and on problem sizes where speedup is of the essence."

"After the initial excitement of actual implementation of conventional or new algorithms on parallel machines, speedup factors are going to lose their allure. ... However, if we show that what took an hour on a $10 million superframe now takes 15 minutes on a $500K multicomputer, it will have a significant impact whatever the speedup factor is."

"Rules to follow: 1) Avoid point samples, i.e. solve each problem instance several times and solve many problem instances. 2) Summarize data in more than one way. Be willing to report negative results."

"More people should think of these important details."

CONCLUSIONS

The survey of leading researchers in computational mathematical programming in most cases did not yield a clear consensus. The large number of "Other" responses may be due in part to independent-thinking participants, ambiguities in the questions and answers, or simply lack of obvious or appealing solutions to the situations posed. Even so, a general level of agreement was reached on the following:

292 what took an hour on a $10 million superframe now takes 15 minutes on a $500K multicomputer, it will have a significant impact whatever the speedup factor is." "Rules to follow: 1) Avoid point samples, i.e. solve each problem instance several times and solve many problem instances. 2) Summarize data in more than one way. Be willing to report negative results." "More people should think of these important details." CONCLUSIONS The survey of leading researchers in computational mathematical programming in most cases did not yield a clear consensus. The large number of "Other" responses may be due in part to independent-thinking participants, ambiguities in the questions and answers, or simply lack of obvious or appealing solutions to the situations posed. Even so, a general level of agreement was reached on the following. •

The use of speedup as a parallel performance measure is tolerable, but more than one measure of parallelism effect is desirable in reporting computational results.



Measures of variation and cost-effectiveness are important also.



Report as much of the raw data as possible.



A rule-based strategy should be used when reporting results.

We also feel from the survey and our parallel testing experiences that •

No strong consensus exists regarding the method for summarizing a large body of data.



While many participants wanted all raw data reported, this is clearly not feasible when a large number of problem/code/strategy/processors/instances combinations exist. We must continue to work towards better numerical or graphical methods for summarizing the data.

• • •

Reference values from well-known codes are needed, particularly in the serial case. Even more complications will arise when using speedup as the response variable in a statistical experimental design. We should focus on the real time and cost required to solve difficult problems.

Parallelism holds the promise of permitting operations researchers and computer scientists to reach the previously unattainable: routinely solving problems and supporting models that were too large or complex for previous generations of computers and algorithms. By so doing we bring the benefits of the "OR-approach" (Cook, 1991) to a wider audience, affecting and improving the lives of an increasingly larger portion of the world's populace. The accurate reporting of research is central to progress and the spreading understanding of how to capitalize on these new opportunities so that we might realize the fruits of the promise. ACKNOWLEDGMENTS We wish to thank all of the survey participants, including the following individuals: A. Iqbal AH, Robert E. Bixby, Gordon Bradley, David M. Gay, Harvey J. Greenberg, Terry Harrison, Dick Helgason, Jim Ho, Karla Hoffman, A.H. Rinnooy Kan, Jeff Kennington, Irvin Lustig, Robert Meyer, John M. Mulvey, Stephen G. Nash, Larry Seiford, Don Ratliff, Mike Richey, and Stavros

293 A. Zenios. The remaining respondents wished to remain anonymous. We continue to welcome comments on these topics from all interested parties. REFERENCES Amini, M.M. and R.S. Barr (1990). Network Reoptimization: A Computational Comparison of Algorithmic Alternatives. Technical Report 90-CSE-4, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas. Amini, M.M., R.S. Barr, and R.F. Gunst (1990). An Experimental Design for Comparing Algorithmic Alternatives. Technical Report 90-CSE-5, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas. Barr, R.S. and B.L. Hickman (1989). A New Parallel Network Simplex Algorithm and Implementation for Large Time-Critical Problems. Technical Report 89-CSE-37, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas. Barr, R.S. and B.L. Hickman (1990). A Parallel Approach to Large-Scale Network Models with Side Conditions. Presentation at ORSA/TIMS Philadelphia (forthcoming technical report, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas). Cook, T. (1991). The Challenge of the 90's. Plenary speech at the TIMS/ORSA Joint National Meeting, Nashville. Crowder, H.P., R.S. Dembo, and J.M. Mulvey (1980). On Reporting Computational Experiments with Mathematical Software. ACM Transactions on Mathematical Software, 5,193-203. Flynn, M.J. (1966). Very High-Speed Computing Systems. Proceedings of the IEEE, 54, 1901-1909. GilsinnJ., K. Hoffman, R.H.F. Jackson, E. Leyendecker, P. Saunders, and D. Shier (1977). Methodology and Analysis for Comparing Discrete Linear LI Approximation Codes. Communications in Statistics, 136,399-413. Greenberg, H.J. (1990). Computational Testing: Why, How and How Much. ORSA Journal on Computing, 2 , 7 - 1 1 . Gustafson, J.L., G.R. Montry, and R.E. Benner (1988). Development of Parallel Methods for a 1024-Processor Hypercube. SIAMJ. Sci. Stat. Comput., 9, 609-638. Hickman, B.L. (1991). Parallel Approaches for Pure Network Problems and Related Applications. Doctoral dissertation, Southern Methodist University, Dallas, Texas. Jackson, R.H.F., P.T. Boggs, S.G. Nash, and S. Powell (1990). Report of the Ad Hoc Committee to Revise the Guidelines for Reporting Computational Experiments in Mathematical Programming. Mathematical Programming, 49. Jackson, R.H.F and J.M. Mulvey (1978). A Critical Review of Comparisons of Mathematical Programming Algorithms and Software (1953-1977). / . Research of the National Bureau of Standards, 83. Kennington, J.L. and R.V. Helgason (1980). Algorithms for Network Programming. John Wiley and Sons, New York. Lai, T.H. and S. Sahni (1984). Anomalies in Parallel Branch-and-Bound Algorithms. Communications of the ACM, 27,594-602. Murtagh, B.A. and M.A. Saunders (1987). MINOS 5.1 Users Guide. Report SOL 83-20R, Stanford University. Peters, J. (1990). The Network Simplex Method on a Multiprocessor. Networks, 20, 845-859. Quinn, M.J. (1987). Designing Efficient Algorithms for Parallel Computers. McGraw-Hill, New York.

This page intentionally left blank

OPTIMAL PARALLEL ALGORITHMS FOR COMPUTING A VERTEX OF THE LINEAR TRANSPORTATION POLYTOPE Bruce A. Chalmers* and Selim G. Akl** •Centr e de Recherches pour la Defense, Valcartier C P . 8800, Courcelette, Quebec, Canada ••Departmen t of Computing and Information Science Queen's University, Kingston, Ontario, Canada

ABSTRACT We parallelize the Northwest-Corner Rule for computing a vertex of the polytope of feasible solutions for a linear transportation problem involving m supply depots and n demand depots. The algorithm runs on a CREW PRAM using p < (m + n - 1) processors in 0 ( ( m + n)/p + log(m + n)) time and 0 ( m + n) space. It is optimal when p = 0 ( ( m + n)/log(m + n)). We also show how to remove the concurrent reads to obtain a version of the algorithm for the EREW PRAM that achieves the same asymptotic time and space performance as on the CREW PRAM without incurring the logarithmic factor increase in running time associated with direct simulation.

KEYWORDS Linear transportation problem; polytope of feasible solutions; northwest-comer rule; parallel random access machine; parallel algorithm; concurrent read.

INTRODUCTION The linear transportation problem, also classically known as the Hitchcock transportation problem (Dantzig, 1963), is among the oldest and most widely occurring of all linear programming problems. This problem has been extended to cover many practical situations arising in diverse areas, including personnel assignment, cash flow, inventory control, and weapon assignment problems in defence science. It is described as follows. There are m supply depots containing fixed stockpiles of a certain product which must be transported to n demand depots. A supply vector a = [a^ a ] and a demand vector b = [bp b ] are

m

given, where a. and bj are nonnegative real numbers representing the supply at the i ply depot and the demand at the j

n

sup-

demand depot, respectively, such that Xa- = Zb.. CJ= 295 i j

296 [c.j] is an m 1x n real matrix, where c - is the cost of transporting one unit of the product from the i^ supply depot to the j demand depot. M(a,b) is the bounded convex polyhedron or polytope, called the Hitchcock transportation polytope, in the space of real m x n matrices defined by: M(a,b) = {X = [

X] :i jI x »

i = 1,.... m, Zx^ = bj, j = 1 , n , x - > 0 f o r a l l i, j } .

(1)

Then the linear transportation problem is to determine among all matrices X e M(a, b) one that minimizes the total transportation cost Z Z c - x... This problem and its various special i j J J cases, including the assignment problem or bipartite weighted matching problem, the minimum cost flow problem, and the maximum flow problem, have been studied in numerous texts on linear and network programming (see, for example, (Dantzig, 1963; Murty, 1983; Rockafellar, 1984)), and there is an extensive literature (Bradley et al., 1977) concerned with sequential algorithms derived from the simplex method and the primal-dual method for their solution. As is well-known (Dantzig, 1963), the simplex method requires for its implementation a solution to the following vertex (or extreme point) problem for the transportation polytope M(a, b).

Vertex

Problem

Given: Vectors a, b that determine a nonempty transportation polytope M(a, b). Find: A vertex of M(a, b). The simplex method uses a solution as an initial vertex of a path along the edges of the polytope to the optimal vertex, this path being constructed so that consecutive vertices of the polytope lying on the path have nonincreasing objective function values. In view of the well-known correspondence between vertices and basic feasible solutions and the fact that the rank of the coefficient matrix for the equality constraints in (1) is (m + n - 1) (see, for example, (Dantzig, 1963)), it is clear that the input and output of any algorithm for the vertex problem are: Input:

Output:

Two vectors a = [ap a J , b = [ b ^ , b ] , where each a^, bj is a nonnegative m n J real number, and Za. = Zb-. i j (m + n - 1) triples (i, j , x..), one triple associated with each of the (m + n - 1) basic variables in a basic feasible solution for M(a, b).

Any pair (i, j), i = 1, m, j = 1 , n , lacking a triple in the output corresponds to a nonbasic variable and its X y value is zero. The x.j values that appear in the output are nonnegative reals. Since the basic feasible solution may be degenerate, some of these x..'s may also be zero. In this paper, we present parallel algorithms for the vertex problem. They are designed to run on a parallel random-access machine or PRAM (Akl, 1989). It consists of some number of consecutively numbered identically-programmed processors, each of which is a uniform cost RAM (Aho et al., 1974), operating synchronously under the control of a global clock

297 and communicating via a global or shared memory. During a given time unit a selected number of processors are active and execute the same instruction; the remaining processors (if any) are inactive. Various types of PRAM have been identified, differing in the conventions adopted to handle read/write conflicts which arise when several processors attempt to read from or write into the same location in shared memory. We shall only be concerned with the exclusive-read exclusive-write (EREW)PRAM and the concurrent-read exclusivewrite (CREW)PRAM. An EREW PRAM disallows simultaneous access by more than one processor to the same memory location for read or write purposes. A CREW PRAM allows simultaneous access for reads but not writes. A single read step of a parallel algorithm using p processors may require k £ p shared memory locations to be accessed by k disjoint subsets of the p processors. On a CREW PRAM this takes 0 ( 1 ) time. On an EREW PRAM, however, it is well-known that a direct simulation of the read step runs in 0 ( l o g p) time, using 0 ( p ) additional space (the base of all logarithms in this paper is 2). The work (or cost) of a parallel algorithm is defined as the product of the parallel running time and the number of processors used (Akl, 1989; Leighton, 1992). If the work matches a known lower bound on the number of sequential steps required in the worst case to solve the problem up to a constant multiplicative factor, then the parallel algorithm is said to be optimal.

An interesting consequence of our parallel algorithms is that the vertex problem for the transportation polytope belongs to NC, the class of problems solvable in O(log^k N) time using O(N^j) processors, where k and j are positive constants independent of the size N of the problem instance. This is used in (Chalmers, 1989) to identify a subclass of linear transportation problems in NC that generalize Gilmore-Gomory matching (Lawler, 1976). Our parallel algorithms for the vertex problem make use of parallel solutions to the following multiple search problem.

Multiple Search Problem

Input: Two vectors r = [r_1, ..., r_M], s = [s_1, ..., s_N], where the r_i's and s_j's are real numbers satisfying r_1 ≤ r_2 ≤ ... ≤ r_M and s_1 ≤ s_2 ≤ ... ≤ s_N.

Output:

k = 1, ..., mn, the procedure constructs a vertex of M(a, b) as follows:



            x_11   x_12   ...   x_1n      a_1
            x_21   x_22   ...   x_2n      a_2
SUPPLIES     ...    ...   ...    ...      ...
            x_m1   x_m2   ...   x_mn      a_m

DEMANDS      b_1    b_2   ...    b_n

Fig. 1 Transportation tableau for computing a vertex of the transportation polytope.

procedure GREEDY (a, b, 0, X)
    for k = 1 to mn do x. .
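Since the source breaks off at this point, a minimal serial sketch of the classical Northwest-Corner Rule that the paper parallelizes may help fix ideas. The code is ours, ignores degenerate ties, and emits the (i, j, x_ij) triples described under the vertex problem:

    def northwest_corner(a, b):
        a, b = list(a), list(b)      # remaining supply and demand
        i = j = 0
        triples = []
        while i < len(a) and j < len(b):
            x = min(a[i], b[j])      # ship as much as the corner cell allows
            triples.append((i, j, x))
            a[i] -= x
            b[j] -= x
            if a[i] == 0 and i < len(a) - 1:
                i += 1               # supply i exhausted: move down one row
            else:
                j += 1               # demand j satisfied: move right one column
        return triples

    print(northwest_corner([3, 5, 2], [4, 2, 4]))
    # [(0, 0, 3), (1, 0, 1), (1, 1, 2), (1, 2, 2), (2, 2, 2)]

On nondegenerate data this yields exactly m + n - 1 basic triples, a vertex of M(a, b) of the kind the paper's CREW and EREW algorithms compute in parallel.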