Introduction to algorithms
 9781773616179, 9781773612188

Table of contents :
Cover
Half Page Title
Page Title
Copyright Page
About the Author
Table of Contents
List of Figures
List of Tables
Preface
Chapter 1 Fundamentals of Algorithms
1.1. Introduction
1.2. Different Types of Problems Solved by Algorithms
1.3. Data Structures
1.4. Algorithms as a Technology
1.5. Algorithms and Other Technologies
1.6. Getting Started With Algorithm
1.7. Analyzing Algorithms
References
Chapter 2 Classification of Algorithms
2.1. Introduction
2.2. Deterministic And Randomized Algorithms
2.3. Online vs. Offline Algorithms
2.4. Exact, Approximate, Heuristic, And Operational Algorithms
2.5. Classification According To The Main Concept
References
Chapter 3 An Introduction to Heuristic Algorithms
3.1. Introduction
3.2. Algorithms And Complexity
3.3. Heuristic Techniques
3.4. Evolutionary Algorithms
3.5. Support Vector Machines
3.6. Current Trends
References
Chapter 4 Types of Machine Learning Algorithms
4.1. Introduction
4.2. Supervised Learning Approach
4.3. Unsupervised Learning
4.4. Algorithm Types
References
Chapter 5 Approximation Algorithms
5.1. Introduction
5.2. Approximation Strategies
5.3. The Greedy Method
5.4. Sequential Algorithms
5.5. Randomization
5.6. A Tour Of Approximation Classes
5.7. Brief Introduction To PCPs
5.8. Promising Application Areas For Approximation and Randomized Algorithms
5.9. Tricks Of The Trade
References
Chapter 6 Comparative Investigation Of Exact Algorithms For 2D Strip Packing Problem
6.1. Introduction
6.2. The Upper Bound
6.3. Lower Bounds For 2SP
6.4. A Greedy Heuristic For Solving The 2D Knapsack Problems
6.5. The Branch And Price Algorithms
6.6. The Dichotomous Algorithm
6.7. Computational Results
References
Chapter 7 Governance of Algorithms
7.1. Introduction
7.2. Analytical Framework
7.3. Governance Options By Risks
References
Chapter 8 Limitations of Algorithmic Governance Options
8.1. Introduction
8.2. Limitations of Market Solutions And Self-Help Strategies
8.3. Limitations of Self-Regulation And Self-Organization
8.4. Limitations of State Intervention
References
Index

Citation preview

INTRODUCTION TO ALGORITHMS

INTRODUCTION TO ALGORITHMS

Rex Porbasas Flejoles

ARCLER

P

r

e

s

s

www.arclerpress.com

Introduction To Algorithms Rex Porbasas Flejoles

Arcler Press 2010 Winston Park Drive, 2nd Floor Oakville, ON L6H 5R7 Canada www.arclerpress.com Tel: 001-289-291-7705         001-905-616-2116 Fax: 001-289-291-7601 Email: [email protected] e-book Edition 2019 ISBN: 978-1-77361-617-9 (e-book) This book contains information obtained from highly regarded resources. Reprinted material sources are indicated and copyright remains with the original owners. Copyright for images and other graphics remains with the original owners as indicated. A Wide variety of references are listed. Reasonable efforts have been made to publish reliable data. Authors or Editors or Publishers are not responsible for the accuracy of the information in the published chapters or consequences of their use. The publisher assumes no responsibility for any damage or grievance to the persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The authors or editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify.

Notice: Registered trademark of products or corporate names are used only for explanation and identification without intent of infringement. © 2019 Arcler Press ISBN: 978-1-77361-218-8 (Hardcover) Arcler Press publishes wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com

ABOUT THE AUTHOR

Rex P. Flejoles is an IT Faculty member of a state university in the Philippines. He is involved in software development, research, and data analysis. He graduated Bachelor of Science in Computer Engineering in 2002. Moreover, he received Master of Science in Computer Science degree in 2006. In addition, he completed the academic requirements for PhD in Science Education major in Mathematics Education. He has completed a number of research studies and served as adviser and statistician to various undergraduate research studies.

TABLE OF CONTENTS



List of Figures.................................................................................................xi



List of Tables.................................................................................................xiii

Preface..........................................................................................................xv Chapter 1

Fundamentals of Algorithms....................................................................... 1 1.1. Introduction......................................................................................... 2 1.2. Different Types of Problems Solved by Algorithms................................ 4 1.3. Data Structures.................................................................................... 8 1.4. Algorithms as a Technology.................................................................. 9 1.5. Algorithms and Other Technologies................................................... 11 1.6. Getting Started With Algorithm.......................................................... 13 1.7. Analyzing Algorithms......................................................................... 18 References................................................................................................ 25

Chapter 2

Classification of Algorithms..................................................................... 43 2.1. Introduction....................................................................................... 44 2.2. Deterministic And Randomized Algorithms....................................... 45 2.3. Online vs. Offline Algorithms............................................................ 46 2.4. Exact, Approximate, Heuristic, And Operational Algorithms.............. 46 2.5. Classification According To The Main Concept................................... 47 References................................................................................................ 55

Chapter 3

An Introduction to Heuristic Algorithms.................................................. 61 3.1. Introduction....................................................................................... 62 3.2. Algorithms And Complexity............................................................... 63 3.3. Heuristic Techniques.......................................................................... 65 3.4. Evolutionary Algorithms..................................................................... 66 3.5. Support Vector Machines................................................................... 68 3.6. Current Trends................................................................................... 70

References................................................................................................ 72 Chapter 4

Types of Machine Learning Algorithms.................................................... 79 4.1. Introduction....................................................................................... 80 4.2. Supervised Learning Approach........................................................... 81 4.3. Unsupervised Learning...................................................................... 84 4.4. Algorithm Types................................................................................. 86 References.............................................................................................. 113

Chapter 5

Approximation Algorithms..................................................................... 127 5.1. Introduction..................................................................................... 128 5.2. Approximation Strategies................................................................. 132 5.3. The Greedy Method......................................................................... 135 5.4. Sequential Algorithms...................................................................... 141 5.5. Randomization................................................................................ 144 5.6. A Tour Of Approximation Classes..................................................... 146 5.7. Brief Introduction To PCPs............................................................... 149 5.8. Promising Application Areas For Approximation and Randomized Algorithms......................................................... 150 5.9. Tricks Of The Trade........................................................................... 151 References.............................................................................................. 153

Chapter 6

Comparative Investigation Of Exact Algorithms For 2D Strip Packing Problem.................................................................................... 165 6.1. Introduction .................................................................................... 166 6.2. The Upper Bound ........................................................................... 167 6.3. Lower Bounds For 2SP..................................................................... 168 6.4. A Greedy Heuristic For Solving The 2D Knapsack Problems............. 173 6.5. The Branch And Price Algorithms..................................................... 175 6.6. The Dichotomous Algorithm............................................................ 176 6.7. Computational Results..................................................................... 177 References.............................................................................................. 187

Chapter 7

Governance of Algorithms..................................................................... 201 7.1. Introduction..................................................................................... 202 7.2. Analytical Framework...................................................................... 204 7.3. Governance Options By Risks.......................................................... 206

viii

References.............................................................................................. 211 Chapter 8

Limitations of Algorithmic Governance Options.................................... 221 8.1. Introduction..................................................................................... 222 8.2. Limitations of Market Solutions And Self-Help Strategies.................. 222 8.3. Limitations of Self-Regulation And Self-Organization....................... 224 8.4. Limitations of State Intervention....................................................... 225 References.............................................................................................. 226

Index...................................................................................................... 231

ix

LIST OF FIGURES Figure 1.1. Basic layout and outline of algorithms. Figure 1.2. A characteristic example of algorithm application. Figure 1.3. Sorting a hand of cards by making use of insertion sort. Figure 1.4. The Insertion-Sort operation on the array A which is equal to (2, 5, 4, 6, 1, 3). Indices of array appear above the rectangles; whereas, values that are stored in the array positions lie within the rectangles. (a)–(d) the reiterations of the ‘for loop’ of lines 1–8. During each iteration, the rectangle (black in color) holds the key taken from A (j), which is then equated with the values inside shaded rectangles to its left in the test of line 5. Shaded arrows exhibit values of array relocated one position to the right in line 6, and black arrows signify where the key moves to in line 8. (e) The ultimate sorted array. Figure 2.1. Major types of data structure algorithms. Figure 3.1. Comparison between conventional algorithms and heuristic algorithms. Figure 3.2. Illustration of the classification problem. Figure 4.1. Illustration of supervised learning and unsupervised learning systems. Figure 4.2. Schematic illustration of machine learning supervises the procedure. Figure 4.3. The demonstration of SVM analysis for determining 1D hyperplane (i.e., line), which differentiates the cases because of their target categories. Figure 4.4. Illustration of an SVM Analyses containing dual-category target variables possessing two predictor variables having the likelihood of the division for point clusters. Figure 4.5. Schematic illustration of K-means iteration. Figure 4.6. Demonstration of the motion for m1 and m2 means at the midpoint of two clusters. Figure 4.7. Graph showing a polynomial sample with a high order. Figure 4.8. Illustration of sample data. Figure 4.9. The depiction of the 2D array weight of a vector. xi

Figure 4.10. Illustration of a sample SOM algorithm. Figure 4.11. Demonstration of Weight Values. Figure 4.12. A graph demonstrating the determination of SOM neighbor. Figure 4.13. Display of SOM Iterations. Figure 4.14. A sample of weight allocation in colors. Figure 5.1. Schematic Illustration of a mechanism for approximation algorithms. Figure 5.2. Approximation route for an approximation algorithm problem. Figure 5.3. A sketch of a complete bipartite graph with n nodes colored red and n nodes colored blue. Figure 6.1. Available rectangles and associated available points. Figure 6.2. Different classes of items. Figure 6.3. Illustration of the envelope and corner points. Figure 6.4. Description of a PSHF algorithm. Figure 6.5. The optimal solution for the instance SCP16.

LIST OF TABLES

Table 6.1. Results Acquired from Three Algorithms for Hifi Instances Table 6.2. Results Acquired from the Three Algorithms for Martello et al. Instances Table 6.3. Data of the Instance SCP16 Table 6.4. Results Acquired by Branch and Bound Algorithm and Dichotomous Algorithm on Random Instances Table 7.1. Illustration of Different Algorithm Types and their Examples

PREFACE

This book offers a detailed overview of the modern studies of computer algorithms. A detailed description of different types of computer algorithms is discussed in considerable depth in the book. The content on design and analysis of algorithms is intended for a wide range of readers. The basic knowledge of programming language and basic mathematics is essential for carrying out algorithmic analyses. Knowledge of algorithms allows us to shift towards a challenging task of solving a certain problem, instead of merely engaging in the technical characteristics of instructing a computer to carry out a particular task. The objective of this book on algorithms and data structures is to start familiarizing the readers with the theoretical aspect of the skills of developing computer programs and algorithms. This book is an initiative towards getting familiar with a range of similar fields which include algorithmic complexity and computability which should be learned in conjunction with improving applied programming skills. The book typically consists of self-contained data with a thorough comprehension of fundamental programming and mathematical tools. Basics of the book are reliant on the introduction of algorithms and data structures with respect to different algorithmic problems. The book discusses different types of algorithms used for solving problems. I believe in associate learning which entails the association of one topic with every other topic, i.e., one topic leads to another and so on. This book contains a topic which is closely linked to each other. There was no intention to formulate a comprehensive compilation of all there is to know about algorithms but rather to offer a collection of basic ingredients and key building blocks, on which foundations of algorithms could rely. Familiarization of fundamentals principles of algorithms and data structures is essential for understanding the algorithmic problems. The first chapter of the book contains a detailed introduction of the fundamental principles of algorithms and data structures. There are various types of algorithms which are being investigated nowadays. Chapter 2 deals with the classification of different types of algorithms. Presently, extensive developments are being carried out in the field of Machine learning and heuristic systems. Chapters 3 and 4

contain a detailed presentation of advanced algorithms such as heuristic and machine learning algorithms. Different types of machine learning and heuristic algorithms are also discussed in Chapters 3 and 4. On the other hand, Chapter 5 contains fundamental principles of approximation algorithms and their types. Many algorithms are employed in the real-time world to solve different kinds of problems. For instance, exact algorithms are used to investigate the problems of 2D strip packing, as discussed in Chapter 6. Every field of science and technology is governed by some rules. Similarly, algorithmic systems are also governed by certain rules and regulations. Chapters 7 and 8 discuss the analytical framework of algorithmic governance and the potential risks and limitations associated with the algorithms and their applicability. Readers of this book are expected to learn about established solutions to solve the problems efficiently. They will learn some progressive data structures and innovative ways to apply data structures to enhance the efficacy of algorithms. The book is essentially self-contained which is equally appropriate to be treated as a course book, reference book or self-study material.

xvi

CHAPTER

1

FUNDAMENTALS OF ALGORITHMS

CONTENTS 1.1. Introduction......................................................................................... 2 1.2. Different Types of Problems Solved by Algorithms................................ 4 1.3. Data Structures.................................................................................... 8 1.4. Algorithms as a Technology.................................................................. 9 1.5. Algorithms and Other Technologies................................................... 11 1.6. Getting Started With Algorithm.......................................................... 13 1.7. Analyzing Algorithms......................................................................... 18 References................................................................................................ 25

2

Introduction To Algorithms

1.1. INTRODUCTION We define an algorithm as any well-defined computational method that takes input in the form of any value, or else set of values, and resultantly produces any value, or else set of values, as output. Hence, an algorithm is considered a sequence of computational steps which transforms any input into the output. Another way of seeing algorithm is to consider it as a tool that solves a well-specified computational problem (Aggarwal & Vitter, 1988; Agrawal et al., 2004). Generally, the statement of the problem postulates the desired relationship between input and output; whereas, the algorithm defines a particular computational process for achieving that relationship between input and output. For instance, suppose that we want to sort a sequence of numbers into an increasing order. This problem arises quite often in practice and makes available a fertile ground for presenting numerous standard design practices as well as analysis tools (Adel’son-Vel’skii & Landis, 1962; Abramowitz & Stegun, 1965). Given below is the way we formally define any sorting problem (Figure 1.1): Input: An arrangement of n numbers (a1, a2, …, an). Output: A rearrangement (permutation) < a1' , a2' ,..., an' > of the input sequence in such a manner that a1' ≤ a2' ≤  ≤ an' . As an example, consider an input sequence (30, 35, 59, 26, 35, 57). The output obtained from a sorting algorithm will return (26, 30, 35, 35, 57, 59) sequence. This type of input sequence is known as an instance of the sorting problem. Generally, an instance of a problem contains an input (satisfying whatsoever constraints are enforced in the problem statement), which is required to compute a solution of the encountered problem (Aho & Hopcroft, 1974; Aho et al., 1983). Sorting is regarded as a fundamental step in computer science as numerous programs makes it useful as an intermediate step. Resultantly, we have a large number of excellent sorting algorithms at our disposal (Ahuja & Orlin, 1989; Ahuja et al., 1990; 1993). The choice of a best algorithm for any given application depends mainly on several factors such as the number of items which need to be sorted, the degree to which the items have already been slightly sorted, the architecture of the computer, potential limitations on the item values, as well as the type of storage devices to be employed: main memory, tapes or disks.

Fundamentals of Algorithms

3

An algorithm is considered as correct given that it halts with the correct output for each of the input instance. We say that an algorithm which is correct solves the specified computational problem. On the other hand, an algorithm which is incorrect will possibly not halt at all on some of the input instances, or it may halt with a wrong answer (Ahuja et al., 1989; Courcoubetis et al., 1992). Opposing to our expectation, incorrect algorithms can sometimes prove beneficial, provided we can control their rate of error. However, in general, we will only be concerned with the correct algorithms (Szymanski & Van Wyk, 1983; King, 1995; Didier, 2009).

Figure 1.1: Basic layout and outline of algorithms (Source: http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=IntroToAlgorithms).

We can specify an algorithm in English, in the form of a computer program, or even by way of a hardware design. The single requirement is that the problem specification should necessarily provide a detailed account of the computational practice to be followed (Snyder, 1984; Salisbury et al., 1996). The characteristics of an algorithm are stated below: i. ii. iii.

Each and every single instruction must be precise and clear, that is, each and every instruction should have only one meaning. Each and every instruction must be performed the infinite amount of time. There should not be any infinite repetition of one or more than one instructions which means that the algorithm should necessarily terminate ultimately.

Introduction To Algorithms

4

iv.

After the execution of the instructions, the user should acquire the desired results.

1.2. DIFFERENT TYPES OF PROBLEMS SOLVED BY ALGORITHMS Sorting is not the only computational problem for which scientists have developed algorithms. There are abundant practical applications of algorithms existing worldwide. Few of such examples are stated below (Figure 1.2): i.

Over the past years, the Human Genome Project has made excessive development toward the aim of classifying all the 100,000 genes present in human DNA, defining the sequences of the three billion chemical base pairs which compose human DNA, stowage of this huge amount of information in databases, as well as developing tools for analysis of data. Each of the abovementioned steps needs sophisticated algorithms. Even though the solutions of the several problems involved are outside the scope of this book, several methods to find the solution of these biological problems employ ideas from various sections in this book, hence enabling scientists to achieve tasks while utilizing resources in an efficient manner (Aki, 1989; Akra & Bazzi, 1998). The savings are in time, both machine and human, as well as in money since we can extract more information from laboratory techniques (Regli, 1992; Ajtai et al., 2001). ii. Worldwide, the Internet is enabling people to rapidly access as well as retrieve a large quantity of information. With the help of intelligent algorithms, different sites on the Internet manage and use this large amount of data. Few examples of problems that essentially make use of different algorithms include discovering good routes on which the data can travel, and employing a search engine which can speedily find pages on which our specific required information is present (Alon, 1990; andersson, 1995; 1996). iii. Electronic commerce facilitates us to negotiate as well as electronically exchange services and goods. It relies on the privacy of personal information like passwords, credit card numbers, and bank statements. The fundamental technologies that are employed in electronic commerce include digital signatures and public-key

Fundamentals of Algorithms

iv.

5

cryptography. These technologies are based on number theory and numerical algorithms (Andersson & Thorup, 2000; Bakker et al., 2012). Every so often, manufacturing and various other commercial enterprises require allocation of rare resources in the most favorable way. For instance, an oil company might wish to identify as to where it should place its wells for maximizing its anticipated profit. An airline might hope to assign crews to flights in the most inexpensive manner possible while making sure to cover each flight and that government regulation concerning the scheduling of the crew are fulfilled (Ramadge & Wonham, 1989; Amir et al., 2006). Some political candidate might need to decide where to spend his money ordering campaign advertising for maximizing the likelihood of winning an election. An ISP (Internet service provider) might want to know the place where they can place additional resources so as to serve their customers in a more effective manner. All of the above-mentioned examples constitute problems that which can be solved by employing linear programming (Dengiz et al., 1997; Berger & Barkaoui, 2004).

Figure 1.2: A characteristic example of algorithm application. (Source: https:// invisiblecomputer.wonderhowto.com/how-to/coding-fundamentals-introduction-data-structures-and-algorithms-0135613/).

6

Introduction To Algorithms

Even though few of the details of the above-mentioned examples are outside the scope of this book, we will just give underlying techniques which are applicable to these problems and problem areas. We will also see how to solve numerous specific problems, including the ones stated below: i. Suppose we are provided with a road map containing the distance marked between each pair of adjacent intersections, and we desire to find the shortest route possible from one intersection to another. There can be a huge number of possible routes, even if we prohibit routes crossing over themselves (Festa & Resende, 2002; Zhu & Wilhelm, 2006). Now, in this situation, how to choose the shortest of all the possible routes? In such case, we model the roadmap as a graph and then wish to find the shortest possible route from one vertex to another in the graph. ii. Assume that we are provided with two methodical sequences of symbols like, X= (x1, x2 …, xm) and Y= (y1, y2… yn), and we would like to discover the longest communal subsequence of Y and X. A subsequence of X will contain X minus some (or perhaps all or none) of its elements. For instance, one subsequence of (A, B, C, D, E, F, G, H, I) can be (B, C, E, G). The span of the longest communal subsequence of X and Y gives us the extent of how alike these two sequences are. For instance, if the under consideration two sequences are base pairs in strands of DNA, then we may want to consider them alike given that they have a long shared subsequence. If X contains m symbols and Y contains n symbols, then X and Y will have 2m and 2n conceivable subsequences, respectively. Choosing all conceivable subsequences of X and Y and afterward matching them up may possibly take excessively long time except if both m and n are very small (Wallace et al., 2004; Wang, 2008). iii. We are provided with a mechanical design with regard to a library of parts. Each part might include instances of some other parts, and we are required to list the parts in sequence so that each part comes before any other part that uses it (Maurer, 1985; Smith, 1986). If there are n parts of the design, then there will be nŠ potential orders, where nŠ signifies the factorial function. Since the factorial function raises quicker as compared to even an exponential function, thus, it is not feasible for us to generate each possible order and then carry out is verification within that order such that each part comes before the parts using it. Such a

Fundamentals of Algorithms

7

problem is an instance of topological sorting (Herr, 1980; Stock & Watson, 2001). iv. Assume that we are provided with n points in the plane, and we desire to discover the convex hull of these points which is the smallest convex polygon enclosing the points. Instinctively, we can consider each point as being characterized by a nail piercing out from a cardboard. The convex hull then will be signified by a tight rubber band surrounding all the nails (Arora et al., 1998; 2012). Each of the nails about which the rubber band creates a turn is a vertex of the convex hull. Hence, any of the 2n subsets of the points can be the vertices of the convex hull. It is not enough here to know the points that are the vertices of the convex hull, as we also must know the order of their appearance. Hence, several different choices are available for the vertices of the convex hull (Price, 1973; Berry & Howls, 2012). These lists exhibit two features that are common to numerous exciting algorithmic problems: i.

They have a number of candidate solutions, the vast majority of which are unable to solve the problem at hand. To find the one that solves the problem or the one that is “best,” can be quite challenging (Arora, 1994; 1998; Arora & Howls, 2012). ii. Most of them have practical applications. The easiest example from the problems listed above was finding the shortest path. Any transportation company, such as a railroad or a trucking company, has a financial interest to find the shortest possible path through a rail network or a road since the adoption of shorter paths outcomes in lower labor and fuel prices (Bellare & Sudan, 1994; Friedl & Sudan, 1995). Similarly, a routing node on the Internet may desire to discover the shortest path through the network for quick routing of a message. Or else a person who wishes to drive from Washington to Chicago might want to know driving directions from a suitable Website or may use his/her GPS during driving (Sudan, 1992; Arora & Lund, 1996). iii. It is not necessary that every problem which is solved by algorithms would have an easily identified set of contestant solutions. For instance, assume that we are provided with a set of numerical values expressing samples of a signal. Now we wish to calculate the discrete Fourier transform (DFT) of these samples

8

Introduction To Algorithms

(Fredman & Willard, 1993; Raman, 1996). DFT by generating a set of numerical coefficients, converts the time domain into the frequency domain, such that we can define the strength of different frequencies in the sampled signal. Besides lying at the heart of signal processing, there are numerous applications of discrete Fourier transforms in data compression as well as multiplication of large polynomials and integers (Polishchuk & Spielman, 1994; Thorup, 1997).

1.3. DATA STRUCTURES A data structure is a method of storing and organizing data in order to ease access and modifications. There is no single data structure that works equally well for all purposes, and hence it is imperative to know the strengths as well as limitations of a number of them (Brodnik et al., 1997.

1.3.1. Hard Problems Talking about efficient algorithms, our typical measure of efficiency is speed, that is, how long it takes for an algorithm to output its result. However, at times, there can be few problems for which there does not exist any efficient solution. NP (nondeterministic polynomial time)-complete problems are considered interesting because of two reasons (Phillips & Westbrook, 1993; Ciurea & Ciupala, 2001). First, even though there has not been found any effective algorithm for an NP-complete problem, no one person has ever proved that an effective algorithm for such problem cannot exist. We can say that no one knows whether or not there are any efficient algorithms present for NP-complete problems. Secondly, there is a remarkable property of a set of NP-complete problems that if any effective algorithm exists for any one of them, then such an efficient algorithm will exist for all of them (Cheriyan & Hagerup, 1989; 1995). This relationship between the NP-complete problems results in the lack of effectual solutions all the more tormenting. Thirdly, there are several similar NP-complete problems but they are not identical to problems for which we have efficient algorithms. Computer experts are enthralled by how a minor change in the problem statement can lead to a big change in the proficiency of the best-known algorithm (Cheriyan et al., 1990; 1996). One should be familiar with NP-complete problems as some of them unexpectedly arise quite often in real applications. Given that you are called upon to create an efficient algorithm for any NP-complete problem, it is

Fundamentals of Algorithms

9

likely that you will spend a lot of time in a futile search. On the other hand, if you can prove that the problem is NP-complete, then instead you can spend your time to develop an efficient algorithm which provides a good, nevertheless not the best possible, solution (Leighton, 1996; Roura, 2001; Drmota & Szpankowski, 2013). As an existing example, assume a delivery firm with a central depot. Each day, its delivery trucks are loaded at the depot and then send around for delivering goods to various addresses. By the end of the day, each truck is required to reach back at the depot so that it can be made ready for loading for the next day (Bentley et al., 1980; Chan, 2000; Yap, 2011). In order to decrease the prices, the company wishes to choose a sequence of delivery stops which results in the lowest overall distance covered by each truck. This problem is the popular “traveling-salesman problem,” and it comes in the NP-complete category. It does not have an efficient algorithm. However, under some certain assumptions, we can know of such efficient algorithms which provides an overall distance not too far above the shortest possible (Shen & Marston, 1995; Verma, 1997).

1.3.2. Parallelism For several years, we had been able to rely upon processor clock speeds growing at a steady rate. Even though physical limitations cause an ultimate roadblock to ever-growing clock speeds, nevertheless: as with the speed of clock, power density rises super-linearly, hence, chips run the hazard of melting after their clock speeds get plenty high (Meijer & Akl, S1987; 1988). Therefore, for performing more computations per second, we design chips to contain several processing cores. We can compare these multicore computers to numerous sequential computers on a single chip; or else we can say that they are a type of “parallel computer.” For the purpose of stimulating the best possible performance from multicore computers, we must design algorithms keeping parallelism in mind. Algorithms such as “multithreaded” algorithms take benefit of multiple cores. This type of model has several advantages from a theoretical viewpoint, and thus it forms the basis of numerous successful computer programs (Wilf, 1984; Selim & Meijer, 1987).

1.4. ALGORITHMS AS A TECHNOLOGY Let’s assume that computers were substantially fast and their memory was free. Can you think of any reason to study algorithms then? The answer is still

10

Introduction To Algorithms

yes, as we would still want to demonstrate that our solution terminates and does so with the accurate answer. Given that computers were extremely fast, any accurate method for problem-solving will terminate (Igarashi et al., 1987; Park & Oldfield, 1993). You most likely would want your implementation of software to lie within the bounds of decent software engineering practice (for instance, your implementation must be well designed as well as well documented), but quite often you would be likely using whatever method is the easiest to implement (Glasser & Austin Barron, 1983; Immorlica et al., 2006). Obviously, computers can be fast, but they cannot be infinitely fast. Likewise, their memory may be low-priced, but it is not entirely free. Hence, computing time is a bounded resource, likewise is space in memory. One should utilize these resources wisely, and algorithms that are efficient with regard to space or time can help us do so (Wilson, 1991; Kleinberg, 2005; Buchbinder et al., 2010). Different algorithms that are written for the solution of the same problem frequently differ widely in terms of their efficiency. Such differences can be much more noteworthy as compared to the differences owing to software and hardware (Vanderbei, 1980; Babaioff et al., 2007; Bateni et al., 2010). Take an example of two different sorting algorithms for more clarity. The first is known as insertion sort and the time it takes is approximately equal to c1n2 to sort n items; here c1 is a constant which is not dependent on n; which means that it takes time approximately proportional to n2. The second algorithm is known as merge sort and it takes time approximately equal to c2n lg n, here lg n stands for log2 n whereas c2 is another constant that too is not dependent on n. As compared to merge sort, Insertion sort has normally a smaller constant factor, such that c1 < c2. We will find out that the impact of constant factors can be far less on the running time than the dependency on the input size n (Aslam et al., 2010; Du & Atallah, 2001; Eppstein et al., 2005). If we write insertion sort’s running time as c1n n and running time of merge sort algorithm as c2n lg n, then we will find out that where the factor of insertion sort is n in its running time, merge sort is having a factor of lg n, which is obviously much smaller. Even though insertion sort generally runs more rapidly than merge sort for trivial input sizes, nevertheless, after the input size n becomes sufficiently large, merge sort’s benefit of lg n versus. n will compensate fairly for the difference in constant factors. Regardless of how much smaller c1 is compared to c2, there will always exist a crossover point outside which merge sort is faster (Sardelis &

Fundamentals of Algorithms

11

Valahas, 1999; Dickerson et al., 2003). For a real example, let us consider a faster computer (name it as computer A) running insertion sort alongside a slower computer (name it as computer B) running merge sort. Each of them must sort an array containing 10 million numbers. (Even though it may seem that 10 million numbers are a lot, given that the numbers are eight-byte integers, then the input will be occupying around 80 megabytes, which can fit, numerous times over, in the memory of even a low-cost laptop computer). Let’s assume that computer A is executing 10 billion instructions per second; whereas, computer B is executing merely 10 million instructions per second, hence, it makes computer A 1000 times quicker than computer B in terms of raw computing power (Sion et al., 2002; Goodrich et al., 2005). To amplify this dramatic difference, assume that the world’s most devious programmer codes insertion sort for computer A in machine language, and the resultant code needs 2n2 instructions to sort n numbers. Additionally, assume that an average programmer implements merge sort by employing a high-level language using an inefficient compiler, and the resultant code takes 50n lg n instructions. Now, for sorting 10 million numbers, computer A will take:

2 ⋅ (107 ) 2 instructions = 20, 000sec ond (more than 5.5 hours ) 1010 instructions / sec ond Whereas, computer B will take:

50 ⋅107 lg107 instructions ≈ 1163sec ond (less than 20 min utes ). 1010 instructions / sec ond Even with a poor compiler, by making use of an algorithm whose running time raises more slowly, computer B will be running more than 17 times faster as compared to computer A. The benefit of merge sort is even more noticeable when we sort 100 million numbers: in this case insertion sort will be taking more than 23 days, whereas, merge sort will only take under four hours. As a general rule, as the size of the problem increases, so does the relative benefit of merge sorting (Eppstein et al., 2008; Acar, 2009).

1.5. ALGORITHMS AND OTHER TECHNOLOGIES The example stated above proves that just like computer hardware, we should consider algorithms, as a technology. The entire performance of the system depends on picking efficient algorithms as much as on the selection of fast hardware. Just like rapid progress is being made in several computer

12

Introduction To Algorithms

technologies, likewise, they are also being made in algorithms (Amtoft et al., 2002; Ausiello et al., 2012). You might be curious if algorithms are truly that significant on contemporary computers taking into account other advanced technologies, for example: i. Intuitive, easy-to-use, graphical user interfaces (GUIs). ii. Advanced computer architectures and production technologies. iii. Fast networking, both wireless and wired. iv. Object-oriented systems. v. Integrated Web technologies. The answer is yes. Even though few applications do not explicitly need algorithmic content at the application level (like some small, webbased applications), several others do. For instance, suppose a Web-based service which defines how to travel from one location to another location (Szymanski, 1975; Aslam et al., 2010). The implementation of this service would depend on a graphical user interface, fast hardware, wide-area networking, and quite probably on object orientation. Though it will also need algorithms for certain operations like discovering routes (probably making use of a shortest-path algorithm), interpolating addresses, and rendering maps (Bach, 1990; Avidan & Shamir, 2007). Furthermore, even an application which, at the application level, does not need algorithmic content is highly dependent upon algorithms. Since the application is reliant on fast hardware and the design hardware uses algorithms. Furthermore, the application also relies on graphical user interfaces whose design again relies on algorithms (Winograd, 1970; Bach & Shallit, 1996). Additionally, the application can also rely on networking and route in networks also relies mainly on algorithms. If the application is written in any language other than the machine language, then it must be processed using an interpreter, a compiler, or assembler, all of these extensively uses different algorithms. Hence, we can conclude that algorithms lie at the core of most technologies that are employed in contemporary computers (Bailey et al., 1991; Baswana et al., 2002). Moreover, with the ever-growing capabilities of computers, we utilize them for solving complex problems than ever before. As it has been seen from the comparison above between merge sort and insertion sort, the differences in efficiency among algorithms become mostly more prominent in the larger problem (Bayer, 1972; Arge, 2001; Graefe, 2011). Being equipped with a solid base of algorithmic technique and knowledge is one distinctive feature

Fundamentals of Algorithms

13

that sets apart the skilled programmers from the beginners. Although with the help of modern computing technology, we can achieve few tasks without having to know much about algorithms, however, with a sound background in algorithms, we can do much, much more (Blasgen et al., 1977; Harth et al., 2007).

1.6. GETTING STARTED WITH ALGORITHM We start by examination of the insertion sort algorithm for solving the sorting problem presented in earlier sections. We will define a “pseudo code” which should be known to you provided you have knowledge of computer programming. We will be using it to demonstrate how to specify our algorithms. After specification of the insertion sort algorithm, we then reason that it sorts appropriately, and afterward we analyze its running time (Kwong & Wood, 1982; Icking et al., 1987). The resultant analysis brings forth a notation that emphases on how that time rises by increasing the number of items that need to be sorted. Subsequent to our discussion of insertion sort, we will introduce the divide-and-conquer method for designing of algorithms and employ it for developing an algorithm known as merge sort. Afterward, we will be ending with an analysis of the running time of merge sort (Nievergelt, 1974; Beauchemin et al., 1988).

1.6.1. Insertion Sort The first algorithm, insertion sort, solve the sorting problem presented in the previous sections of this chapter: Input: A series of n numbers (a1, a2…, an). Output: A rearrangement (permutation) < a1' , a2' ,..., an' > of the input sequence in such a manner that a1' ≤ a2' ≤  ≤ an' The numbers that are to be sorted are also called the keys. Even though we are sorting a sequence from a theoretical point of view, the input approaches us in the form of an array having n elements. Here we will be typically describing algorithms as programs that are written in a pseudo code which in many respects is analogous to C++, C, Pascal, Java, or Python. If you have already been familiar with any of these languages then you should have little distress reading our algorithms (Monier, 1980; Kim & Pomerance, 1989; Damgård et al., 1993) (Figure 1.3).

14

Introduction To Algorithms

Figure 1.3: Sorting a hand of cards by making use of insertion sort. (Source: https://mitpress.mit.edu/books/introduction-algorithms-third-edition).

The difference between real code and pseudocode is that in pseudocode, we make use of whatever expressive method is most brief and clear to stipulate a given algorithm. Occasionally, the clearest method is English, so it shouldn’t come as a surprise if one comes across a phrase in English or any English sentence embedded inside a section of “real” code (BenOr, 1983; Bender et al., 2000). One more difference between real code and pseudocode is that pseudocode is not normally concerned with problems of software engineering. Issues of modularity, data abstraction, and error handling are frequently ignored for the purpose of conveying the spirit of the algorithm in a more concise manner (Wunderlich, 1983; Morain, 1988). We will start with insertion sort. Insertion sort is an efficient algorithm to sort out a small figure of elements. We can relate the working of insertion sort as the way various people sort a hand of playing cards. Starting with an empty left hand, the cards will face down on the table. After that, we take away one card from the table at a time and place it in the right position in the left hand (Fussenegger & Gabow, 1979; John, 1988). For finding the right position for a card, we make its comparison with each of the already held cards in the hand, from right to left, as shown in the figure below. At all instances, the cards seized in the left hand are sorted. Further, these were the cards that were originally on the top of the pile on the table (Kirkpatrick, 1981; Munro & Poblete, 1982; Bent & John, 1985). Our pseudocode for insertion sort is presented as a procedure known as INSERTION-SORT, which takes an array as a parameter (A [1 … n). This array contains a sequence of length n that needs to be sorted. The algorithm

Fundamentals of Algorithms

15

then sorts the input array in place: it reorders the numbers present within the array A, with a constant number of them stored outside the array as a maximum at any time. The input array A has the sorted output sequence after the procedure of Insertion-Sort is finished (Schönhage et al., 1976; Bentley et al., 1980; Bienstock, 2008) (Figure 1.4).

Figure 1.4: The insertion-sort operation on the array A which is equal to (2, 5, 4, 6, 1, 3). Indices of array appear above the rectangles; whereas, values that are stored in the array positions lie within the rectangles. (a)–(d) the reiterations of the ‘for loop’ of lines 1–8. During each iteration, the rectangle (black in color) holds the key taken from A (j), which is then equated with the values inside shaded rectangles to its left in the test of line 5. Shaded arrows exhibit values of array relocated one position to the right in line 6, and black arrows signify where the key moves to in line 8. (e) The ultimate sorted array. (Source: https:// mitpress.mit.edu/books/introduction-algorithms-third-edition).

Insertion Sort (A) is illustrated below: i. For j=2 to length.A ii. Key=A[j] iii. //Insetion of A[j] into a sorted sequence a[1 … j–1] iv. i = j – 1 v. while A[i]>key and i>0 vi. A[i+1]=A[i] vii. \\feeding addition of value in A[i] viii. i =i-1 ix. A[i+1]=key

1.6.2. Loop Invariants and the Precision of Insertion sort Fig.4 shown in the previous section exhibits the mechanism of this algorithm for A = (2, 5, 4, 6, 1, and 3). The index j denotes the “current card” being placed into the hand. At the start of each iteration of the for loop (indexed by j), the sub-array containing elements A [1… j – 1] institutes the presently

16

Introduction To Algorithms

sorted hand, and the leftover sub-array A[j + 1 … n] relates to the pile of cards that are still present on the table. As a matter of fact, elements A [1 … j – 1] are the original elements that were in positions 1 through j 1, however, now in sorted order. These properties of A [1 … j – 1] is formally stated as a loop invariant Loop invariants are used to understand as to why an algorithm is accurate. We should essentially demonstrate three things about a loop invariant: Initialization: It is accurate preceding the first iteration of the loop. Maintenance: If it is accurate prior to the iteration of the loop, it remains accurate before the succeeding iteration. Termination: As the loop terminates, the invariant provides us a beneficial property that helps in demonstrating that the algorithm is correct. If the first two properties hold true, then the loop invariant is true preceding every iteration of the loop. The similarity to mathematical induction must be noted here. For proving that a property holds, we prove a base case as well as an inductive step (Duda et al., 1973; Bloom & Van Reenen, 2002). In this case, demonstrating that the invariant holds prior the first iteration corresponds to the base case, whereas, demonstrating that the invariant holds among iterations correspond to the inductive step (Dehling & Philipp, 2002; Bienstock & McClosky, 2012). As we are employing the loop invariant to exhibit correctness, the third property is possibly the most significant one. Normally, we make use of the loop invariant together with the condition that caused the termination of the loop. The termination property varies from our general use of mathematical induction where we infinitely apply the inductive step; in this case, we stop the “induction” as the loop terminates. Now, let us understand how these properties hold in case of insertion sort (Bollerslev et al., 1994; Tan & Acharya, 1999). Initialization: We start by demonstrating that the loop invariant holds prior to the iteration of the first loop when j= 2. Hence, the subarray A[1 … j – 1], is just composed of the single element A[1], which as a matter of fact is the original element in A[1]. Additionally, this subarray is sorted which implies that the loop invariant holds before the first iteration of the loop (Williamson, 2002; D’Aristotile et al., 2003). Maintenance: Now, coming up to the second property: demonstrating that each of the iterations maintains the loop invariant. The for loop body works by moving A[j – 1] , A[j – 2], A[j – 3] , and likewise by single position

Fundamentals of Algorithms

17

to the right till it discovers the proper position for A[j] (lines 4–7), at this point it inserts the A[j] value (line 8). Then, the sub-array A[1 … j] contains elements that were originally in A[1 … j], however, in sorted order. The increment of j for the subsequent iteration of the ‘for loop’ conserves the loop invariant (Nelson & Foster, 1995; COnforti et al., 2010). Termination: Lastly, we observe what happens at the time the loop terminates. The condition which is responsible for termination of the ‘for loop’ is that j > A. length= n. since each loop iteration rises j by 1, we should have j = n + 1 at that time. Replacing n + 1 for j in the language of loop invariant, we have the subarray A[1... n] consisting of the elements that were originally in A[1... n], however, in sorted order. Seeing that the sub-array A [1... n] is the perfect array, we settle that the complete array is sorted. Therefore, the algorithm is correct (Fourier, 1890; 1973; Bazzi et al., 2017).

1.6.3. Pseudocode Conventions In our pseudocode, we are using the conventions stated below: i.

ii.

iii.

Indentation denotes structure of the block. For instance, the body of for loop that starts on line 1 comprises of lines from 2–8, whereas the body of the while loop that starts on line 5 comprises of lines from 6–7, however, not line 8. This style of indentation style is also applicable to if-else statements. Making use of indentation in place of conventional indicators of block structure like end or begin statements, significantly decreases clutter at the same time preserving, or even increasing, clarity (Verma, 1994; 1997; Faenza & Sanità, 2015). The looping constructs like for, while, and the conditional construct such as if-else, repeat-until are analogous to those used in C++, C, Python, Java, and Pascal [4]. Here we have assumed that the loop counter maintains its value after leaving the loop, in contrast to some situations that encounter in Java, C++, and Pascal. Hence, soon after a for loop, the counter value of loop is the value that first topped the for loop bound. We can use this feature in our correctness argument for insertion sort (Blelloch, 1989; 1996; Xiaodong & Qingxiang, 1996). In case of an if-else statement, else is indented at the same level as its corresponding if. Even though we exclude the keyword then, we sometimes refer to the part executed when the test succeeding if statement is true as a then clause. In case of multiway tests, we

Introduction To Algorithms

18

iv. v.

vi.

vii.

make use of else-if for tests afterward the first one. Majority of block-structured languages have comparable constructs, although the precise syntax may differ. In Python, there are no repeat-until loops; moreover for loop in it operate in a different manner from the for loops discussed here (Blelloch et al., 1994; 1995; 1997; 1999). The symbol “//” implies that the remainder of the line is a remark (comment). Multiple assignments of the type i= j= e assigns the value of expression e to both variables i and j; hence, it must be treated as equal to the assignment j= e proceeded by the assignment i= j. Variables (like the key, i, and j) are local to any specified procedure. Global variables must not be used without explicit indication (Blelloch & Greiner, 1996; Blelloch & Mags, 1996; 2010). We access elements of an array by stating the array name succeeded by the index written in square brackets. For instance, A[i] denotes the i-th element of the array A. The “…“ notation signifies a range of values inside an array. Therefore, A [1…j] denotes the subarray of A containing the j elements A[1], A[2], …, A[j].

1.7. ANALYZING ALGORITHMS While analyzing an algorithm, we have to predict the resources needed by the algorithm. Sometimes, resources like communication bandwidth, memory, or computer hardware are of main concern for us, but most of the time, we wish to measure the computational time (Frigo et al., 1998; Blumofe & Leiserson, 1999). Normally, by analysis of various contestant algorithms for a problem, identification of the one which is most efficient can be carried out. Analysis such as this may signify more than a single viable candidate; however, we can frequently discard various inferior algorithms during the process (Blum et al., 1973; Blelloch & Gibbons, 2004). Before we can run the analysis of an algorithm, we ought to have an implementation technology model that we will employ. This model must also include a model for the resources of that technology as well as their costs. Here, we will consider a generic one-processor as our implementation technology, that is, random-access machine (RAM) model of computation. Moreover, we will be implementing our algorithms as computer programs.

Fundamentals of Algorithms

19

In the RAM model, each of the instructions is executed one after another, with no concomitant operations (Brent, 1974; Buhler et al., 1993; Brassard & Bratley, 1996). We must accurately define the instructions of the RAM model as well as their costs. However, this task will not only be tedious but also result in little insight into the design of the algorithm and its analysis. Still, care should be taken as to not abuse the RAM model. For instance, in case if a RAM contains an instruction that sort, we can sort in merely one instruction. However, such a RAM is not as realistic as real computers do not contain such instructions (Chen & Davis, 1990; Prechelt, 1993; Sheffler, 1993). Hence, we will be focusing on the design of real computers. The RAM model includes instructions that are generally found in real computers: data movement (copy, load, store,), arithmetic (like subtraction, addition, division, multiplication, remainder, ceiling, floor, etc.), and control (subroutine call and return unconditional and conditional branch). Each of these instructions needs a constant amount of time (Hsu et al., 1992; 1993). The RAM model’s data types for storing real numbers are floating point and integer. Even though, normally we are not very concerned with precision, however, in a few applications precision is vital. Another assumption we make is a limit on the size of each data word. For instance, while working with inputs of size n, normally, we assume that representation of integers is done by c lg n bits for any constant c ≥ 1. We need c ≥ 1 in order that each word can have the value of n, permitting us to catalog the individual input elements. Additionally, we restrict c to be a constant so that the size of the word does not grow randomly. (In case of arbitrary growth of the size of the word, we can store a large quantity of data in one word and carry out an operation on it all in constant time which is obviously an unrealistic scenario). Real computers hold instructions which represent a gray area in the RAM model. For instance, in general case, exponentiation is not a constant time instruction as it takes more than a few instructions to compute xy when both x and y are real numbers (Little t al., 1989; Narayanan & Davis, 1992). However, in limited situations, exponentiation can be a constant time operation. Various computers contain a “shift left” instruction (Martínez et al., 2004; Bengtsson, 2006). This instruction shifts the integer bits in constant time by k positions to the left. In the majority of the computers, shifting the integer bits by one position to the left is the same as multiplication by a factor of 2, in other words, we can say that shifting the bits to the left by k

20

Introduction To Algorithms

positions is equal to multiplication by 2k. Consequently, such computers have the ability to compute 2k in one constant time instruction through shifting the integer 1 to the left by k positions, provided that k is no more than the total number of bits in a computer word (Hung, 1988; Chen & Davis, 1991). In the RAM model, we do not try to model the hierarchy of memory which is common in modern computers. To be clearer, we do not model virtual memory or caches. A number of computational models make efforts to account for effects of memory-hierarchy, which are occasionally important in real programs on real machines. As compared to RAM Models, such models that include the memory hierarchy are often more complicated, and hence they are often difficult to deal with. Furthermore, analyses of RAMmodel are generally brilliant indicators of performance on real machines (Bae & Takaoka, 2005; Lin & Lee, 2007). Analysis of even a straightforward algorithm in the RAM model can prove to be challenging. Several mathematical tools are needed such as algebraic dexterity, combinatorics, probability theory, and the capability to recognize the most important terms in a formula. Since for each possible input, the behavior of an algorithm may vary, we require a method to summarize that behavior in straightforward and easily understood formulas (Agrawal et al., 2006; 2007; Brodal & Jørgensen, 2007). Although we normally choose just one model of machine for analysis of any given algorithm, we still have several choices to decide how to express our analysis. The preferred way is the one that is simple to write and manipulate and represents the significant characteristics of the resource requirements of an algorithm as well as suppresses tedious details.

1.7.1. Order of Growth We can use few simplifying abstractions to make our analysis of the INSERTION-SORT easy. As a first step, the actual cost of each statement can be ignored by employing the constants ci for representing these costs. After that, we will find that even these constants provide more detail to us that we require in actual. In the previous sections, the worst case running time was expressed as an2 = b n+ C for some constants a, b, and c which is dependent on the statement costs ci. Hence, we can ignore the abstract costs ci in addition to ignoring the actual statement costs.

Another simplifying abstraction that can be made is the order or rate of growth of the running time which in actuality interests us. Thus, we consider just the leading term of a formula (that is, an2), as for large values of n, the

Fundamentals of Algorithms

21

lower-order terms are comparatively unimportant. Further, we also ignore the constant coefficient of the leading term as they are less important as compared to the rate of growth for determination of computational efficiency for large inputs. In case of insertion sort, after ignoring the constant coefficient of the leading term and the lower-order terms, we are only left with the n2 factor from the leading term.

1.7.2. Analysis of Insertion Sort The time required by the Insertion-Sort procedure mainly depends on the input: of course, sorting a thousand numbers will take more time than sorting three numbers. Additionally, for sorting two input sequences of the same size, Insertion-Sort can take a different amount of time depending on how much they are already sorted. As a general rule, the time required by an algorithm increases with increasing the size of the input, hence it is conventional to describe the program running time in terms of a function of the input size. In order to do so, we must define the terms “size of input” and “runtime” more cautiously. The best conception for the size of input is dependent on the problem under investigation. For numerous problems like computing or sorting discrete Fourier transforms, the most usual measure is the number of items in the input—for instance, the array size n for sorting. For several other problems like multiplication of two integers, the most optimum measure of input size is the total number of bits that are required for representation of input in common binary notation. Occasionally, it is more suitable to describe the input size with two numbers instead of one. For example, if the input of an algorithm is a graph, we can describe the size of the input by the numbers of edges and vertices in the graph. We will point out which measure of input size is being utilized with each of our studied problems. On a particular input, the running time of an algorithm is the number of executed steps or primitive operations. It is feasible to define the concept of step in order that it is as machine-independent as possible. For now, let us assume the following view. For the execution of each line of our pseudo code, a constant sum of time is needed. Although, the time required for execution of one line may differ from the time taken for execution of another line, nevertheless, we will suppose that each execution of the i-th line takes a constant amount of time equal to ci. This perspective is in keeping with the model of RAM; moreover, it also reveals the how the implementation of pseudo takes place on most of the real computers. The

Introduction To Algorithms

22

following demonstration deals with the evolution of the expression used for the INSERTION-SORT running time. The evolution of the expression takes place from a complex formula which utilizes all the statement costs (i.e., ci) to a simpler notation which is brief and more conveniently manipulated. This simple notation makes it convenient to determine the efficiency of a particular algorithm with respect to others. Initially, the demonstration of the INSERTION-SORT procedure with respect to each statement time “cost” and the number of execution attempts is presented below. For each j = 2, 3, 4… . n, where n= length. A, we let tj represent the number of attempts the while loop experiment in line 5 is carried out for that particular value of j. When a ‘while’ or ‘for’ exits in a customary way (i.e., because of the test in a looping header), the execution of the test is carried out one time further than the loop body. It is assumed that comments are non-executable statements, which do not consume any time. Sr. No. i ii

Cost c1 c2

Times n n–1

0

n–1

iv

Insertion Sort (A) For j = 2 to length. A Key =A[j] Insertion of A[j] into a sorted sequence a[1 … j – 1] i=j–1

c4

n–1

v

While A[i]>key and i>0

c5

vi

A[i + 1]=A[i]

c6

vii

Feeding addition of value 0 in A[i]

viii

i=i–1

c7

ix

A[i + 1]=key

c8

iii

n–1

n–1

There are some intricacies to the above-mentioned phenomenon. Computational steps specified in English are typically the variants of a procedure which sometimes requires more time than the anticipated (fixed) amount of time. For instance, “sorting of the points by x-coordinates” typically takes more time than the allotted amount of time. It can also be noted that a statement which calls a subroutine consumes constant time, although the subroutine may take more time. Therefore, we distinguish the calling process for this type of phenomena. The total running time of an algorithm is the addition of running times for each of the executed statement.

Fundamentals of Algorithms

23

Any statement that undergoes ci steps for execution and in total executes n times will add ci n to the whole running time. For computation of the total running time of Insertion-Sort on an input containing n values, denoted by T(n), we add the products of the times and cost columns, getting n

n

n

T (n) = c1n + c2 (n − 1) + c4 (n − 1) + c5 ∑ t j + c6 ∑ (t j − 1) + c7 ∑ (t j − 1) + c8 (n − 1) =j 2=j 2

=j 2

Even in the case of inputs of a known size, the running time of an algorithm might depend on which input of that size is known. For instance, in INSERTION-SORT, the most excellent case scenario is when the array is already sorted. For each of values of j = 2, 3.... n, we get that A[i]≤ key in line 5 when the initial value of i is j – 1. Hence, tj=1 for j = 2, 3… n and the running time of best-case is

T (n) = c1n + c2 (n − 1) + c4 (n − 1) + c5 (n − 1) + c8 (n − 1) = (c1 + c2 + c4 + c5 + c8 )n − (c2 + c4 + c5 + c8 ). This running time can be expressed as an + b for constants a and b. both a and b are dependent on the statement costs ci; therefore, it is a linear function of n. In the case where the array is sorted in reverse order—that is to say, in decreasing order—it results in the worst-case scenario. Each element A[j] then must be compared with each element in the whole sorted subarray A[1 … j – 1] , and thus tj = j for j = 2, 3, … ., n. Keeping in mind that n

= ∑j j =2

n(n + 1) −1 2

and n

n(n − 1) 2

∑ ( j − 1) = j =2

The running time of INSERTION-SORT in worst case scenario is:  n(n + 1)   n(n − 1)   n(n − 1)  T (n) = c1n + c2 (n − 1) + c4 (n − 1) + c5  − 1 + c6   + c7   + c8 (n − 1)  2   2   2  c c c c c c    =  5 + 6 + 7  n 2 +  c1 + c2 + c4 + 5 − 6 − 7 + c8  n − (c2 + c4 + c5 + c8 ). 2 2 2 2 2 2  

The running time of this worst-case can be expressed as an2 = bn + c for constants a, b, and c. These constants depend on the statement costs ci; therefore, making it a quadratic function of n.

Introduction To Algorithms

24

Normally, the running time of an algorithm in case of Insertion sort is fixed for any given input, even though there are also some “randomized” algorithms, the behavior of whom vary even for a fixed input.

1.7.3. Analysis of Worst-Case and Average-Case For the analysis of insertion sort, we have discussed both cases; the best case where the input array is already sorted, and secondly, the worst case, where the input array is sorted in a reverse manner. Our main concentration is only on worst-case running time, which is the longest running time for any input having size n. the three main reasons behind this orientation are stated below: i.

Through the worst-case running time of an algorithm, we get an upper bound on any inputs running time. Knowing the upper bound gives an assurance that the algorithm will by no means take any longer. Hence, it eliminates the requirement of making an educated guess by us, and we can assume that it will never get much worse. ii. The worst case takes place quite frequently for some algorithms. For instance, while searching any database to find any specific information, we will often encounter the worst case of the searching algorithm when our required information is not there in the database. iii. Most of the times, the “average case” is as bad as the worst case. Let’s assume that we select n numbers randomly and then apply insertion sort. What amount of time is required for determining as where to insert element A[j] in sub-array A[1 … j – 1]? Generally, half of the elements present in A[1 … j – 1] is greater than A[j], and half are less. Hence, on average, we will be checking half of the sub-array A[1 … j – 1], and subsequently, tj is about j = 2. The consequential running time of average-case turns out to be a quadratic function of the size of the input, just like the running time of the worst-case. The capacity of average-case analysis is constrained as it might not be clear what makes up an “average” input for a specific problem. Mostly, we like to assume that all inputs of a specified size are equally likely. In reality, this assumption may possibly be violated, however, at times, we can use employ a randomized algorithm, which makes random choices to permit a probabilistic analysis and give in an anticipated running time.

Fundamentals of Algorithms

25

REFERENCES 1.

2.

3.

4.

5.

6. 7. 8. 9. 10.

11.

12.

13.

Abramowitz, M., & Stegun, I. A., (1965). Handbook of mathematical functions with formulas, graphs, and mathematical table (Vol. 2172, pp. 101-155). New York: Dover. Acar, U. A., (2009, January). Self-adjusting computation:(an overview). In Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation (pp. 1–6). ACM. Aggarwal, A., & Vitter, J., (1988). The input/output complexity of sorting and related problems. Communications of the ACM, 31(9), 1116–1127. Agrawal, K., He, Y., & Leiserson, C. E., (2007, March). Adaptive work stealing with parallelism feedback. In Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming (pp. 112–120). ACM. Agrawal, K., He, Y., Hsu, W. J., & Leiserson, C. E., (2006, March). Adaptive scheduling with parallelism feedback. In Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming (pp. 100–109). ACM. Agrawal, M., Kayal, N., & Saxena, N., (2004). PRIMES is in P. Annals of mathematics, 781–793. Aho, A. V., & Hopcroft, J. E. (1974). The design and analysis of computer algorithms (Vo. 1, pp. 44-53). Pearson Education India. Aho, A. V., Hopcroft, J. E., & Ullman, J. D., (1983). Data structures and algorithms. Ahuja, R. K., & Orlin, J. B., (1989). A fast and simple algorithm for the maximum flow problem. Operations Research, 37(5), 748–759. Ahuja, R. K., Magnanti, T. L., & Orlin, J. B., (1993). Network flows: theory, algorithms, and applications (Vol. 1, p. 846). Englewood Cliffs, NJ: Prentice Hall. Ahuja, R. K., Mehlhorn, K., Orlin, J., & Tarjan, R. E., (1990). Faster algorithms for the shortest path problem. Journal of the ACM (JACM), 37(2), 213–223. Ahuja, R. K., Orlin, J. B., & Tarjan, R. E., (1989). Improved time bounds for the maximum flow problem. SIAM Journal on Computing, 18(5), 939–954. Ajtai, M., Megiddo, N., & Waarts, O., (2001). Improved algorithms

26

14. 15.

16.

17.

18.

19.

20.

21.

22. 23.

24.

25.

Introduction To Algorithms

and analysis for secretary problems and generalizations. SIAM Journal on Discrete Mathematics, 14(1), 1–27. Aki, S. G., (1989). The design and analysis of parallel algorithms. Vo. 1, pp. 35-55. United States. Akra, M., & Bazzi, L., (1998). On the solution of linear recurrence equations. Computational Optimization and Applications, 10(2), 195– 210. Alon, N., (1990). Generating pseudo-random permutations and maximum flow algorithms. Information Processing Letters, 35(4), 201–204. Amir, A., Aumann, Y., Benson, G., Levy, A., Lipsky, O., Porat, E., & Vishne, U., (2006, January). Pattern matching with address errors: rearrangement distances. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on a Discrete Algorithm (pp. 1221–1229). Society for Industrial and Applied Mathematics. Amtoft, T., Consel, C., Danvy, O., & Malmkjær, K., (2002). The abstraction and instantiation of string-matching programs. The Essence of Computation (pp. 332–357). Springer, Berlin, Heidelberg. Andersson, A., (1995, October). Sublogarithmic searching without multiplications. In Foundations of Computer Science, 1995 Proceedings, 36th Annual Symposium on (pp. 655–663). IEEE. Andersson, A., (1996, October). Faster deterministic sorting and searching in linear space. In Foundations of Computer Science, 1996 Proceedings, 37th Annual Symposium on (pp. 135–141). IEEE. Andersson, A. A., & Thorup, M., (2000, May). Tight (er) worst-case bounds on dynamic searching and priority queues. In Proceedings of the thirty-second annual ACM symposium on Theory of computing (pp. 335–342). ACM. Arge, L. (2001, August). External memory data structures. In European Symposium on Algorithms (pp. 1–29). Springer, Berlin, Heidelberg. Arora, N. S., Blumofe, R. D., & Plaxton, C. G., (2001). Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems, 34(2), 115–144. Arora, S., (1994). Probabilistic checking of proofs and hardness of approximation problems (Doctoral dissertation, Princeton University, Department of Computer Science). Arora, S., (1998, May). The approximability of NP-hard problems.

Fundamentals of Algorithms

26.

27.

28.

29.

30.

31.

32. 33. 34.

35.

36.

27

In Proceedings of the thirtieth annual ACM symposium on Theory of computing (pp. 337–348). ACM. Arora, S., & Lund, C., (1996, August). The hardness of approximations. In Approximation algorithms for NP-hard problems (pp. 399–446). PWS Publishing Co. Arora, S., Lund, C., Motwani, R., Sudan, M., & Szegedy, M., (1998). Proof verification and the hardness of approximation problems. Journal of the ACM (JACM), 45(3), 501–555. Aslam, F., Fennell, L., Schindelhauer, C., Thiemann, P., Ernst, G., Haussmann, E., & Uzmi, Z. A. (2010, June). Optimized java binary and virtual machine for tiny motes. In International Conference on Distributed Computing in Sensor Systems (Vol. 6131, pp. 15-30). Springer, Berlin, Heidelberg. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., MarchettiSpaccamela, A., & Protasi, M., (2012). Complexity and approximation: Combinatorial optimization problems and their approximability properties (Vol. 1, pp. 57-78). Springer Science & Business Media. Avidan, S., & Shamir, A. (2007, August). Seam carving for contentaware image resizing. In ACM Transactions on Graphics (TOG) (Vol. 26, No. 3, p. 10). ACM. Babaioff, M., Immorlica, N., Kempe, D., & Kleinberg, R., (2007). A knapsack secretary problem with applications. In Approximation, randomization, and combinatorial optimization. Algorithms and techniques (pp. 16–28). Springer, Berlin, Heidelberg. Bach, E., (1990). Number-theoretic algorithms. Annual Review of Computer Science, 4(1), 119–172. Bach, E., & Shallit, J. O., (1996). Algorithmic Number Theory: Efficient Algorithms (Vol. 1, pp. 35-42). MIT press. Bae, S. E., & Takaoka, T., (2005, August). Improved algorithms for the K-maximum subarray problem for small K. In International Computing and Combinatorics Conference (pp. 621–631). Springer, Berlin, Heidelberg. Bailey, D. H., Lee, K., & Simon, H. D., (1991). Using Strassen’s algorithm to accelerate the solution of linear systems. The Journal of Supercomputing, 4(4), 357–371. Bakker, M., Riezebos, J., & Teunter, R. H., (2012). Review of inventory systems with deterioration since 2001. European Journal of Operational

28

37.

38.

39. 40.

41.

42.

43.

44.

45.

46.

47.

Introduction To Algorithms

Research, 221(2), 275–284. Baswana, S., Hariharan, R., & Sen, S., (2002, May). Improved decremental algorithms for maintaining transitive closure and allpairs shortest paths. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing (pp. 117–123). ACM. Bateni, M., Hajiaghayi, M., & Zadimoghaddam, M., (2010). Submodular secretary problem and extensions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (pp. 39–52). Springer, Berlin, Heidelberg. Bayer, R., (1972). Symmetric binary B-trees: Data structure and maintenance algorithms. Acta Informatica, 1(4), 290–306. Bazzi, A., Fiorini, S., Huang, S., & Svensson, O., (2017, January). Small extended formulation for knapsack cover inequalities from monotone circuits. In Proceedings of the Twenty-Eighth Annual ACMSIAM Symposium on Discrete Algorithms (Vol. 1, pp. 2326–2341). Society for Industrial and Applied Mathematics. Beauchemin, P., Brassard, G., Crépeau, C., Goutier, C., & Pomerance, C., (1988). The generation of random numbers that are probably prime. Journal of Cryptology, 1(1), 53–64. Bellare, M., & Sudan, M., (1994, May). Improved non-approximability results. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing (pp. 184–193). ACM. Bender, M. A., Demaine, E. D., & Farach-Colton, M., (2000). Cacheoblivious B-trees. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on (pp. 399–409). IEEE. Bengtsson, F., (2006, August). Computing maximum-scoring segments in almost linear time. In International Computing and Combinatorics Conference (pp. 255–264). Springer, Berlin, Heidelberg. Ben-Or, M., (1983, December). Lower bounds for algebraic computation trees. In Proceedings of the fifteenth annual ACM symposium on Theory of computing (pp. 80–86). ACM. Bent, S. W., & John, J. W., (1985, December). Finding the median requires 2n comparisons. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing. (pp. 213–216). ACM. Bentley, J. L., Haken, D., & Saxe, J. B., (1980). A general method for solving divide-and-conquer recurrences. ACM SIGACT News, 12(3), 36–44.

Fundamentals of Algorithms

29

48. Berger, J., & Barkaoui, M., (2004). A parallel hybrid genetic algorithm for the vehicle routing problem with time windows. Computers & operations research, 31(12), 2037–2053. 49. Berry, M. V., & Howls, C. J., (2012). Integrals with coalescing saddles. Chapter 36, 775–793. 50. Bienstock, D. (2008). Approximate formulations for 0–1 knapsack sets. Operations Research Letters, 36(3), 317–320. 51. Bienstock, D., & McClosky, B., (2012). Tightening simple mixedinteger sets with guaranteed bounds. Mathematical Programming, 133(1), 337–363. 52. Bishop, C. M., (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). New York, New York: Springer Science and Business Media. 53. Blasgen, M. W., Casey, R. G., & Eswaran, K. P., (1977). An encoding method for multifield sorting and indexing. Communications of the ACM, 20(11), 874–878. 54. Blelloch, G. E. (1989). Scan primitives and parallel vector models (Doctoral dissertation, Massachusetts Institute of Technology. Laboratory for Computer Science), Vol. 1, pp. 25-37. 55. Blelloch, G. E. (1990). Vector models for data-parallel computing (Vol. 75, pp. 15-26). Cambridge: MIT Press. Institute of Technology. Laboratory for Computer Science). 56. Blelloch, G. E., (1996). Programming parallel algorithms. Communications of the ACM, 39(3), 85–97. 57. Blelloch, G. E., & Gibbons, P. B., (2004, June). Effectively sharing a cache among threads. In Proceedings of the Sixteenth Annual ACM Symposium on Parallelism in Algorithms and Architectures (pp. 235– 244). ACM. 58. Blelloch, G. E., & Greiner, J., (1996, June). A provable time and space efficient implementation of NESL. In ACM SIGPLAN Notices (Vol. 31, No. 6, pp. 213–225). ACM. 59. Blelloch, G. E., & Maggs, B. M., (1996). Parallel Algorithms. ACM Computing Surveys (CSUR), 28(1), 51–54. 60. Blelloch, G. E., & Maggs, B. M., (2010, January). Parallel Algorithms. In Algorithms and theory of computation handbook (pp. 25–25). Chapman & Hall/CRC. 61. Blelloch, G. E., Gibbons, P. B., & Matias, Y., (1995, July). Provably

30

62.

63.

64.

65. 66.

67. 68.

69.

70. 71. 72. 73. 74.

Introduction To Algorithms

efficient scheduling for languages with fine-grained parallelism. In Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (pp. 1–12). ACM. Blelloch, G. E., Gibbons, P. B., & Matias, Y., (1999). Provably efficient scheduling for languages with fine-grained parallelism. Journal of the ACM (JACM), 46(2), 281–321. Blelloch, G. E., Gibbons, P. B., Matias, Y., & Narlikar, G. J., (1997, June). Space-efficient scheduling of parallelism with synchronization variables. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (pp. 12–23). ACM. Blelloch, G. E., Hardwick, J. C., Sipelstein, J., Zagha, M., & Chatterjee, S., (1994). Implementation of a portable nested data-parallel language. Journal of Parallel and Distributed Computing, 21(1), 4–14. Block H, D., (1961). The Perceptron: A Model of Brain Functioning. 34 (1), 123–135. Block, H. D., Knight Jr, B. W., & Rosenblatt, F., (1962). Analysis of a four-layer series-coupled perceptron. II. Reviews of Modern Physics, 34(1), 135. Bloom, N., & Van Reenen, J., (2002). Patents, real options, and firm performance. The Economic Journal, 112(478). Blum, M., Floyd, R. W., Pratt, V. R., Rivest, R. L., & Tarjan, R. E., (1973). Two papers on the selection problem: Time Bounds for Selection [by Manual Blum, Robert W. Floyd, Vaughan Pratt, Ronald L. Rivest, and Robert E. Tarjan] and Expected Time Bounds for Selection [by Robert W. Floyd and Ronald L. Rivest]. Blumofe, R. D., & Leiserson, C. E., (1999). Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5), 720–748. Bollerslev, T., Engle, R. F., & Nelson, D. B., (1994). ARCH models. Handbook of Econometrics, 4, 2959–3038. Brassard, G., & Bratley, P., (1996). Fundamentals of Algorithmics (Vol. 33, pp. 342-356). Englewood Cliffs: Prentice Hall. Brent, R. P., (1974). The parallel evaluation of general arithmetic expressions. Journal of the ACM (JACM), 21(2), 201–206. Brent, R. P. (1980). An improved Monte Carlo factorization algorithm. BIT Numerical Mathematics, 20(2), 176–184. Brereton, R. G., (2015). Pattern recognition in chemometrics.

Fundamentals of Algorithms

75.

76.

77.

78.

79.

80.

81. 82. 83.

84. 85.

86. 87.

31

Chemometrics and Intelligent Laboratory Systems, 149, 90–96. Brodal, G. S., & Jørgensen, A. G., (2007, August). A linear time algorithm for the k maximal sums problem. In International Symposium on Mathematical Foundations of Computer Science (pp. 442–453). Springer, Berlin, Heidelberg. Brodnik, A., Miltersen, P. B., & Munro, J. I., (1997, August). Transdichotomous algorithms without multiplication—some upper and lower bounds. In Workshop on Algorithms and Data Structures (pp. 426–439). Springer, Berlin, Heidelberg. Buchbinder, N., Jain, K., & Singh, M., (2010, June). Secretary problems via linear programming. In International Conference on Integer Programming and Combinatorial Optimization (pp. 163–176). Springer, Berlin, Heidelberg. Buhler, J. P., Lenstra, H. W., & Pomerance, C., (1993). Factoring integers with the number field sieve. The Development of the Number Field Sieve (pp. 50–94). Springer, Berlin, Heidelberg. Burke, E. K., Hyde, M. R., & Kendall, G. (2006). Evolving bin packing heuristics with genetic programming. In Parallel Problem Solving from Nature-PPSN IX (pp. 860–869). Springer, Berlin, Heidelberg. Campbell, D. T., (1976). On the conflicts between biological and social evolution and between psychology and moral tradition. Zygon®, 11(3), 167–208. Carling, A., (1992). Introducing Neural Networks. Wilmslow, UK: Sigma Press. Caudill, M., & Butler, C., (1993). Understanding Neural Networks: Computer Explorations (No. 006.3 C3). Chali, Y. S. R., (2009). Complex Question Answering: Unsupervised Learning Approaches and Experiments. Journal of Artificial Intelligent Research, 1–47. Chan, T. H., (2000). HOMFLY polynomials of some generalized Hopf links. Journal of Knot Theory and Its Ramifications, 9(07), 865–883. Chen, L. T., & Davis, L. S., (1990, April). A parallel algorithm for list ranking image curves in O (log N) time. In DARPA Image Understanding Workshop (pp. 805–815). Chen, L. T., & Davis, L. S., (1991). Parallel algorithms for testing if a point is inside a closed curve. Pattern recognition letters, 12(2), 73–77. Chen, Y. S., Qin, Y. S., Xiang, Y. G., Zhong, J. X., & Jiao, X. L.,

32

88.

89. 90.

91.

92. 93.

94.

95. 96.

97.

98.

99.

Introduction To Algorithms

(2011). Intrusion detection system based on immune algorithm and support vector machine in a wireless sensor network. In Information and Automation (pp. 372–376). Springer, Berlin, Heidelberg. Cheriyan, J., & Hagerup, T., (1989, October). A randomized maximumflow algorithm. In Foundations of Computer Science, 1989., 30th Annual Symposium on (pp. 118–123). IEEE. Cheriyan, J., & Hagerup, T., (1995). A randomized maximum-flow algorithm. SIAM Journal on Computing, 24(2), 203–226. Cheriyan, J., Hagerup, T., & Mehlhorn, K., (1990, July). Can a maximum flow be computed in o (nm) time?. In International Colloquium on Automata, Languages, and Programming (pp. 235–248). Springer, Berlin, Heidelberg. Cheriyan, J., Hagerup, T., & Mehlhorn, K., (1996). An o(n^3)-Time Maximum-Flow Algorithm. SIAM Journal on Computing, 25(6), 1144–1170. Ciurea, E., & Ciupala, L., (2001). Algorithms for minimum flows. Computer Science Journal of Moldova, 9(3), 27. Conforti, M., Wolsey, L. A., & Zambelli, G., (2010). Projecting an extended formulation for mixed-integer covers on bipartite graphs. Mathematics of operations research, 35(3), 603–623. Courcoubetis, C., Vardi, M., Wolper, P., & Yannakakis, M., (1992). Memory-efficient algorithms for the verification of temporal properties. Formal methods in system design, 1(2–3), 275–288. Michie, D. (1968). “Memo” functions and machine learning. Nature, 218(5136), 19. Damgård, I., Landrock, P., & Pomerance, C., (1993). Average case error estimates for the strong probable prime test. Mathematics of Computation, 61(203), 177–194. D’Aristotile, A., Diaconis, P., & Newman, C. M., (2003). Brownian motion and the classical groups. Lecture Notes-Monograph Series, 97–116. Dehling, H., & Philipp, W., (2002). Empirical process techniques for dependent data. In Empirical process techniques for dependent data (pp. 3–113). Birkhäuser, Boston, MA. Dempster, A. P., Laird, N. M., & Rubin, D. B., (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society. Series B (methodological), 1–38.

Fundamentals of Algorithms

33

100. Dengiz, B., Altiparmak, F., & Smith, A. E., (1997). Efficient optimization of all-terminal reliable networks, using an evolutionary approach. IEEE Transactions on Reliability, 46(1), 18–26. 101. Dickerson, M., Eppstein, D., Goodrich, M. T., & Meng, J. Y., (2003, September). Confluent drawings: visualizing non-planar diagrams in a planar way. In International Symposium on Graph Drawing (pp. 1–12). Springer, Berlin, Heidelberg. 102. Didier, F., (2009). Efficient erasure decoding of Reed-Solomon codes. arXiv preprint arXiv:0901.1886. 103. Dockens, W. S., (1979). Induction/catastrophe theory: A behavioral ecological approach to cognition in human individuals. Systems Research and Behavioral Science, 24(2), 94–111. 104. Doya, K. (Ed.)., (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press. 105. Drmota, M., & Szpankowski, W., (2013). A master theorem for discrete divide and conquer recurrences. Journal of the ACM (JACM), 60(3), 16. 106. Du, W., & Atallah, M. J., (2001). Protocols for secure remote database access with approximate matching. In E-Commerce Security and Privacy (pp. 87–111). Springer, Boston, MA. 107. Duda, R. O., Hart, P. E., & Stork, D. G. (1973). Fisher’s Linear Discriminant.  Patent Classification and Scene Analysis, Copyright, Vol. 2, pp. 114-119. New York: Wiley. 108. Durbin, R., & Rumelhart, D. E., (1989). Product units: A computationally powerful and biologically plausible extension to backpropagation networks. Neural Computation, 1(1), 133–142. 109. Dutra da Silva, R., Robson, W., & Pedrini Schwartz, H., (2011). Image segmentation based on wavelet feature descriptor and dimensionality reduction applied to remote sensing. Chilean J. Stat, 2(2), 51-60. 110. Dutton, J. M., & Starbuck, W. H., (1971). Computer simulation models of human behavior: A history of an intellectual technology. IEEE Transactions on Systems, Man, and Cybernetics, 2, 128–171. 111. Elliott S. W. & Anderson, J. R., (1995). Learning and Memory. Wiley, New York, USA. 112. Eppstein, D., Goodrich, M. T., & Sun, J. Z., (2005, June). The skip quadtree: a simple dynamic data structure for multidimensional data. In Proceedings of the twenty-first annual symposium on Computational

34

Introduction To Algorithms

Geometry (pp. 296–305). ACM. 113. Eppstein, D., Goodrich, M. T., & Sun, J. Z., (2008). Skip quadtrees: Dynamic data structures for multidimensional point sets. International Journal of Computational Geometry & Applications, 18(01n02), 131– 160. 114. Faenza, Y., & Sanità, L., (2015). On the existence of compact ε-approximated formulations for knapsack in the original space. Operations Research Letters, 43(3), 339–342. 115. Farhangfar, A., Kurgan, L., & Dy, J., (2008). Impact of the imputation of missing values on classification error for discrete data. Pattern Recognition, 41(12), 3692–3705. 116. Fausett, L., (1994). Fundamentals of Neural Networks. New York: Prentice Hall. 117. Festa, P., & Resende, M. G., (2002). GRASP: An annotated bibliography. In Essays and surveys in metaheuristics (pp. 325–367). Springer, Boston, MA. 118. Fisher, D. H., (1987). Knowledge acquisition via incremental conceptual clustering. Machine learning, 2(2), 139–172. 119. Forsyth, M. E., Hochberg, M., Cook, G., Renals, S., Robinson, T., Schechtman, R., & Doherty-Sneddon, G., (1995). Semi-continuous hidden Markov models for speaker verification. In Proc. ARPA Spoken Language Technology Workshop (Vol. 1, pp. 2171–2174). Universiteit Twente, Enschede. 120. Forsyth, R. S., (1990). The strange story of the Perceptron. Artificial Intelligence Review, 4 (2), 147–155. 121. Fourier, J. B. J., (1890). from 1824, republished as Second extrait in oeuvres de Fourier, tome ii (G. Darboux, ed.). Gauthier-Villars, Paris, 38–42. 122. Fourier, J. B. J., (1973). From 1824, republished as the Second extract in Oeuvres de Fourier, Tome II, G. Darboux, ed. Gauthier-Villars, Paris 1890, see DA Kohler, Translation of a report by Fourier on his work on linear inequalities. 123. Franc, V., & Hlaváč, V., (2005, August). Simple solvers for large quadratic programming tasks. In Joint Pattern Recognition Symposium (pp. 75–84). Springer, Berlin, Heidelberg. 124. Fredman, M. L., & Willard, D. E., (1993). Surpassing the information theoretic bound with fusion trees. Journal of computer and system

Fundamentals of Algorithms

35

sciences, 47(3), 424–436. 125. Freund, Y., & Schapire, R. E., (1999). Large margin classification using the perceptron algorithm. Machine learning, 37(3), 277–296. 126. Friedberg, R. M., (1958). A learning machine: Part, 1. IBM Journal, 2–13. 127. Friedl, K., & Sudan, M., (1995, January). Some improvements to total degree tests. In Theory of Computing and Systems, 1995. Proceedings., Third Israel Symposium on the (pp. 190–198). IEEE. 128. Frigo, M., Leiserson, C. E., & Randall, K. H., (1998). The implementation of the Cilk-5 multithreaded language. ACM Sigplan Notices, 33(5), 212–223. 129. Fu, S. S., & Lee, M. K., (2005). IT-Based Knowledge Sharing and Organizational Trust: The Development and Initial Test of a Comprehensive Model. ECIS 2005 Proceedings, 56. 130. Fukunaga, A. S., (2008). Automated discovery of local search heuristics for satisfiability testing. Evolutionary Computation, 16(1), 31–61. 131. Fussenegger, F., & Gabow, H. N., (1979). A counting approach to lower bounds for selection problems. Journal of the ACM (JACM), 26(2), 227–238. 132. Gennari, J. H., Langley, P., & Fisher, D., (1988). Models of incremental concept formation (No. UCI-ICS-TR-88–16). California University Irvine Department of Information and Computer Science. 133. Getoor, L., & Taskar, B. (Eds.)., (2007). Introduction to statistical relational learning. MIT Press. 134. Ghahramani, Z., (2008). Unsupervised learning algorithms are designed to extract structure from data. 178, pp. 1–8. IOS Press. 135. Ghahramani, & Jordan, M. I. (1997). Factorial hidden Markov models. Machine Learning, 29, 245-273. 136. Gillies, D., (1994). A rapprochement between deductive and inductive logic. Logic Journal of the IGPL, 2(2), 149-166. 137. Glasser, K. S., & Austin Barron, R. H., (1983). The d choice secretary problem. Sequential Analysis, 2(3), 177–199. 138. Gong, Y., & Haton, J. P., (1992). Non-linear vectorial interpolation for speaker recognition, in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 2,(San Francisco, California, USA), pp. II173–II176, March 1992.

36

Introduction To Algorithms

139. González, L., Angulo, C., Velasco, F., & Catala, A., (2006). Dual unification of bi-class support vector machine formulations. Pattern recognition, 39(7), 1325–1332. 140. Goodacre, R., Broadhurst, D., Smilde, A. K., Kristal, B. S., Baker, J. D., Beger, R., & Ebbels, T., (2007). Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics, 3(3), 231– 241. 141. Goodrich, M. T., Atallah, M. J., & Tamassia, R., (2005, June). Indexing information for data forensics. In International Conference on Applied Cryptography and Network Security (pp. 206–221). Springer, Berlin, Heidelberg. 142. Graefe, G., (2011). Modern B-tree techniques. Foundations and Trends® in Databases, 3(4), 203–402. 143. Gregan‐Paxton, J., Hoeffler, S., & Zhao, M., (2005). When categorization is ambiguous: Factors that facilitate the use of a multiple category inference strategy. Journal of Consumer Psychology, 15(2), 127–140. 144. Gross, G. N., Lømo, T., & Sveen, O., (1969). Participation of inhibitory and excitatory interneurons in the control of hippocampal-cortical output, Per Anderson, The Interneuron. 145. Grzymala-Busse, J. W., & Hu, M., (2000, October). A comparison of several approaches to missing attribute values in data mining. In International Conference on Rough Sets and Current Trends in Computing (pp. 378–385). Springer, Berlin, Heidelberg. 146. Grzymala-Busse, J. W., Goodwin, L. K., Grzymala-Busse, W. J., & Zheng, X., (2005, August). Handling missing attribute values in preterm birth data sets. In International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing (pp. 342–351). Springer, Berlin, Heidelberg. 147. Hall, M., Frank, E., Holmes, G., Pfahringer, BAdel’son-Vel’skii, G. M., & Landis, E. M., (1962). Lil C. Miranda, Equazioni alle derivate parziali di tipo ellittico, Springer, Berlin, 1955. 148. Harth, A., Umbrich, J., Hogan, A., & Decker, S., (2007). YARS2: A federated repository for querying graph structured data from the web. In The Semantic Web (pp. 211–224). Springer, Berlin, Heidelberg. 149. Herr, D. G., (1980). On the history of the use of geometry in the general linear model. The American Statistician, 34(1), 43–47. 150. Hsu, T. S., Ramachandran, V., & Dean, N., (1992). Implementation of

Fundamentals of Algorithms

37

Parallel Graph Algorithms on the MasPar. In Computational Support for Discrete Mathematics (pp. 165–198). 151. Hsu, T. S., Ramachandran, V., & Dean, N., (1993). Implementation of parallel graph algorithms on a massively parallel SIMD computer with virtual processing. University of Texas at Austin, Department of Computer Sciences. 152. Hung, Y., (1988). Processing geometric representations on SIMD computers. Maryland Univ., College Park, MD (USA). 153. Icking, C., Klein, R., & Ottmann, T., (1987, June). Priority search trees in secondary memory. In International Workshop on GraphTheoretic Concepts in Computer Science (pp. 84–93). Springer, Berlin, Heidelberg. 154. Igarashi, Y., Sado, K., & Saga, K., (1987). Fast parallel sorts on a practical sized mesh-connected processor array. IEICE Transactions (1976–1990), 70(1), 56–64. 155. Immorlica, N., Kleinberg, R., & Mahdian, M., (2006, December). Secretary problems with competing employers. In International Workshop on Internet and Network Economics (pp. 389–400). Springer, Berlin, Heidelberg. 156. John, J. W., (1988). A new lower bound for the set partitioning problem. SIAM Journal on Computing, 17(4), 640–647. 157. Kim, S. H., & Pomerance, C., (1989). The probability that a random probable prime is composite. Mathematics of Computation, 53(188), 721–741. 158. King, D. J., (1995). Functional binomial queues. In Functional Programming, Glasgow 1994 (pp. 141–150). Springer, London. 159. Kirkpatrick, D. G., (1981). A unified lower bound for selection and set partitioning problems. Journal of the ACM (JACM), 28(1), 150–165. 160. Kleinberg, R., (2005, January). A multiple-choice secretary algorithm with applications to online auctions. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete Algorithms (pp. 630– 631). Society for Industrial and Applied Mathematics. 161. Kwong, Y. S., & Wood, D., (1982). A new method for concurrency in B-trees. IEEE Transactions on Software Engineering, 3, 211–222. 162. Leighton, T., (1996). Notes on better master theorems for divideand-conquer recurrences. Manuscript. Massachusetts Institute of Technology.

38

Introduction To Algorithms

163. Lin, T. C., & Lee, D. T., (2007). Randomized algorithm for the sum selection problem. Theoretical Computer Science, 377(1–3), 151–156. 164. Little, J. J., Blelloch, G. E., & Cass, T. A., (1989). Algorithmic techniques for computer vision on a fine-grained parallel machine. IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(3), 244–257. 165. Martínez, C., Panario, D., & Viola, A., (2004, January). Adaptive sampling for quickselect. In Proceedings of the fifteenth annual ACMSIAM symposium on Discrete Algorithms (pp. 447–455). Society for Industrial and Applied Mathematics. 166. Maurer, S. B., (1985). The Lessons of Williamstown. In New Directions in Two-Year College Mathematics (pp. 255–270). Springer, New York, NY. 167. Meijer, H., & Akl, S. G., (1987). Optimal computation of prefix sums on a binary tree of processors. International Journal of Parallel Programming, 16(2), 127–136. 168. Meijer, H., & Akl, S. G., (1988). Bit-serial addition trees and their applications. Computing, 40(1), 9–17. 169. Monier, L., (1980). Evaluation and comparison of two efficient probabilistic primality testing algorithms. Theoretical Computer Science, 12(1), 97–108. 170. Morain, F., (1988). Implementation of the Atkin-Goldwasser-Kilian primality testing algorithm (Doctoral dissertation, INRIA). 171. Munro, J. I., & Poblete, P. V., (1982). A Lower Bound for Determining the Median. Faculty of Mathematics, University of Waterloo. 172. Narayanan, P. J., & Davis, L. S., (1992). Replicated image algorithms and their analyses on SIMD machines. International Journal of Pattern Recognition and Artificial Intelligence, 6(02n03), 335–352. 173. Nelson, D. B., & Foster, D. P. (1995). Filtering and forecasting with misspecified ARCH models II: making the right forecast with the wrong model. Journal of Econometrics, 67(2), 303-335. 174. Nievergelt, J., (1974). Binary search trees and file organization. ACM Computing Surveys (CSUR), 6(3), 195–207. 175. Park, T. G., & Oldfield, J. V., (1993). Minimum spanning tree generation with content-addressable memory. Electronics Letters, 29(11), 1037– 1039. 176. Phillips, S., & Westbrook, J., (1993, June). Online load balancing

Fundamentals of Algorithms

39

and network flow. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing (pp. 402–411). ACM. 177. Polishchuk, A., & Spielman, D. A., (1994, May). Nearly-linear size holographic proofs. In Proceedings of the twenty-sixth annual ACM symposium on Theory of computing (pp. 194–203). ACM. 178. Prechelt, L., (1993). Measurements of MasPar MP-1216A communication operations. Univ., Fak. für Informatik. 179. Price, G. B., (1973). Telescoping Sums and the Summation of Sequences. The Two-Year College Mathematics Journal, 4(2), 16–29. 180. Ramadge, P. J., & Wonham, W. M., (1989). The control of discrete event systems. Proceedings of the IEEE, 77(1), 81–98. 181. Raman, R., (1996, September). Priority queues: Small, monotone, and Trans-dichotomous. In European Symposium on Algorithms (pp. 121– 137). Springer, Berlin, Heidelberg. 182. Regli, W. C., (1992). A survey of automated feature recognition techniques, National Science Foundation Engineering Research Program, TR 92-18, 14-14. 183. Roura, S., (2001). Improved master theorems for divide-and-conquer recurrences. Journal of the ACM (JACM), 48(2), 170–205. 184. Salisbury, M., Anderson, C., Lischinski, D., & Salesin, D. H., (1996, August). Scale-dependent reproduction of pen-and-ink illustrations. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (pp. 461–468). ACM. 185. Sardelis, D. A., & Valahas, T. M., (1999). Decision making: A golden rule. The American Mathematical Monthly, 106(3), 215–226. 186. Schönhage, A., Paterson, M., & Pippenger, N., (1976). Finding the median. Journal of Computer and System Sciences, 13(2), 184–199. 187. Selim, G., & Meijer, H., (1987). On the Bit Complexity of Parallel Computations.’ In Parallel Processing and Applications: Proceedings of the International Conference on Parallel Processing and Applications, L’Aquila, Italy, 23–25 September 1987 (p. 101). North Holland. 188. Sheffler, T. J., (1993, August). Implementing the multiprefix operation on parallel and vector computers. In Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures (pp. 377– 386). ACM. 189. Shen, Z., & Marston, C. M., (1995). A study of a dice problem. Applied Mathematics and Computation, 73(2–3), 231–247.

40

Introduction To Algorithms

190. Sion, R., Atallah, M., & Prabhakar, S., (2002, April). Power: A metric for evaluating watermarking algorithms. In Information Technology: Coding and Computing, 2002. Proceedings. International Conference on (pp. 95–99). IEEE. 191. Smith, R. S., (1986). Rolle over Lagrange—Another shot at the mean value theorem. The College Mathematics Journal, 17(5), 403–406. 192. Snyder, L., (1984). Parallel Programming and the Poker Programming Environment (No. TR-84–04–02). Washington Univ Seattle Dept of Computer Science. 193. Stock, J. H., & Watson, M. W., (2001). Vector autoregressions. Journal of Economic Perspectives, 15(4), 101–115. 194. Sudan, M., (1992). Efficient checking of polynomials and proofs and the hardness of approximation problems. Lecture Notes in Computer Science, 1001. 195. Szymanski, T. G., (1975). A special case of the maximal common subsequence problem. Technical Report TR-170, Computer Science Laboratory, Princeton University. 196. Szymanski, T. G., & Van Wyk, C. J., (1983, June). Space efficient algorithms for VLSI artwork analysis. In Proceedings of the 20th Design Automation Conference (pp. 734–739). IEEE Press. 197. Tan, Y. P., & Acharya, T., (1999, March). A robust sequential approach for the detection of defective pixels in an image sensor. In Acoustics, Speech, and Signal Processing, 1999. Proceedings., 1999 IEEE International Conference on (Vol. 4, pp. 2239–2242). IEEE. 198. Thorup, M., (1997). Faster deterministic sorting and priority queues in linear space (pp. 550–555). Max-Planck-Institut für Informatik. 199. Vanderbei, R. J., (1980). The optimal choice of a subset of a population. Mathematics of Operations Research, 5(4), 481–486. 200. Verma, R. M., (1994). A general method and a master theorem for divide-and-conquer recurrences with applications. Journal of Algorithms, 16(1), 67–79. 201. Verma, R. M., (1997). General techniques for analyzing recursive algorithms with applications. SIAM Journal on Computing, 26(2), 568–581. 202. Wallace, L., Keil, M., & Rai, A., (2004). Understanding software project risk: a cluster analysis. Information & Management, 42(1), 115–125.

Fundamentals of Algorithms

41

203. Wang, Y., (2008). Topology control for wireless sensor networks. In Wireless sensor networks and applications (pp. 113–147). Springer, Boston, MA. 204. Wilf, H. S., (1984). A Bijection in the Theory of Derangements. Mathematics Magazine, 57(1), 37–40. 205. Williamson, J., (2002). Probability logic. In Studies in Logic and Practical Reasoning (Vol. 1, pp. 397–424). Elsevier. 206. Wilson, J. G., (1991). Optimal choice and assignment of the best m of n randomly arriving items. Stochastic processes and their applications, 39(2), 325–343. 207. Winograd, S., (1970). On the algebraic complexity of functions. In Actes du Congres International des Mathématiciens (Vol. 3, pp. 283– 288). 208. Wunderlich, M. C., (1983). A performance analysis of a simple primetesting algorithm. Mathematics of computation, 40(162), 709–714. 209. Xiaodong, W., & Qingxiang, F., (1996). A frame for general divideand-conquer recurrences. Information Processing Letters, 59(1), 45– 51. 210. Yap, C., (2011, May). A real elementary approach to the master recurrence and generalizations. In International Conference on Theory and Applications of Models of Computation (pp. 14–26). Springer, Berlin, Heidelberg. 211. Zhu, X., & Wilhelm, W. E., (2006). Scheduling and lot sizing with the sequence-dependent setup: A literature review. IIE Transactions, 38(11), 987–1007.

CHAPTER

2

CLASSIFICATION OF ALGORITHMS

CONTENTS 2.1. Introduction....................................................................................... 44 2.2. Deterministic And Randomized Algorithms....................................... 45 2.3. Online vs. Offline Algorithms............................................................ 46 2.4. Exact, Approximate, Heuristic, And Operational Algorithms.............. 46 2.5. Classification According To The Main Concept................................... 47 References................................................................................................ 55

44

Introduction To Algorithms

2.1. INTRODUCTION A method or set of procedures to be adopted in a problem-solving task is known as an algorithm. The word Algorithm derived from Medieval Latin is not only restricted to computer programming. There are numerous types of algorithms for different kinds of problems. The thought that only a known number of algorithms exist and one has to memorize all of its types is a failing believe that compels a lot of future programmers to adopt lawn maintenance as a source of income to meet their expenditures (Rabin, 1977; Cook, 1983, 1987). An algorithm is made whenever a problem arises while evolving the entire program. Numerous ways to classify classes of algorithms are discussed in this chapter. There is not only one “correct” classification. The task of classifying algorithms should be regarded more as compared to the task of attributing them (Karp, 1986; Virrer & Simons, 1986). After discussing the labeling procedure of a given algorithm (e.g., an algorithm of divideand-conquer) we will study the different methods to examine algorithms in detail. Frequently the labels with which the algorithms are categorized are pretty useful and helpful in choosing the right form of analysis (Cormen & Leiserson, 1989; Stockmeyer & Meyer, 2002) (Figure 2.1).

Figure 2.1: Major types of data structure algorithms. (Source: http://www. codekul.com/blog/types-algorithms-data-structures-every-programmerknow/).

The quickness or efficiency of an algorithm is evaluated by the number of basic tasks it executes. For example, an algorithm whose input is N and it performs numerous operations. The relationship between a number of tasks executed and completion time for the task define the features of an

Classification of Algorithms

45

algorithm (Dayde, 1996; Panda et al., 1997; D’Alberto, 2000). Hence, it must be observed that every algorithm belongs to a certain class. In the increasing order of their growth algorithms are classified as: i. ii. iii. iv. v.

constant time algorithm; logarithmic algorithm; linear time algorithm; polynomial time algorithm; and exponential time algorithm;

2.2. DETERMINISTIC AND RANDOMIZED ALGORITHMS Given below is one of the main (and exclusive) differences which help us to identify whether the given algorithm is deterministic or it is randomized (Codenotti et al., 2004; Kågström & Van Loan, 1998). On a given input Deterministic algorithms yield the same results adopting the same computation stages whereas the Randomized algorithms unlike Deterministic throw coins during implementation. For each attempt on the particular input, either the order of implementation or the outcome of the algorithm might be different (Flajolet et al., 1990; Nisan & Wigderson, 1995). There are further two subclasses for randomized algorithms: i. Monte Carlo algorithms; ii. Las Vegas algorithms. On a particular input, A Las Vegas algorithm will continuously yield the same result. The order of the core implementations will only be affected by Randomization. As far as Monte Carlo algorithms are concerned, the output of these algorithms might alter, even be wrong. However, a Monte Carlo algorithm yields the correct output with a definite probability (Garey, 1979; Kukuk, 1997; Garey & Johnson, 2002). Now here the question which comes to mind is: For what good randomized algorithms are developed? The computation/processing might alter depending on coin throws (Hirschberg & Wong, 1976; Kannan, 1980). Monte Carlo algorithms don’t yield the accurate results but still, they are looked-for because of below-mentioned reasons:

Introduction To Algorithms

46

i.

Randomized algorithms frequently have the influence of worrying the input. Or put it another way, the input appears random, for which the bad circumstances seldom appear. ii. Randomized algorithms are frequently very stress-free to implement conceptually. As compared to their deterministic counterparts, in the runtime they are usually far more superior.

2.3. ONLINE VS. OFFLINE ALGORITHMS To determine, whether the given algorithm is online or it is offline one of the significant difference is discussed below. Algorithms whose inputs are not known at the beginning are called Online Algorithms. Usually, the inputs of algorithms are known beforehand but in the case of Online it is provided to them (Chazelle et al., 1989; Roucairol, 1996; Goodman & O’Rourke, 1997). It appears to be a negligible detail but its effects on the design and analysis of the algorithms are intense. These algorithms are frequently examined by applying the knowledge of competitiveness. This examining is the vilest case factor which when compared with the best algorithm usually takes longer time. Ski problem is one of the examples of an online problem (Wilson & Pawley, 1988; Xiang et al., 2004; Liu et al., 2007). Every day when a skier goes skiing he/she has to decide, whether to purchase or rent skis, till the time he/she decides to purchase them. It is not known to the skier that for how many days he/she can ski because of the changeable weather. Say T is the number of days he/she will ski. The cost of purchasing skis is B, while the cost of hiring them is 1 unit (Wierzbicki et al., 2002; Li, 2004; Pereira, 2009).

2.4. EXACT, APPROXIMATE, HEURISTIC, AND OPERATIONAL ALGORITHMS Most algorithms are developed with the aim of optimization in mind, e.g., to calculate the direct path, alignment or the marginal edit distance (Aydin & Fogarty, 2004). An exact algorithm focuses on calculating the optimum solution when such a goal is given. On runtime or in terms of memory this is relatively expensive and is not feasible for large input (Tran et al., 2004; Thallner & Moser, 2005; Xie et al., 2006). At this point, other schemes are tried. One of the schemes Approximation algorithms focuses on calculating a solution which is only a firm, definite factor worse than the optimum solution

Classification of Algorithms

47

(Ratakonda & Turaga, 2008). This means that an algorithm produces a c − approximation if it can ensure that the solution provided by it is never poorer than the factor c as compared to the optimum solution (Xiang et al., 2003; Aydin & Yigit, 2005). Heuristic algorithms, on the other hand, try to yield the optimum solution without ensuring that they will always provide the optimum solution. It is often easy to create a counterexample. A good heuristics algorithm is always at the optimum value or is close to it (Restrepo et al., 2004; Sevkli & Aydin, 2006). Last but not the least, there exits algorithms which do not focus on optimizing the objective functions (Yigit et al., 2004; Sevkli & Guner, 2006). These kinds of algorithms are called operational because they bind a sequence of computational tasks directed by an expert but not in aggregation with a particular objective function (e.g., ClustalW). Let us consider the problem of Traveling Salesman having triangle inequality for say n cities. This is a case of an NP-hard problem (we have no polynomial-time algorithm in our knowledge). Described below is a greedy, deterministic algorithm which produces a 2 − approximation for the traveling salesman problem. The triangle inequality produced in time O (n2). i. ii. iii.

For the whole graph oblique by n cities, a minimum spanning tree (T) is calculated All edges of the spanning tree T are duplicated by producing a Eulerian graph T.’ The Eulerian path is then found in T.’ The Eulerian cycle is then converted into the Hamiltonian cycle by intriguing shortcuts.

2.5. CLASSIFICATION ACCORDING TO THE MAIN CONCEPT With the help of the main algorithmic model which is known to us we can classify the algorithms as follows: i. ii. iii. iv. v.

Simple recursive algorithm; Divide-and-conquer algorithm; Dynamic programming algorithm; Backtracking algorithm; Greedy algorithm;

Introduction To Algorithms

48

vi. Brute force algorithm; and vii. Branch-and-bound algorithm.

2.5.1. Simple Recursive Algorithm This kind of the algorithm has the following characteristics: i. ii. iii.

a) 1. 2. 3. b) 1. 2. 3.

It directly solves the base problems. It persists with the easier subproblem. It performs some additional work to translate the solution to the easier subproblem and then into the solution of the given problem. Examples are discussed below: Counting the number of essentials in a list: Return zero if the given list is empty; otherwise. Exclude the first element and total the remaining essentials in the list. Add one to the outcome. Testing if a value befalls in a list: Return false if the given list is empty, otherwise. Return true if the first entity is the desired value in the given list; otherwise. Exclude the first entity, and then check whether the value befalls in the rest of the list.

2.5.2. Divide-and-Conquer Algorithm In this kind of algorithm, the magnitude of a problem is divided by a fixed factor. In an iteration, lesser, and lesser portion of the real problem is processed only. Several fastest and effective algorithms come under this class. The algorithms of this kind have logarithmic runtime (Battiti & Tecchiolli, 1994; Yigit et al., 2006). This algorithm comprises of two parts. i.

The original problem is divided into smaller and similar subproblems and then these subproblems are solved recursively. ii. The solutions of the subproblems are combined to create the solution to the original problem. Traditionally, an algorithm is only called divide-and-conquer if it contains two or more recursive calls. Let us consider the two examples described below:

Classification of Algorithms

49

i. Quicksort: a) Split the array into two equal parts, and then quicksort each part. b) No extra work is needed to combine the partitioned parts. ii. Merge sort: a) Slice the array into two, and then merge sort each of the half. b) Syndicate the two organized arrays into a single organized array by amalgamating them.

2.5.3. Dynamic Programming Algorithm Here the word ‘dynamic’ discusses the technique in which the result is computed by the algorithm. Occasionally, a solution of the given problem is dependent on the solution of the sub-problems (Taillard, 1991, 1995). It demonstrates the property of coinciding sub-problems. Hence we might have to recalculate the same values over and over for sub-problems in order to solve the original problem. Thus calculating cycles are fruitless (Fleurent & Ferland, 1994). As a remedy for these fruitless computing cycles, we can implement the dynamic programming technique or algorithm. In this algorithm, the outcome of each sub-problem is remembered and whenever this outcome is required, it is used instead of recalculating the result over and over again (Glover, 1989; Skorin-Kapov, 1990). Here, space is a tradeoff for time. i.e., More space is used to grasp the calculated values so that the execution speed can be increased significantly. The relation for the Nth Fibonacci number is the best example of a problem which contains coinciding sub-problems (Eberhart et al., 2001; Eberhart & Shi, 2001; 2004). Fibonacci number is represented as F (n) = F (n – 1) + F (n – 2). The expression above shows that the Nth Fibonacci number is dependent on the previous two numbers. In order to compute F(n) in a predictable way, calculations must be done in the following way. The similar colored values are those that will be calculated again and again. Note that F(n–2) is computed 2 times, F(n–3) 3 times and so on ... Hence, we are wasting a lot of time. In fact, this recursion will perform $$2^N$$ operations for a given N, and it is not at all solvable for N>40 on a modern PC within at least a year (Shi, 2001; Hu et al., 2003). Storing every value while computing it and recovering it instead of computing it again is the best possible solution to this. Exponential time

Introduction To Algorithms

50

algorithm is transformed into the linear time algorithm with the help of this (Kennedy et al., 2001). In order to fasten the problems for which coinciding subproblems exits dynamic programming technique is very significant (Kennedy & Mendes, 2002, 2006). This dynamic algorithm memorizes the previous results and then utilizes them to find the new outcomes (He et al., 2004). Dynamic programming algorithm is normally implemented for optimization problems where: i. The best solution is to be found among the multiple solutions. ii. Optimum substructure and coinciding sub-problem are required. iii. Optimum substructure: an optimum solution which contains optimum solutions to the sub-problems. iv. Coinciding sub-problems: The solutions of the sub-problems can be saved and reutilized in the bottom-up method. This technique is different from the Divide-and-Conquer technique, where the sub-problems usually do not overlap. Many examples of this technique can be found in bioinformatics. For example: i. a)

b)

ii. a)

b)

Computing an optimum pairwise alignment. Optimum substructure: The arrangement of two prefixes comprises of the solutions for the optimum arrangements of smaller prefixes. Coinciding sub-problems: The solution for optimum arrangement of two prefixes is made using the saved results of the arrangement of three sub-problems. Computing a Viterbi route in an HMM. Optimum substructure: a Viterbi route for input prefix concluding in a condition of HMM comprises of shorter Viterbi routes for minor parts of input and the other HMM conditions. Coinciding sub-problems: Solution for a Viterbi route for input prefix concluding in a condition of HMM is made using the saved results of the Viterbi routes for smaller input prefix and the HMM conditions.

2.5.4. Backtracking Algorithm This approach is quite similar to the Brute force algorithm which is discussed later in this chapter. There is a lot of difference between both of the algorithms. In Brute force algorithm, every potential combination of the

Classification of Algorithms

51

solution is generated and tested whether the solution is valid or not (Kröse et al., 1993; Pham & Karaboga, 2012). Unlike, in backtracking algorithm, every time a solution is generated it is tested and if it fulfills all the conditions only then succeeding solutions are generated, otherwise the approach will be backtracked and a different path will be adopted for finding the solution (Wu et al., 2005). The N Queens problem is the famous example of this class. According to the N Queens, an N X N sized chessboard is provided. N queens are placed on the chessboard in such a way that none of the queens is under the attack from the other queen. It proceeds by moving a queen in each column and suitable row. Every time a queen is moved, it is checked whether it is under attack. If so, then a different cell under that particular column is selected where the queen is placed. The process can be envisioned like a tree. In the tree, every node is a chessboard with a different formation. If we are not able to progress at any stage, then we can backtrack from that particular node and advance by increasing other nodes. The benefit of this approach over the Brute force algorithm is that fewer numbers of candidates are generated as compared to the Brute force approach. The valid solutions can be isolated very quickly and effectively with the help of this approach. Consider the example of 8 X 8 chessboard, if Brute force approach is applied here then 4,426,165,368 solutions have to be generated and each of them has to be tested. In the discussed approach the amount gets reduced to about 40,320 solutions. The above-discussed algorithm is centered on a depth-first recursive hunt which has the following features: i.

Tests to perceive whether the solution has been created and if it is created returns it; otherwise ii. For each choice which can be prepared at this stage. a) Make that choice. b) Recur. c) If the recursion yields an outcome, return it. iii. If there is no choice remained, return failure. As an example consider coloring/painting a map with a maximum of four colors, i.e. Color (Country n)

Introduction To Algorithms

52

i. ii. iii. a) b) c) iv.

If all of the countries are colored (n > no of countries) return success; otherwise, Out of the four colors, for every color c. If the country is not nearby to a colored country. Country n is colored with color c. Recursively color country n +1. If effective, return success. Return failure (if the loop still exits).

2.5.5. Greedy Algorithm This algorithm occasionally gives better output for optimization problems and usually, these algorithms work in stages. At each stage: i.

You can achieve the best possible right now, without concern for future results. ii. By selecting a local optimal at each step, one can achieve a global optimal. In my problems, making the choice of this algorithm leads to an optimum solution. This algorithm is appropriate for optimization problems. In each step of this algorithm, we will create a locally optimal solution which will result in a globally optimum solution. Once selected, we cannot withdraw from it in later phases. Verifying the correctness of this algorithm is of absolute importance since not all of the greedy algorithms results in a globally optimum solution (Creput et al., 2005). Let us consider the situation where you are provided coins of a certain amount and requested to make some specific amount of money using the minimum coins. Frequently this approach is quite effective and always produces an optimum solution for some specific kinds of problems. Let us take another example. Suppose a boy wants to count out a specific amount of money, consuming the minimum number of bills and coins. Now here the greedy algorithm approach will be to select the bill or coin with the largest possible value and which does not exceed. For instance: To make 6.39 dollars, we can choose: i. ii. iii. iv. v.

a 5 dollar bill. a 1 dollar bill, to make 6 dollars. a 25 cent coin, to make 6.25 dollars. a 10 cent coin, to make 6.35 dollars. four 1 cent coins, to make 6.39 dollars.

Classification of Algorithms

53

The greedy algorithm provides the optimal solution every time for the number of dollars.

2.5.6. Brute Force Algorithm This kind of algorithm resolves the problem in a simple manner using the basic definition. This algorithm is typically the easiest to execute, but resolving a problem through this algorithm has its own drawbacks. This can only be implemented to problems where the size of the input is small and are typically very slow (Divina & Marchiori, 2005). This algorithm tries all the possibilities in order to find a satisfactory solution to the problem. Brute Force algorithm can be categorized as: i.

ii.

Optimizing: This algorithm helps us to find the best possible solution. This might necessitate finding all the solutions if the value of the most excellent solution is already known when the best possible solution is found it might stop. (For example: finding the most excellent path for the traveling salesman). Satisficing: As soon as the best possible solution to the problem is found this algorithm stops. (For instance: Finding a path for the traveling salesman within 10% of the optimum.).

2.5.7. Branch-and-Bound Algorithm This kind of algorithm is normally used to provide help in the problems where optimization is required. As the algorithm starts its computations, a tree containing the subproblems is created. The root problem is the original problem for which the algorithm is developed. For a given problem a certain method is implemented to create the lower and upper jump (Gunn, 1998; Vapnik, 2013). The bonding methods are applied to each node. i.

A realistic solution to the particular subproblem is estimated if the bounds are equivalent. ii. If bounds are not equivalent then divide the problem corresponding to that particular node, and form the two subproblems within the children nodes. By using the best possible solution known to continue trimming the sections of a tree till the time all of the nodes have been trimmed. In the case of the traveling salesman problem following would be the example of the above-described algorithm. i.

At least once all n cities are to be visited by the salesman and he

Introduction To Algorithms

54

ii. iii. a. b. iv.

wants to lessen the total distance traveled by him. Now here the root problem is, to find the shortest path through all n cities visiting each city at least once. Divide the node into further two child problems: Shortest path for visiting the city A first. Shortest path for not visiting the city A first. As the tree expands continue subdividing in the same way.

Classification of Algorithms

55

REFERENCES 1.

2.

3. 4. 5.

6. 7. 8.

9. 10.

11.

12.

13.

Aydin, M. E., & Fogarty, T. C., (2004). A distributed evolutionary simulated annealing algorithm for combinatorial optimization problems. Journal of Heuristics, 10(3), 269–292. Aydin, M. E., & Fogarty, T. C., (2004). A simulated annealing algorithm for multi-agent systems: a job-shop scheduling application. Journal of Intelligent Manufacturing, 15(6), 805–814. Aydin, M. E., & Yigit, V. (2005). 12 Parallel Simulated Annealing. Parallel Metaheuristics: A new Class of Algorithms, 47, 267. Battiti, R., & Tecchiolli, G., (1994). The reactive tabu search. ORSA Journal on Computing, 6(2), 126–140. Chazelle, B., Edelsbrunner, H., Guibas, L., & Sharir, M., (1989, February). Lines in space-combinators, algorithms, and applications. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (pp. 382–393). ACM. Codenotti, B., Pemmaraju, S., & Varadarajan, K. (2004). The computation of market equilibria. Acm Sigact News, 35(4), 23-37. Cook, S. A., (1983). An overview of computational complexity. Communications of the ACM, 26(6), 400–408. Cook, S. A., (1987). A survey of computational complexity theory. Advances in Mathematics, Physics and Astronomy, 32(1), 12– 29. Cormen, T. H., & Leiserson, C. E., (1989). RL Rivest Introduction to Algorithms MIT press. Cambridge, Massachusetts London, England. Creput, J. C., Koukam, A., Lissajoux, T., & Caminada, A., (2005). Automatic mesh generation for mobile network dimensioning using evolutionary approach. IEEE Transactions on Evolutionary Computation, 9(1), 18–30. D’Alberto, P., (2000). Performance evaluation of data locality exploitation (Vol. 1, pp. 15-43). University of Bologna, Dept. of Computer Science, Tech. Rep. Dayde, M. J., (1996). IS Du A Blocked Implementation of Level 3 BLAS for RISC Processors TR_PA_96_06, available online http:// www.cerfacs.fr/algor/reports. TR_PA_96_06. ps. gz Apr, 6. Divina, F., & Marchiori, E., (2005). Handling continuous attributes in an evolutionary inductive learner. IEEE Transactions on Evolutionary

56

14.

15.

16. 17.

18.

19. 20. 21. 22.

23. 24.

25.

26.

Introduction To Algorithms

Computation, 9(1), 31–43. Eberhart, R. C., & Shi, Y., (2001). Tracking and optimizing dynamic systems with particle swarms. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress on (Vol. 1, pp. 94–100). IEEE. Eberhart, R. C., & Shi, Y., (2004). Guest editorial special issue on particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(3), 201–203. Eberhart, R. C., Shi, Y., & Kennedy, J., (2001). Swarm Intelligence. Vol. 1, pp. 20-32. Elsevier. Flajolet, P., Puech, C., Robson, J. M., & Gonnet, G., (1990). The Analysis of Multidimensional Searching in Quad-Trees (Doctoral dissertation, INRIA). Fleurent, C., & Ferland, J. A., (1994). Genetic hybrids for the quadratic assignment problem. Quadratic Assignment and Related Problems, 16, 173–187. Garey, M. R., (1979). DS Johnson Computers and intractability. A Guide to the Theory of NP-Completeness. Garey, M. R., & Johnson, D. S., (2002). Computers and Intractability (Vol. 29, pp. 17-38). New York: wh freeman. Glover, F., (1989). Tabu search—part I. ORSA Journal on Computing, 1(3), 190–206. Goodman, J. E., & O’Rourke, J., (1997). Handbook of Discrete and Computational Geometry, volume 6 of CRC Press Series on Discrete Mathematics and its Applications. Gunn, S. R. (1998). Support vector machines for classification and regression. ISIS technical report, 14(1), 5–16. He, S., Wu, Q. H., Wen, J. Y., Saunders, J. R., & Paton, R. C., (2004). A particle swarm optimizer with the passive congregation. Biosystems, 78(1–3), 135–147. Hirschberg, D. S., & Wong, C. K., (1976). A polynomial-time algorithm for the knapsack problem with two variables. Journal of the ACM (JACM), 23(1), 147–154. Hu, X., Eberhart, R. C., & Shi, Y., (2003, April). Particle swarm with extended memory for multiobjective optimization. In Swarm Intelligence Symposium, 2003. SIS’03. Proceedings of the 2003 IEEE (pp. 193–197). IEEE.

Classification of Algorithms

57

27. Kågström, B., & Van Loan, C., (1998). Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues. ACM Transactions on Mathematical Software (TOMS), 24(3), 303–316. 28. Kannan, R., (1980). A polynomial algorithm for the two-variable integer programming problem. Journal of the ACM (JACM), 27(1), 118–122. 29. Karp, R. M., (1986). Combinatorics, complexity, and randomness. Commun. ACM, 29(2), 97–109. 30. Kennedy, J., & Mendes, R., (2002). Population structure and particle swarm performance. In Evolutionary Computation, 2002. CEC’02. Proceedings of the 2002 Congress on (Vol. 2, pp. 1671–1676). IEEE. 31. Kennedy, J., & Mendes, R., (2006). Neighborhood topologies in fully informed and best-of-neighborhood particle swarms. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 36(4), 515–519. 32. Kennedy, J., Eberhart, R. C., & Shi, Y., (2001). Swarm Intelligence, Morgan Kaufmann Publishers. Inc., San Francisco, CA. 33. Kröse, B., Krose, B., van der Smagt, P., & Smagt, P., (1993). An introduction to neural networks. 34. Kukuk, M., (1997). Compact answer databases for the solution of geometric query problems by scanning (doctoral dissertation, diploma thesis, computer science VII, university Dortmund), Vol. 1, pp. 15-29. 35. Li, J., (2004). Peer Streaming: A practical receiver-driven peer-topeer media streaming system. Microsoft Research MSR-TR-2004–101, Tech. Rep. 36. Liu, H., Luo, P., & Zeng, Z., (2007). A structured hierarchical P2P model based on a rigorous binary tree code algorithm. Future Generation Computer Systems, 23(2), 201–208. 37. Nisan, N., & Wigderson, A., (1995, May). On the complexity of bilinear forms: dedicated to the memory of Jacques Morgenstern. In Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing (pp. 723–732). ACM. 38. Panda, P. R., Nakamura, H., & Dutt, N. D., (1997). Tiling and Data Alignment. Solving Irregularly Structured Problems in PArallel Lecture Notes in Computer Science. 39. Pereira, M., (2009). Peer-to-Peer Computing. In Encyclopedia of Information Science and Technology, Second Edition (pp. 3047–3052).

58

40.

41. 42.

43.

44.

45.

46.

47.

48. 49. 50.

51. 52.

Introduction To Algorithms

IGI Global. Pham, D., & Karaboga, D., (2012). Intelligent Optimization Techniques: Genetic Algorithms, Tabu Search, Simulated Annealing and Neural Networks. Springer Science & Business Media. Rabin, M. O., (1977). Complexity of computations. Communications of the ACM, 20(9), 625–633. Ratakonda, K., & Turaga, D. S., (2008). Quality Models for Multimedia Delivery in a Services Oriented Architecture. Managing Web Service Quality: Measuring Outcomes and Effectiveness: Measuring Outcomes and Effectiveness. Restrepo, JH, Castro, JJS, & Mesa, MH (2004). SOLUTION TO THE PROBLEM OF DELIVERY OF ORDERS USING SIMULATED RECOCID. Scientia et technica, 1(24), 225-230. Roucairol, C., (1996). Parallel processing for difficult combinatorial optimization problems. European Journal of Operational Research, 92(3), 573–590. Sevkli, M., & Aydin, M. E., (2006, April). A variable neighborhood search algorithm for job shop scheduling problems. In European Conference on Evolutionary Computation in Combinatorial Optimization (pp. 261–271). Springer, Berlin, Heidelberg. Sevkli, M., & Guner, A. R., (2006, September). A continuous particle swarm optimization algorithm for incapacitated facility location problem. In International Workshop on Ant Colony Optimization and Swarm Intelligence (pp. 316–323). Springer, Berlin, Heidelberg. Shi, Y., (2001). Particle swarm optimization: developments, applications, and resources. In evolutionary computation, 2001. Proceedings of the 2001 Congress on (Vol. 1, pp. 81–86). IEEE. Skorin-Kapov, J., (1990). Tabu search applied to the quadratic assignment problem. ORSA Journal on Computing, 2(1), 33–45. Stockmeyer, L., (1987). Classifying the computational complexity of problems. The journal of symbolic logic, 52(1), 1–43. Stockmeyer, L., & Meyer, A. R., (2002). Cosmological lower bound on the circuit complexity of a small problem in logic. Journal of the ACM (JACM), 49(6), 753–784. Taillard, É., (1991). Robust taboo search for the quadratic assignment problem. Parallel computing, 17(4–5), 443–455. Taillard, E. D., (1995). Comparison of iterative searches for the

Classification of Algorithms

53.

54.

55. 56.

57.

58.

59.

60.

61.

62.

63.

59

quadratic assignment problem. Location science, 3(2), 87–105. Thallner, B., & Moser, H., (2005, May). Topology control for faulttolerant communication in highly dynamic wireless networks. In Intelligent Solutions in Embedded Systems, 2005. Third International Workshop on (pp. 89–100). IEEE. Tran, D. A., Hua, K. A., & Do, T. T., (2004). A peer-to-peer architecture for media streaming. IEEE Journal on Selected Areas in Communications, 22(1), 121–133. Vapnik, V., (2013). The Nature of Statistical Learning Theory. Springer science & business media. Vitter, J. S., & Simons, R. A., (1986). New classes for parallel complexity: A study of unification and other complete problems for P. IEEE Transactions on Computers, 35(5), 403–418. Wierzbicki, A., Strzelecki, R., Swierezewski, D., & Znojek, M., (2002). Rhubarb: a tool for developing scalable and secure peer-topeer applications. In Peer-to-Peer Computing, 2002. (P2P 2002). Proceedings. Second International Conference on (pp. 144–151). IEEE. Wilson, G. V., & Pawley, G. S., (1988). On the stability of the traveling salesman problem algorithm of Hopfield and Tank. Biological Cybernetics, 58(1), 63–70. Wu, X., Sharif, B. S., & Hinton, O. R., (2005). An improved resource allocation scheme for plane cover multiple access using genetic algorithm. IEEE Transactions on Evolutionary Computation, 9(1), 74–81. Xiang, Z., Zhang, Q., Zhu, W., & Zhang, Z., (2003, July). Replication strategies for peer-to-peer based multimedia distribution service. In Multimedia and Expo, 2003. ICME’03. Proceedings. 2003 International Conference on (Vol. 2, pp. II-153). IEEE. Xiang, Z., Zhang, Q., Zhu, W., Zhang, Z., & Zhang, Y. Q., (2004). Peer-to-peer based multimedia distribution service. IEEE Transactions on Multimedia, 6(2), 343–355. Xie, Z. P., Zheng, G. S., & He, G. M., (2006, July). Efficient loss recovery in application overlay stored media streaming. In Visual Communications and Image Processing 2005 (Vol. 5960, p. 596008). International Society for Optics and Photonics. Yigit, V., Aydin, M. E., & Turkbey, O., (2004). Evolutionary simulated

60

Introduction To Algorithms

annealing algorithms for uncapacitated facility location problems. In Adaptive Computing in Design and Manufacture VI (pp. 185–194). Springer, London. 64. Yigit, V., Aydin, M. E., & Turkbey, O., (2006). Solving large-scale incapacitated facility location problems with evolutionary simulated annealing. International Journal of Production Research, 44(22), 4773–4791.

CHAPTER

3

AN INTRODUCTION TO HEURISTIC ALGORITHMS

CONTENTS 3.1. Introduction....................................................................................... 62 3.2. Algorithms And Complexity............................................................... 63 3.3. Heuristic Techniques.......................................................................... 65 3.4. Evolutionary Algorithms..................................................................... 66 3.5. Support Vector Machines................................................................... 68 3.6. Current Trends................................................................................... 70 References................................................................................................ 72

62

Introduction To Algorithms

3.1. INTRODUCTION Currently, computers are used to resolve incredibly complex problems. However, in order to cope with a problem, we must develop an algorithm. Occasionally the human brain is not capable of accomplishing this task. Furthermore, exact algorithms might require centuries to manage with intimidating challenges. In such circumstances heuristic algorithms that give approximate solutions but have tolerable time and space complexity show indispensable role (Lin, 1965). In this chapter of heuristics, the basic underlying ideas and their ranges of application are surveyed. We, similarly, described in further detail some new heuristic techniques, namely Support Vector Machines and Evolutionary Algorithms (Lin & Kernighan, 1973). The most essential among a range of topics that involve computation are complexity estimation, algorithm validation, and optimization. An extensive portion of theoretical computer science copes with these tasks. Generally, the complexity of tasks is surveyed studying the most appropriate computational resources similar to execution space and time (Garey & Johnson, 2002). The extension of problems that are resolvable with a given limited extent of space and time into well-defined categories is a very complex task, but it can help exceptionally to save time and money expended on the algorithms design. A huge collection of papers were devoted to algorithm development. A brief historical overview of the basic issues in theory of computation may be found in the literature (Cook, 1983; 1987). We do not the deliberate precise definition of algorithm and complexity (Cormen et al., 1989). Modern problems come out to be very complex and relate to the exploration of large data sets. Even though an exact algorithm can be established, its time or space complexity possibly will turn out unacceptable. But in actuality, it is often adequate to find a partial or an approximate solution. Such admission encompasses the set of techniques to deal with the problem. We converse heuristic algorithms which propose some approximations to the resolution of optimization problems. In these problems, the aim is to find the best of all possible solutions, explicitly the one that maximizes or minimizes an objective function. A function used to assess a quality of the created solution is called the objective function. Many real-life issues are easily specified as optimization problems. The collection of all potential solutions for a certain problem can be considered as a search space, and optimization algorithms are often stated as search algorithms (Xiang et al., 2003, 2004). Approximate algorithms involve the interesting issue of quality assessment of the solutions they find. Taking into consideration that usually the best

An Introduction to Heuristic Algorithms

63

solution is unknown, this problem might be a real challenge concerning strong mathematical analysis. With regard to the quality issue, the aim of the heuristic algorithm is to discover, for all instances of the problem, as good solution as possible. There are general heuristic approaches that are effectively applied to manifold problems (Gillett & Miller, 1974; Nawaz et al., 1983) (Figure 3.1).

Figure 3.1: Comparison between conventional algorithms and heuristic algorithms. (Source: https://www.differencebtw.com/difference-between-algorithmand-heuristic).

3.2. ALGORITHMS AND COMPLEXITY It is hard to imagine the range of current computational tasks and the range of algorithms developed to resolve them. Algorithms that either provide a solution or give nearly the right answer, not for all cases of the problem are named heuristic algorithms. This group comprises a plentiful spectrum of techniques based on traditional methods as well as specific ones. To start with, we sum up the key principles of traditional search algorithms (Lee & Geem, 2005). The modest of search algorithms is ‘exhaustive search.’ It tries all potential solutions from a programmed set and consequently picks the best one. Local search is a variety of exhaustive search that simply focuses on a limited part of the search space. Local search can be systematized in different ways. Popular hill-climbing methods belong to this class. Such algorithms constantly replace the existing solution with the finest of its neighbors if it is superior to the current. For instance, heuristics for the issue of intragroup imitation for multimedia distribution service centered on Peerto-Peer network is centered on hill-climbing strategy (Xiang et al., 2003, 2004). Divide and conquer algorithms tend to divide a problem into smaller

64

Introduction To Algorithms

problems so that they are easier to solve. Solutions of the small problems should be comparable to a solution for the initial one. This technique is promising but its use is narrow because there is no large number of problems that can easily be partitioned and combined in such a way. The branch-andbound method is a dire enumeration of the search space. It enumerates but constantly tends to rule out portions of the search space that do not contain the best solution. Dynamic programming is a comprehensive search that evades re-computation by keeping the solutions of sub-problems. The key fact for using this technique is expressing the solution procedure as a recursion. A popular method to build sequentially space of solutions is ‘greedy’ technique, which is based on the obvious principle of taking the (local) finest choice at each step of the algorithm so as to find the global best of some objective function (Campbell et al., 1970; Gupta, 1971). Generally, heuristic algorithms are employed for problems that are difficult to solve. Categories of time complexity are explained to distinguish problems rendering to their “hardness.” Category P involves all those problems that may be solved in polynomial time on a deterministic Turing machine from the magnitude of the input. Turing machines are a perception that is used to validate the notion of computational complexity and algorithm. Category NP comprises all those problems for whom a solution can be obtained in polynomial time on a Turing machine that is non-deterministic. As such a machine does not happen to be, practically it states that an exponential algorithm may be inscribed for an NP-problem, nothing is affirmed whether a polynomial algorithm occurs or not. Category NP-complete, a subclass of NP, contains problems. For example, a polynomial algorithm for some of them could be converted to polynomial algorithms to resolve all other NP problems. Finally, the category NP-hard can be assumed as the category of problems which are NP-complete or harder. NP-hard problems have similar trait as NP-complete problems however they do not necessarily belong to class NP, i.e., class NP-hard also includes problems for which no algorithms can be provided at all (Armour & Buffa, 1963; Christofides, 1976). In order to validate the application of some heuristic algorithm, we verify that the problem belongs to the categories NP-complete or NP-hard. Most probably there are no polynomial algorithms for solving such problems, hence, for sufficiently large inputs heuristics are developed (Jaw et al., 1986; Mahdavi et al., 2007; Omran & Mahdavi, 2008).

An Introduction to Heuristic Algorithms

65

3.3. HEURISTIC TECHNIQUES Dynamic programming and branch-and-bound technique are quite effective; however, their time-complexity is often too high and intolerable for NPcomplete tasks. Hill-climbing algorithm is effective, however, it has a significant disadvantage called premature convergence. As it is “greedy,” it always discovers the closest local optima of low quality. The aim of the modern heuristics is to minimize or overcome this disadvantage (Lee & Geem, 2004; Kennedy, 2011). Simulated annealing algorithm, devised in 1983, uses a method similar to hill-climbing, but infrequently accepts solutions that are poorer than the current. The possibility of such acceptance is falling with time (Aydin & Fogarty, 2004; Aydin & Yigit, 2005). Tabu search encompasses the idea to evade local optima by expanding memory structures. The problem of imitated annealing is that after “jump” the algorithm can merely repeat its own track. Tabu search prevents the repetition of changes that have been made lately (Battiti, 1996). Swarm intelligence was presented in 1989 (Eberhart et al., 2001). It is a simulated intelligence technique, built on the study of combined behavior in self-organized, decentralized systems. Two of the most effective types of this approach are Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO). In ACO artificial ants construct solutions by way of moving on the problem chart and altering it in such a manner that future ants can construct better solutions. PSO contend with problems in which the finest solution can be signified as a point or a surface in an n-dimensional space. The chief benefit of swarm intelligence methods is that they are remarkably impervious to the local optima problem. Evolutionary algorithms succeed in attempting premature convergence by seeing a number of solutions instantaneously (Kirkpatrick et al., 1983; Geem, 2006). Later we will describe this group of algorithms further elaborately. Neural Networks are enthused by biological neuron systems. They comprise of units, named neurons, and interconnections amongst them. After special preparation on some assumed dataset Neural Networks may make predictions for instances that are not in the preparation set. In reality, Neural Networks do not work well each time because they suffer significantly from problems of overfitting and under-fitting (Eberhart & Shi, 2001; 2004). These problems relate to the precision of prediction (Krose et al., 1993). If a system is not complex enough it might simplify the rules which the data obey. From the other perspective, if a system is too complex

Introduction To Algorithms

66

it may consider the noise that usually helps at the preparation data set while inferring the rules. The worth of prediction after preparation is deteriorated in both instances. The problem of early convergence is also important for Neural Networks (Holland, 1992; Helsgaun, 2000; Geem et al., 2005). Support Vector Machines (SVMs) encompass the notions of Neural Networks. They successfully overcome early convergence as the convex objective function is taken into account, hence, only one optimum exists. Conventional divide and conquer technique provides an elegant solution for distinguishable problems. Relating to SVMs, that give effective classification, it turns into an exceptionally powerful instrument. Later we deliberate SVM classification trees and which applications currently provide a promising object for research (Johnson & McGeoch, 1997). Comparative analysis and description of simulated annealing, neural networks, evolutionary algorithms and tabu search can be found in the literature (Lawler et al., 1985; Reinelt, 1991; Pham & Karaboga, 2012).

3.4. EVOLUTIONARY ALGORITHMS Evolutionary algorithms are techniques that exploit concepts of biological evolution, for example, recombination, reproduction, and mutation for probing the solution of an optimization issue. They apply the survival principle to a set of possible solutions to produce steady approximations to the optimum. A different set of approximations is generated by the process of choosing individuals in relation to their objective function that is termed fitness for evolutionary algorithms, also breeding them together with operators enthused from genetic processes (Taillard, 1990, 1993). This process brings about the evolution of a population of individuals that are well suited to their surroundings than their ancestors. The main circle of evolutionary algorithms contains the following steps: i. ii. iii. iv. v.

Initialize and assess the initial population. Execute competitive selection. Apply genetic operators to produce new solutions. Calculate solutions in the population. Start over from point 2 and reprise until some convergence standard is satisfied. Sharing the common notion, evolutionary techniques can vary in the details of implementation & the issues to which they are applied. Genetic

An Introduction to Heuristic Algorithms

67

programming looks for solutions in the computer programs form. Their fitness is defined by the ability to resolve a computational problem (Johnson, 1954; Osman & Potts, 1989). The difference between evolutionary programming and genetic programming is that the former fixes the organization of the program and permits their numerical parameters to change. Evolution strategy works with the vectors of real numbers as depictions of solutions and takes self-adaptive mutation rates (Palmer, 1965; Dannenbring, 1977). Genetic Algorithms (GAs) are the most successful among evolutionary algorithms. These algorithms were inspected by John Holland (1992) and demonstrated essential effectiveness. GAs are based on the point that the role of mutation develops the individual quite rarely and, therefore, they depend typically on applying recombination operators. They try to find solutions to the issues in the form of a series of numbers, usually binary (Reeves, 1995; Ruiz & Maroto, 2005). Optimization problems demanding a large number of high-performance computing resources are the prevalent zone for applying genetic algorithms. For example, the issue of effective resource allocation in association with Plane Cover Multiple-Access (PCMA) system has been examined by Wu et al. (2005). Its aim is to maximize the achievable capacity of packet switching wireless cellular networks. The main issue to look for resource allocation is to abate the Unit of Bandwidth (UB) that must be allocated (Fisher & Jaikumar, 1981; Christofides, N., & Hadjiconstantinou, 1995). The problem has been established to be in the category NP-hard. Authors used genetic algorithm rather than a greedy search used before. Computer imitation has been performed for the ideal cellular system with a single base station that can sustain m connections requiring one UB per second, assembled in a cluster of B cells. As result, it is indicated that the system capacity utilization can be improved by genetic algorithm (Clarke & Wright, 1964; Gendreau et al., 1994). The substantial rise of the consumer demand for mobile phones and computers made network optimization problems awfully relevant. Being a common technique that can easily be modified under various circumstances, evolutionary algorithms are extensively used in this area (Wren & Holliday, 1972; Osman, 1993). For example, let us put forward Adaptive Mesh Problem (AMP) that has an objective to minimize the required number of cellular network base stations to cover a region. Like the previous discoursed problem, AMP is NPhard. HIES (Hybrid Island Evolutionary Strategy), one of the evolutionary

68

Introduction To Algorithms

methods, has been applied to grab this problem (Gaskell, 1967; Creput et al., 2005). It signifies evolutionary algorithm borrowing properties from two kinds of genetic algorithms, called fine-grained or cellular GA and island model GA. The original issue has been converted into a geometric interlocking generation issue and specific genetic operators, crossover, macromutation, and micro mutation have been altering an assortment of hexagonal cells. Primarily regular honeycomb was converted to irregular mesh that corresponds better to the real-life circumstances (Mole & Jameson, 1976; Mahdavi et al., 2007). The desired outcome, the reduction of the total number of base stations, is achieved. Some other cases of evolutionary algorithms observed in a paper by (Divina & Marchiori, 2005), which is dedicated to machine learning theory problems.

3.5. SUPPORT VECTOR MACHINES Statistical learning theory relates to the problem of choosing preferred functions depending on empirical data. Its basic problem is a generalization, which involves inferring rules for future observations, given simply a finite amount of data. Support Vector Machines (SVM) is the most prominent technique among modern outcomes in this field (Geem et al., 2002; Fesanghary, 2008; Geem, 2008). The basic principles of Support Vector Machines (SVMs) have been established by Vapnik (2013). Owing to their attractive potentials they immediately gained an extensive range of applications. SVMs deviated from the principle of Empirical Risk Minimization (ERM) exemplified by conservative neural networks, which diminishes the error on the preparation data, and take into account Structural Risk Minimization (SRM) principle that diminishes an upper bound on the anticipated risk (Gupta, 1971; Gunn, 1998; Kim et al., 2001). SVMs support regression and classification tasks based on the idea of the optimized separator. The classification problem may be stated as an issue of dataset separation into categories by the functions which are brought up by available instances. We will denote such functions as classifiers (Nawaz et al., 1983; Widmer & Hertz, 1989; Hoogeveen, 1991). In a regression task, we must estimate the functional dependency of the dependent variable (y) on a group of independent variables (x). It is supposed that the correlation between the dependent and independent variables is denoted by a deterministic function (f) with some additive noise. Contemplate the problem of splitting the set of vectors taken from

An Introduction to Heuristic Algorithms

69

two classes, {(x1,y1),...,(xl,yl)}, x ∈ Rn,y ∈ {−1,1}, With a hyperplane, hw,xi + b = 0,

where w and b are parameters, hw, xi denotes inner product.

Figure 3.2: Illustration of the classification problem, (Source: https://www. researchgate.net/publication/228573156_An_introduction_to_heuristic_algorithms).

The objective is to isolate categories by the hyperplane without faults and maximize the space between the closest vector and the hyperplane. Such a hyperplane is termed optimal separating hyperplane. According to the outcomes of Gunn (1998), the optimal splitting hyperplane minimizes. ,

(1)

Under the constraints yi[hw, xi + b] ≥ 1.

(2)

The answer to the optimization problem (1), (2) is specified by the saddle idea of the Lagrangian functional. Points having non-zero Lagrange multipliers are named Support Vectors & they are used to define the generated classifier. Typically the support vectors is a minor subset of the preparation data set. This fact offers the main attractive feature of the SVMs, specifically low computational complexity (Ignall & Schrage, 1965; Lee, 1967). In the case when preparation data is not linearly divisible, there are two methods; the first is to introduce an extra function concomitantly with

70

Introduction To Algorithms

misclassification and another one is to practice a more complex function to define the boundary (Hillier & Connors, 1966; Seehof & Evans, 1967). The optimization problem at large is posed to reduce the classification error in addition to the bound on the classifier’s VC dimension. The VC dimension of a group of functions is (p) if there exists a group of (p) points, so that these points can be divided in all 2p possible conformations using this functions, & there is no group of (q) points, q > p, sustaining this property (Buffa, 1964; Nugent et al., 1968). Whole vast and complex concept of SVMs cannot be explained in this article. A lot of techniques based on the concepts of SVMs have been established in the last years. SVM classification tree algorithm among them is successfully applied in image and text classification (Armour, 1974; Kusiak & Heragu, 1987). A classification tree involves internal and external nodes joined by branches. Each internal node carries out a split function that divides a preparation data into two disjoint subgroups and each external node holds a label indicating the expected class of the particular feature vector (Tate & Smith, 1995; Meller & Gau, 1996). Recently this technique has been applied to cope with the classification intricacy of membership verification problem, a usual issue in digital security systems. Its objective is to differentiate membership class (M) from the non-membership class (G−M) in the human group (G). An SVM classifier is made competent using two partitioned subdivisions, and, finally, the competent SVM tree is used for detecting the membership of an unidentified person (Page, 1961; Gupta, 1972; 1976). Experimental results have revealed that the proposed scheme has a better robustness and performance than previous approaches (Hundal & Rajgopal, 1988; Ho & Chang, 1991).

3.6. CURRENT TRENDS This chapter has given an outline of heuristics that are approximate methods to solve optimization problems. Usually, heuristic algorithms are established to have low time intricacy and are applied to the complex problems. We briefly defined basic modern and traditional heuristic strategies. Support Vector Machines and Evolutionary algorithms were discussed more comprehensively. Due to their eminent properties, they gained a great popularity (Papadimitriou & Yannakakis, 1993; Rosenkrantz et al., 2009). Newly appeared research outcomes confirm the point that their applications can be considerably enlarged in the future (Arora, 1998; Koulamas, 1998).

An Introduction to Heuristic Algorithms

71

This paper does not seem to be complete. It would be interesting to perform a more deep survey of heuristics, compare accuracy and implementation complexity of the various approximate algorithms. But this job cannot be easily fulfilled because of the massive bulk of information. We did not even touch such a noticeable area for heuristic algorithms, the planning, and scheduling theory (Papadimitriou, 1977). But we anticipate that our work makes the extreme significance of heuristics in current computer science clear (Frieze et al., 1982; Mitchell, 1999).

72

Introduction To Algorithms

REFERENCES 1.

2.

3.

4.

5. 6. 7. 8.

9.

10.

11.

12. 13.

Armour, G. C., & Buffa, E. S., (1963). A heuristic algorithm and simulation approach to relative location of facilities. Management Science, 9(2), 294–309. Arora, S., (1998). Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM (JACM), 45(5), 753–782. Aydin, M. E., & Fogarty, T. C., (2004). A distributed evolutionary simulated annealing algorithm for combinatorial optimization problems. Journal of Heuristics, 10(3), 269–292. Aydin, M. E., & Fogarty, T. C., (2004). A simulated annealing algorithm for multi-agent systems: a job-shop scheduling application. Journal of Intelligent Manufacturing, 15(6), 805–814. Aydin, M. E., & Yigit, V., (2005). 12 Parallel Simulated Annealing. Parallel Met heuristics: A new Class of Algorithms, 47, 267. Battiti, R., & Tecchiolli, G., (1994). The reactive tabu search. ORSA Journal on Computing, 6(2), 126–140. Buffa, E. S., (1964). Allocating facilities with CRAFT. Harvard Business Review, 42(2), 136–158. Campbell, H. G., Dudek, R. A., & Smith, M. L. (1970). A heuristic algorithm for the n job, m machine sequencing problem. Management Science, 16(10), B-630. Christofides, N., (1976). The vehicle routing problem. Revue française d’automatique, informatique, recherche opérationnelle. Recherche opérationnelle, 10(V1), 55-70. Christofides, N., & Hadjiconstantinou, E. (1995). An exact algorithm for orthogonal 2-D cutting problems using guillotine cuts. European Journal of Operational Research, 83(1), 21-38. Clarke, G., & Wright, J. W., (1964). Scheduling of vehicles from a central depot to a number of delivery points. Operations Research, 12(4), 568–581. Cook, S. A. (1983). An overview of computational complexity. Communications of the ACM, 26(6), 400–408. Cook, S. A., (1987). A survey of computational complexity theory. Advances in Mathematics, Physics and Astronomy, 32(1), 12– 29.

An Introduction to Heuristic Algorithms

73

14. Cormen, T. H., & Leiserson, C. E., (1989). RL Rivest Introduction to Algorithms MIT press. Cambridge, Massachusetts London, England. 15. Creput, J. C., Koukam, A., Lissajoux, T., & Caminada, A., (2005). Automatic mesh generation for mobile network dimensioning using evolutionary approach. IEEE Transactions on Evolutionary Computation, 9(1), 18–30. 16. Dannenbring, D. G., (1977). An evaluation of flow shop sequencing heuristics. Management Science, 23(11), 1174–1182. 17. Divina, F., & Marchiori, E., (2005). Handling continuous attributes in an evolutionary inductive learner. IEEE Transactions on Evolutionary Computation, 9(1), 31–43. 18. Eberhart, R. C., & Shi, Y., (2001). Tracking and optimizing dynamic systems with particle swarms. In Evolutionary Computation, 2001. Proceedings of the 2001 Congress on (Vol. 1, pp. 94–100). IEEE. 19. Eberhart, R. C., & Shi, Y., (2004). Guest editorial special issue on particle swarm optimization. IEEE Transactions on Evolutionary Computation, 8(3), 201–203. 20. Eberhart, R. C., Shi, Y., & Kennedy, J., (2001). Swarm Intelligence. Vol. 1, pp. 20-32. Elsevier. 21. Fesanghary, M., Mahdavi, M., Minary-Jolandan, M., & Alizadeh, Y., (2008). Hybridizing harmony search algorithm with sequential quadratic programming for engineering optimization problems. Computer methods in applied mechanics and engineering, 197(33–40), 3080–3091. 22. Fisher, M. L., & Jaikumar, R., (1981). A generalized assignment heuristic for vehicle routing. Networks, 11(2), 109–124. 23. Frieze, A. M., Galbiati, G., & Maffioli, F., (1982). On the worst-case performance of some algorithms for the asymmetric traveling salesman problem. Networks, 12(1), 23–39. 24. Garey, M. R., & Johnson, D. S., (2002). Computers and Intractability (Vol. 29, pp. 17-38). New York: WH freeman. 25. Gaskell, T. J., (1967). Bases for vehicle fleet scheduling. Journal of the Operational Research Society, 18(3), 281–295. 26. Geem, Z. W., (2006). Optimal cost design of water distribution networks using harmony search. Engineering Optimization, 38(3), 259–277. 27. Geem, Z. W., (2008). Novel derivative of harmony search algorithm for discrete design variables. Applied Mathematics and Computation,

74

28. 29.

30.

31.

32. 33. 34.

35. 36.

37.

38.

39.

40.

Introduction To Algorithms

199(1), 223–230. Geem, Z. W., Kim, J. H., & Loganathan, G. V., (2001). A new heuristic optimization algorithm: harmony search. Simulation, 76(2), 60–68. Geem, Z. W., Kim, J. H., & Loganathan, G. V., (2002). Harmony search optimization: application to pipe network design. International Journal of Modeling and Simulation, 22(2), 125–133. Geem, Z. W., Lee, K. S., & Park, Y., (2005). Application of harmony search to vehicle routing. American Journal of Applied Sciences, 2(12), 1552–1557. Gendreau, M., Hertz, A., & Laporte, G. (1994). A tabu search heuristic for the vehicle routing problem. Management Science, 40(10), 1276– 1290. Gillett, B. E., & Miller, L. R., (1974). A heuristic algorithm for the vehicle-dispatch problem. Operations Research, 22(2), 340–349. Gunn, S. R., (1998). Support vector machines for classification and regression. ISIS Technical Report, 14(1), 5–16. Gupta, J. N., (1971). A functional heuristic algorithm for the flow shop scheduling problem. Journal of the Operational Research Society, 22(1), 39–47. Gupta, J. N. (1972). Heuristic algorithms for multistage flow shop scheduling problem. AIIE Transactions, 4(1), 11–18. Gupta, JN (1976). A heuristic algorithm for the flowshop scheduling problem. French Journal of Automatic, Computer Science, Operations Research. Operational Research, 10(V2), 63-73. Helsgaun, K., (2000). An effective implementation of the Lin– Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1), 106–130. Hillier, F. S., & Connors, M. M., (1966). Quadratic assignment problem algorithms and the location of indivisible facilities. Management Science, 13(1), 42–57. Ho, J. C., & Chang, Y. L. (1991). A new heuristic for the n-job, M-machine flow-shop problem. European Journal of Operational Research, 52(2), 194–202. Holland, J. H., (1992). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence (Vol. 1, pp. 23-43). MIT press.

An Introduction to Heuristic Algorithms

75

41. Hoogeveen, J. A., (1991). Analysis of Christofides’ heuristic: Some paths are more difficult than cycles. Operations Research Letters, 10(5), 291–295. 42. Hundal, T. S., & Rajgopal, J., (1988). An extension of Palmer’s heuristic for the flow shop scheduling problem. International Journal of Production Research, 26(6), 1119–1124. 43. Ignall, E., & Schrage, L., (1965). Application of the branch and bound technique to some flow-shop scheduling problems. Operations research, 13(3), 400–412. 44. Jaw, J. J., Odoni, A. R., Psaraftis, H. N., & Wilson, N. H., (1986). A heuristic algorithm for the multi-vehicle advance request dial-aride problem with time windows. Transportation Research Part B: Methodological, 20(3), 243–257. 45. Johnson, D. S., & McGeoch, L. A., (1997). The traveling salesman problem: A case study in local optimization. Local Search in Combinatorial Optimization, 1, 215–310. 46. Johnson, S. M., (1954). Optimal two- and three-stage production schedules with setup times included. Naval Research Logistics (NRL), 1(1), 61–68. 47. Kennedy, J., (2011). Particle swarm optimization. In Encyclopedia of machine learning (pp. 760–766). Springer US. 48. Kim, J. H., Geem, Z. W., & Kim, E. S., (2001). Parameter estimation of the nonlinear Muskingum model using harmony search. JAWRA Journal of the American Water Resources Association, 37(5), 1131– 1138. 49. Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P., (1983). Optimization by simulated annealing. science, 220(4598), 671–680. 50. Koulamas, C., (1998). A new constructive heuristic for the flow shop scheduling problem. European Journal of Operational Research, 105(1), 66–71. 51. Kröse, B., Krose, B., van der Smagt, P., & Smagt, P., (1993). An introduction to neural networks. Vol. 8, pp. 15-9. University of Amsterdam. 52. Kusiak, A., & Heragu, S. S., (1987). The facility layout problem. European Journal of operational research, 29(3), 229–251. 53. Lawler, E. L., Lenstra, J. K., & Rinnooy Kan, A. H., (1985). The traveling salesman problem. Vol. 1, pp. 465. John Wiley & Sons

76

Introduction To Algorithms

54. Lee, K. S., & Geem, Z. W. (2004). A new structural optimization method based on the harmony search algorithm. Computers & Structures, 82(9–10), 781–798. 55. Lee, K. S., & Geem, Z. W., (2005). A new meta-heuristic algorithm for continuous engineering optimization: harmony search theory and practice. Computer methods in applied mechanics and engineering, 194(36–38), 3902–3933. 56. Lee, R. C., (1967). CORELAP-computerized relationship layout planning. Jour. Ind. Engg., 8(3), 195–200. 57. Lin, S., (1965). Computer solutions of the traveling salesman problem. The Bell system technical journal, 44(10), 2245–2269. 58. Lin, S., & Kernighan, B. W. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations research, 21(2), 498516. 59. Lin, S., & Kernighan, B. W., (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations research, 21(2), 498– 516. 60. Mahdavi, M., Fesanghary, M., & Damangir, E., (2007). An improved harmony search algorithm for solving optimization problems. Applied mathematics and computation, 188(2), 1567–1579. 61. Meller, R. D., & Gau, K. Y., (1996). The facility layout problem: recent and emerging trends and perspectives. Journal of manufacturing systems, 15(5), 351–366. 62. Mitchell, J. S., (1999). Guillotine subdivisions approximate polygonal subdivisions: A simple polynomial-time approximation scheme for geometric TSP, k-MST, and related problems. SIAM Journal on Computing, 28(4), 1298–1309. 63. Mole, R. H., & Jameson, S. R., (1976). A sequential route-building algorithm employing a generalized savings criterion. Journal of the Operational Research Society, 27(2), 503–511. 64. Nawaz, M., Enscore Jr, E. E., & Ham, I., (1983). A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem. Omega, 11(1), 91–95. 65. Nugent, C. E., Vollmann, T. E., & Ruml, J. (1968). An experimental comparison of techniques for the assignment of facilities to locations. Operations research, 16(1), 150–173. 66. Omran, M. G., & Mahdavi, M. (2008). Global-best harmony search.

An Introduction to Heuristic Algorithms

67.

68. 69.

70.

71. 72.

73.

74. 75. 76.

77.

78. 79.

80.

77

Applied mathematics and computation, 198(2), 643–656. Osman, I. H., (1993). Metastrategy simulated annealing and tabu search algorithms for the vehicle routing problem. Annals of operations research, 41(4), 421–451. Osman, I. H., & Potts, C. N., (1989). Simulated annealing for permutation flow-shop scheduling. Omega, 17(6), 551–557. Page, E. S., (1961). An approach to the scheduling of jobs on machines. Journal of the Royal Statistical Society. Series B (Methodological), 484–492. Palmer, D. S., (1965). Sequencing jobs through a multi-stage process in the minimum total time—a quick method of obtaining a near optimum. Journal of the Operational Research Society, 16(1), 101–107. Papadimitriou, C. H., (1977). The Euclidean traveling salesman problem is NP-complete. Theoretical computer science, 4(3), 237–244. Papadimitriou, C. H., & Yannakakis, M., (1993). The traveling salesman problem with distances one and two. Mathematics of Operations Research, 18(1), 1–11. Pham, D., & Karaboga, D., (2012). Intelligent optimization techniques: genetic algorithms, tabu search, simulated annealing and neural networks. Springer Science & Business Media. Reeves, C. R., (1995). A genetic algorithm for flow shop sequencing. Computers & operations research, 22(1), 5–13. Reinelt, G., (1991). TSPLIB—A traveling salesman problem library. ORSA Journal on Computing, 3(4), 376–384. Rosenkrantz, D. J., Stearns, R. E., & Lewis, P. M., (2009). An analysis of several heuristics for the traveling salesman problem. In Fundamental Problems in Computing (pp. 45–69). Springer, Dordrecht. Ruiz, R., & Maroto, C., (2005). A comprehensive review and evaluation of permutation flow shop heuristics. European Journal of Operational Research, 165(2), 479–494. Seehof, J. M., & Evans, W. O., (1967). Automated layout design program. Journal of Industrial Engineering, 18(12), 690–695. Taillard, E. (1990). Some efficient heuristic methods for the flow shop sequencing problem. European journal of Operational research, 47(1), 65–74. Taillard, E., (1993). Benchmarks for basic scheduling problems.

78

81. 82. 83. 84.

85.

86.

87.

88.

Introduction To Algorithms

European journal of operational research, 64(2), 278–285. Taillard, É., (1993). Parallel iterative search methods for vehicle routing problems. Networks, 23(8), 661–673. Tate∗, D. M., & Smith, A. E., (1995). Unequal-area facility layout by genetic search. IIE Transactions, 27(4), 465–472. Vapnik, V., (2013). The nature of statistical learning theory. Springer science & business media. Widmer, M., & Hertz, A., (1989). A new heuristic method for the flow shop sequencing problem. European Journal of Operational Research, 41(2), 186–193. Wren, A., & Holliday, A., (1972). Computer scheduling of vehicles from one or more depots to a number of delivery points. Journal of the Operational Research Society, 23(3), 333–344. Wu, X., Sharif, B. S., & Hinton, O. R., (2005). An improved resource allocation scheme for plane cover multiple access using genetic algorithm. IEEE Transactions on Evolutionary Computation, 9(1), 74–81. Xiang, Z., Zhang, Q., Zhu, W., & Zhang, Z., (2003, July). Replication strategies for peer-to-peer based multimedia distribution service. In Multimedia and Expo, 2003. ICME’03. Proceedings. 2003 International Conference on (Vol. 2, pp. II–153). IEEE. Xiang, Z., Zhang, Q., Zhu, W., Zhang, Z., & Zhang, Y. Q., (2004). Peer-to-peer based multimedia distribution service. IEEE Transactions on Multimedia, 6(2), 343–355.

CHAPTER

4

TYPES OF MACHINE LEARNING ALGORITHMS

CONTENTS 4.1. Introduction....................................................................................... 80 4.2. Supervised Learning Approach........................................................... 81 4.3. Unsupervised Learning...................................................................... 84 4.4. Algorithm Types................................................................................. 86 References.............................................................................................. 113

Introduction To Algorithms

80

4.1. INTRODUCTION Computational learning theory is the branch of statistics in which the computational analysis and the performance of the machine learning algorithms are studied. In machine learning, such kinds of algorithms are developed that helps the computer to learn. Learning does not essentially involve awareness, finding the statistical symmetries or some patterns in the given data is also learning. Machine learning algorithms hardly resemble a learning task approached by humans. However, in different environments, the machine learning algorithms can provide insight into the comparative difficulty of learning. Based on the preferred result of the algorithm the learning algorithms are ordered into different categories. The common learning algorithm types are: i.

Supervised learning: Here a function is generated by the algorithm which maps the inputs to anticipated outputs. The classification problem is one of the standard formulations of this learning. In classification problem the student is required to absorb, to estimate the performance of a function. Out of several classes, the function plots a vector into one by considering several inputoutput specimens of the function. ii. Unsupervised learning: A set of inputs is modeled by unsupervised learning. The labeled examples are not available. iii. Semi-supervised learning: This learning combines the labeled and the unlabeled examples to produce a suitable function or classifier. iv. Reinforcement learning: Given the surveillance of world here the algorithm acquires a strategy of how to perform. Every action has a certain effect on the environment, of which the feedback is provided by the environment. The feedback then escorts the learning algorithm. v. Transduction: This is just like supervised learning, but here a function is not created explicitly. The Transduction learning tries to forecast new outcomes based on the trained inputs- outputs and the new inputs. vi. Learning to learn: It is based on the earlier experience in this type the algorithm acquires its own inductive preference.

Types of Machine Learning Algorithms

81

4.2. SUPERVISED LEARNING APPROACH In the classification problems supervised learning is quite common because of the goal, which is getting a computer to study a classification system created. Digit recognition is an example of this learning. Classification learning is suitable for the problems where the classification is determined easily and assuming a classification is helpful. In some of the cases, if the mediator itself can solve out the classification then it is not necessary to provide pre-determined classifications to each occasion of a problem. In a classification framework, this is the example of an unsupervised learning. In supervised learning, there is a probability that the inputs are often left undefined. If the inputs are offered then this model is not needed. It is not feasible to predict anything about the outcomes if some values of the input are omitted. In Unsupervised learning it is supposed that all of the observations are initiated by a latent variable, it is expected that the observations will be at the end of the causal chain. The figure below shows the examples of both supervised and unsupervised learning (Figure 4.1).

Figure 4.1: Illustration of supervised learning and unsupervised learning systems. (Source: https://www.intechopen.com/books/new-advances-in-machinelearning/types-of-machine-learning-algorithms).

For the training of the neural networks and the decision trees, the best common technique is supervised learning. Both, the neural networks and the decision trees rely extremely on the information provided by pre-determined classifications. In neural networks, errors are determined by the classification which then amends the network so that the errors are minimized. Whereas in decision trees, the attributes which give the maximum information that helps in solving the classification mystery are determined by the classification. Both of these techniques will be discussed in more detail later. It is adequate

82

Introduction To Algorithms

for knowledge that these examples prosper on having “supervision” in the shape of pre-determined classifications. Inductive machine learning is that process in which the set of rules are learned from examples (cases in the training set), or more commonly speaking, generating a classifier which helps in generalizing from the new examples. The following sections describe the process in which the supervised ML is applied to the real-world problem. The first step involves collecting the dataset. If an essential expert is accessible, then she/he could advise which attributes and features are the most useful. If the expert is not available, then the method which can be used is “brute-force algorithm,” where everything available is measured in faith that the appropriate features can be made inaccessible. The dataset collected by “brute-force algorithm” is not appropriate for induction. In most cases, it contains the required according to Zhang (2002). The preparation and pre-processing of the data is the second step. Liable on the conditions, numbers of methods are available for the researchers to handle the missing data (Batista, 2003). Hodge (2004) introduced a review of modern-day techniques for noise detection. The advantages and disadvantages of these techniques are also enlightened by these researchers. Instance selection is used to handle the noise as well as to cope with the difficulty of learning from large datasets. The optimization problem in these datasets is the instance selection which while reducing the sample size tries to preserve the mining quality. It decreases the data and implements the data mining algorithm for functioning effectively with the large datasets. A variety of processes are available for sampling the instances from the large dataset (Allix, 2000; Mooney, 2000). The process of detecting and eliminating as many inappropriate and dismissed features as possible is known as the Feature subset selection (López, 2001; López et al., 2002; Yu, 2004). The dimensionality of data is decreased by this process and data mining algorithms are able to function faster and more efficiently. The point that numerous features are dependent on one another excessively affects the precision of supervised Machine language classification models. By creating innovative features from the simple feature set the solution to this problem can be found. This method is called the feature construction/transformation. The newly produced features might help to create more brief and precise classifiers. The discovery of the important features adds to the better clarity of the created classifier, and an improved understanding of the concept. Markov models and the Bayesian

Types of Machine Learning Algorithms

83

networks implemented in speech recognition depends on few elements of supervision, in order to amend the parameters so that on given inputs the errors can be minimized (Mostow, 1983; Fu & Lee, 2005; Ghahramani, 2008). The important thing to be observed here: the goal of the learning process in classification problem is to reduce the error in accordance with the particular inputs. These inputs, known as “training set,” are the specimens who help the agent to learn. But learning these inputs is not necessarily needed. For example, if I attempt to demonstrate the exclusive-or and display the combinations comprising of one true and the other false (Getoor & Taskar, 2007; Rebentrost et al., 2014). The combination of both false or both of them true is never displayed; here one might memorize the rule according to which the answer is true at all times. Likewise, with the machine learning algorithms, the problem of over-fitting the data and fundamentally remembering the training sets instead of adopting more broad classification technique is common (Rosenblatt, 1958; Durbin & Rumelhart, 1989). The inputs of each of the training sets are not classified correctly. This can cause problems if the implemented algorithm is dominant enough to remember “special cases” which are not suitable for the general principles (Figure 4.2). This can result in overfitting. It is quite challenging to discover algorithms that are powerful enough to absorb complex functions and vigorous enough to yield generalizable results (Hopfield, 1982; Timothy Jason Shepard, 1998).

Figure 4.2: Schematic illustration of Machine Learning Supervise Procedure. (Source: https://www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

84

Introduction To Algorithms

4.3. UNSUPERVISED LEARNING This method appears much harder: the goal of this method is to make the computer learn to perform some task and the procedure of that task is not guided by us. There are two major methods for unsupervised learning. The first method involves teaching the agent not by providing the clear categorizations, but by implementing some kind of reward system which indicates the success. This type of teaching will be normally suitable for the decision problem frame because the goal is to make decisions which will increase the rewards, not to yield a classification. This method agreeably applies to the real world where the agents may be compensated for performing certain actions and rebuked for doing other actions (LeCun et al., 1989; Bilmes, 1998; Alpaydm, 1999). A form of supporting learning can be implemented for unsupervised learning, where the actions of the agent are based on the former rewards and penalties without necessarily learning the information about the precise ways in which it affects the real world. This information is pointless in a way that by getting used to the reward function, the agent is well aware of the action to perform without any processing. For each action, the exact reward to be achieved is well known by the agent. In cases where the calculation of each possibility is extremely timeconsuming, this can prove to be extremely beneficial. On the other side, it can be extremely time taking to learn by trial and error. But this type of learning can prove to be powerful because in this type of learning classification of examples which are already discovered are not assumed. Our classifications are not always the best possible in some cases (Rumelhart et al., 1985; 1986; Schwenker & Trentin, 2014). For example, when the computer programs which learned through the unsupervised learning became robust than the finest human chess players, the predictable wisdom in the game (backgammon) was spun on its head. These programs learned certain principles which shocked the backgammon specialists and performed far better than the backgammon programs taught on the pre-classified examples. Clustering is another type of unsupervised learning. In this kind of learning the objective is to find the resemblances in training data, not to maximize the utility function (Xu & Jordan, 1996; Mitra et al., 2008). Now here the supposition is that the clusters found will match sensibly well with the intuitive classification. For example, the clustering individuals which are based on demographics may outcome in the clustering of the wealthy in one set and of the poor in the other set. Though the algorithm won’t assign names to these clusters it can create them and then by using these

Types of Machine Learning Algorithms

85

clusters can allocate new examples into one of the clusters (Herlihy, 1998; Gregan‐Paxton et al., 2005). Described above is the data-driven approach which performs well when sufficient data is available for example social information sifting algorithms, used by Amazon.com to recommend books (Stewart & Brown, 2004; Sakamoto et al., 2008). These algorithms are based on the principle of discovering analogous sets of people and allocating new consumers to groups. In the case of social information filtering information about the other fellows of a cluster is adequate for the algorithm to create meaningful results. In the rest of the cases, the clusters are simply a beneficial tool for an expert analyst. Unfortunately, the unsupervised learning also faces this problem of overfitting. There’s no shortcut to avoid this problem of overfitting because the algorithm which can adapt the changes from its inputs must be powerful enough (Nilsson, 1982; Vancouver, 1996; Sanchez, 1997). The unsupervised learning algorithms are made to dig out the structure from the data samples. The cost function is used to measure the worth of a structure which is typically minimized to deduce the most favorable parameters illustrating the concealed structure in data. Consistent and vigorous inference need an assurance that the structures extracted, are usual for the source for example from the second model set of the similar data source the similar structures are extracted (Bateson, 1960; Campbell, 1976; Kandjani et al., 2013). From the literature of statistics and machine learning the deficiency of robustness is called overfitting. Now here overfitting phenomenon is characterized by the group of histogram clustering dummies which play an important role in the retrieval of information, linguistic, and applications based on computer vision. Learning algorithms with the capability to model fluctuations are resultant of the large deviation outcomes and are the utmost entropy principle for the process of learning (Pollack, 1989; Pickering, 2002; Turnbull, 2002). Many successes have been produced by unsupervised learning, such as: i. World champion caliber backgammon program. ii. Machines skilled of driving cars. Unsupervised learning can be an influential method when there exists an easy way to allocate values to the actions. Clustering is beneficial when there is sufficient data to produce the clusters and particularly when extra data about the adherents of a cluster is used to create the further outcomes due to the dependencies in data (Acuna & Rodriguez, 2004; Farhangfar et al., 2008). Classification learning is useful when it is known that the

Introduction To Algorithms

86

classifications are correct or are merely random things that the computer is able to recognize. Classification learning is frequently essential in situations where decision compiled by the algorithm is needed as input in some other situation. Both Clustering and Classification learning are valuable and choosing the right technique depends upon the following circumstances (Dutton & Starbuck, 1971; Lakshminarayan et al., 1999). i. ii. iii.

Kind of problem being solved. Time allotted for solving it. Either supervised learning is possible.

4.4. ALGORITHM TYPES In the extent of supervised learning where which mostly classification is dealt with, the following are the types of algorithms: i. Linear Classifiers; ii. Naïve Bayes Classifier; iii. Logical Regression; iv. Support Vector Machine; v. Quadratic Classifiers; vi. Perceptron; vii. Boosting; viii. Decision Tree; ix. K-Means Clustering; x. Neural networks; xi. Random Forest; and xii. Bayesian Networks.

4.4.1. Linear Classifiers In the process of machine learning, classification is used to group objects with comparable feature values, into the groups. Timothy et al. (1998) declared that the linear classifiers accomplish this with the help of a classification decision (Grzymala-Busse & Hu, 2000; Grzymala-Busse et al., 2005). The classification decision is dependent on the input of the linear combination. If the real vector is the input to the classifier, then the output is given as:

Types of Machine Learning Algorithms

87

    y= f ( w ⋅ x )= f  ∑ w j x j  ,  j   Where, w → real weights vector, f→ function in which the dot product of two vectors is translated into the preferred output. A set of marked training  samples helps to deduce the vector w . Usually, the values above a specific threshold are mapped to a first class and the rest of the values to second class by the function f. The probability that a certain item fits in a certain class is given by the complex function f (Li et al., 2004; Honghai et al., 2005; Luengo et al., 2012). In the two-class classification, the functionality of the linear classifier can be visualized as dividing with hyperplane the high-dimensional input space. The points on the one side of hyperplane are marked as “yes,” while the points on the other side are marked as “no.” Linear classifiers are frequently used in conditions where the pace of classification is a problem because linear classifiers are the fastest classifiers especially when the real vector is sparse. The decision trees are also capable to be faster (Hornik et al., 1989; Schaffalitzky & Zisserman, 2004). Linear classifiers usually  perform very well when in the real vector x the number of dimensions is large. In document classification, every element in the real vector is usually the counts of a certain word in the document. The classifier must be wellregularized in these cases (Dempster et al., 1997; Haykin & Network, 2004). As indicated by (Luis et al) support vector machine executes classification by creating the N-dimensional hyperplane which optimally splits the data into categories. Support vector machine models are closely connected to neural networks. The Support vector machine model, in which the sigmoid kernel function is used resembles the two-layer (PNN) perceptron neural network (Rosenblatt, 1961; Dutra da Silva et al., 2011). SVM models are closely related to the traditional multilayer (PNN) perceptron neural networks. Using a kernel function, Support vector machine models are the substitute training scheme for the polynomials, radial basis functions and multi-layer perceptron neural networks classifiers. In SVM the weight of the network is found by resolving the quadratic programming problem having linear constraints, instead of resolving a non-convex and unconstrained minimization problem such as used in the standard neural network teaching (Gross et al., 1969; Zoltan-Csaba et al., 2011; Olga et al., 2015).

88

Introduction To Algorithms

In the language of support vector machine literature, the attribute is a predictor variable, and an altered attribute which defines the (HP) hyperplane is known as a feature. The process of selecting the most appropriate representation is called the feature selection. Set of features which describes one of the cases is known as a vector. The goal of SVM (support vector machine) modeling is to discover the optimum hyperplane which splits the groups of vector in a way that cases with one category of the target variable are on one side of the plane and cases with the other category are on the other side of the plane (Block et al., 1962; Novikoff, 1963; Freund & Schapire, 1999). The vectors closest to the hyperplane are known as the support vectors. An overview of the support vector machine process is presented in the following section (Figure 4.3).

4.4.2. A Two-Dimensional Case Let us discuss an easy 2-dimensional example before viewing N-dimensional hyperplanes. Suppose that we want to execute a classification, and the data available to us has a definite target variable with two categories. Also, assume that two predictor variables having continuous values are also available. If the data points are plotted using the values of the predictor on the X axis and the Y axis, then the image shown below might be the outcome. One category is represented by the rectangles and the ovals represent the other category (Lula, 2000; Otair & Salameh, 2004; Minsky & Papert, 2017).

Figure 4.3: The demonstration of SVM analysis for determining 1D hyperplane (i.e., line) which differentiates the cases because of their target categories. (Source: https://www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

In this example, in the lower left corner are the cases of one category and in the upper right corner are the cases of another category. Both of the cases are totally separated from each other. The Support Vector Machine analysis tries to find the one-dimensional hyperplane which separates both cases on

Types of Machine Learning Algorithms

89

the basis of their target categories. An endless number of potential lines are possible; shown above are the two candidate lines. Now here the question arises that which line is much better, and how the optimum line is defined (Caudill & Butler, 1993; Hastie et al., 2009; Noguchi & Nagasawa, 2014). The dotted lines which are parallel to separating line spot the distance amongst the dividing line and nearest vectors to line. The margin is the distance between the dotted lines. Support vectors are the points which limit the size of the margin. It is illustrated in the following Figure 4.4. The Support Vector Machine analysis discovers the line or, in general, the hyperplane which is oriented to maximize the margin among the support vectors (Luis Gonz, 2005; Hall et al., 2009).

Figure 4.4: Illustration of an SVM Analyses containing dual-category target variables possessing two predictor variables having the likelihood of the division for point clusters. (Source: https://www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

The figure above shows that the line in right section is better than the line in the other section If all the analyses comprised of two category target variables along with the two predictor variables and the straight line could separate the group of points, then life would be stress-free (Neymark et al., 1970; Hammes & Wieland, 2012). Regrettably, this is not usually the case, so the Support Vector Machine must deal with: a. b. c.

Handling of the cases where more than the two predictor variables distinct the points with the non-linear curves. Handling of the cases where groups cannot be separated completely. Handling of the classifications having more than the two

Introduction To Algorithms

90

categories. Three major machine learning techniques are explained in this chapter with examples and how these techniques perform in reality. These are: i. ii. iii.

K-Means Clustering; Neural Network; and Self-Organized Map;

4.4.3. K-Means Clustering This technique comprises of the uncomplicated steps. In the beginning K (number of the cluster) is determined and the center of the clusters is assumed. Any random objects are taken as the primary center or the first K objects in the structure can also function as the primary center (Teather, 2006; Yusupov, 2007). Then three steps mentioned below are performed by the K means algorithm until convergence. Iterate until stable. i. Determining the center coordinate. ii. Determining the distance of every object from the center. iii. Grouping the objects based on the minimum distance. K- Means flowchart is shown in the figure below (Figure 4.5)

Figure 4.5: Schematic illustration of K-means iteration. (Source: https://www. intechopen.com/books/new-advances-in-machine-learning/types-of-machinelearning-algorithms).

K-means clustering is an easiest unsupervised learning algorithm which provides the solution of the well-known clustering problem. The process

Types of Machine Learning Algorithms

91

adopts a simple way to categorize the given data set amongst the definite number of clusters. For each cluster, K centroid is to be defined. These centroids must be positioned in a scheming way because different results can be obtained from a different location (Franc & Hlaváč, 2005; González et al., 2006). The more appropriate choice is to position the centroids as far as possible from each other. The next step involves taking each point from the given data and associating the point with the adjacent centroid. When there is no point left, it indicates the completion of the first step and an early grouping is done (Oltean & Dumitrescu, 2004; Fukunaga, 2008; Chali, 2009). Now here re-calculation of ‘k’ new centroids is to be done. After obtaining the new centroids, a fresh binding amongst the same points and the adjacent new centroid is to be done. A loop is produced. Due to this loop, the k centroids adjust their position gradually until there are no more changes. Thus the centroids don’t move anymore (Oltean & Groşan, 2003; Oltean, 2007). This algorithm goal at diminishing an objective function, a squared error function in this particular case. The objective function:

= J

k

x

∑∑ || x j −1 i −1

i

2

j

− c j ||

,

|| xij − c j ||2 is a selected distance measure between (data point) and cj (cluster center). It is a sign of a distance of n data points with respect to their particular cluster centers. The algorithm for this technique consists of the steps defined below: i. K points are placed into the space denoted by objects which are being clustered. These points denote primary group centroids. ii. Each object is assigned to the group which has the nearest centroid. iii. The positions of K centroids are recalculated after all of the objects have been allocated. iv. Steps ii and iii are repeated as long as the centroids are moving. This creates a partition of objects into the groups from where the minimized metric can be calculated. Though it can be ascertained that the process will always terminate, this k-means clustering algorithm does not essentially find the most optimum configuration equivalent to the minimum global objective function (Burke et al., 2006; Bader-El-Den & Poli, 2007). The K-means algorithm is also considerably delicate to the primary arbitrarily selected cluster centers. This

Introduction To Algorithms

92

algorithm is run numerous times to diminish this effect. K-means algorithm has been modified to many problem fields. This algorithm is a worthy contender for modification to work with the fuzzy feature vectors (Tavares et al., 2004; Keller & Poli, 2007). Assume that n-sample feature vectors (x1, x2, x3 ..., xn) are available, all of the similar class and they plunge into the k dense clusters where k < n. Assume mi to be the mean in the cluster i. We can implement the minimum distance classifier to split the clusters if they are well separated. It can be said that the x is present in the cluster I only if || x – mi || represents the minimum of all k distances. This recommends the following process in order to find the k means: Supposing the values for means (m1, m2, m3 ..., mk) unless no modifications in any of the mean occur. ii. Classifying the samples into the clusters using the estimated means. iii. For the cluster i from the range 1 to k. iv. Swap mi by, mean of all the samples for the cluster i. v. end_for. vi. end_until. i.

Figure 4.6: Demonstration of the motion for m1 and m2 means at the midpoint of two clusters. (Source: https://www.intechopen.com/books/new-advances-inmachine-learning/types-of-machine-learning-algorithms).

This is an easy form of the k-means process. It can be observed as the greedy algorithm for separating the n models into k clusters to reduce the totality of squared distances to cluster centers. Some weaknesses of this algorithm do exist:

Types of Machine Learning Algorithms

i. ii.

iii.

iv.

93

The method to set the means wasn’t stated. One famous way to start the mean is to arbitrarily choose k of the models. The outcomes yielded are dependent on the primary values for the mean, and it commonly happens that the suboptimum partitions are originated. The standard solution for this is to go for a number of dissimilar starting points. It is possible that a set of models nearby to mi is vacant so that the mi can’t be upgraded. This is an irritation which must be looked after in an implementation. The outcomes are dependent on the metric which is used to quantify || x – mi ||. The standard way out is to standardize every variable by its typical deviation, although this is not desirable every time.

4.4.4. Neural Network Neural networks can, in fact, execute a number of reversion and classification jobs at once, though normally each network executes only one (Forsyth, 1990; Bishop C. M., 1995; Poli et al., 2007). In the majority of cases, there is only one particular output variable for the network. For many state classification problems, this may relate to the number of outcome units. If you outline a network with numerous output variables, then it might suffer from cross-talk. The concealed neurons have difficulty in learning as they are trying to model as a minimum two functions. The best possible solution is that for each output, train the separate network and then combine them into one so that they can perform as a unit. Neural network methodologies are explained below:

4.4.4.1. Multilayer Perceptron Today this is the most accepted network architecture, discovered by (Rumelhart & McClelland, 1986) and explained in detail in most of the neural network textbooks. This kind of network is discussed briefly in the earlier sections: the units each of them execute the weighted sum of the inputs and with the help of a transfer function go through this activation level to produce the output. The units are set in a covered feed-forward topology. The multilayer perceptron has a simple explanation in the shape of the input-output model along with the weights. Thus this network biases the free variables of the model. Such networks can be used to form the functions of random complexity with the help of the number of layers and number of

Introduction To Algorithms

94

the units in every layer, thus determining the complexity of the function. One of the important issues in MLP (Multilayer Perceptrons) design includes the specification of hidden layers and the number of units present in these hidden layers (Bishop, 1995; Michie, 1968; Anevski et al., 2013). The number of the input-output units is interpreted by the problem. Ambiguity about exactly which inputs to utilize may exist, a point which will be discussed later. For the moment let us suppose that the inputs are selected instinctively and are significant. The number of concealed units to utilize is very clear. Making use of the one hidden layer can be a good starting point, with a number of units equivalent to half of the sum of the number of the input-output units. Choosing a reasonable number is described later (Nilsson, 1982; Orlitsky et al., 2006).

4.4.4.2. Training Multilayer Perceptron As the number of layers and the number of units within every layer are decided, the weights and the thresholds of the network must be set in order to diminish the prediction error produced by the network. The training algorithms perform this task. The historical cases are implemented to regulate the weights and the thresholds automatically so that this error can be minimized (Mohammad & Mahmoud, 2014). This procedure is the same as the fitting of the model characterized by the network to the available training data. The error of the specific configuration can be concluded by running all of the training cases throughout the network, evaluating the real output yielded with the desired outputs (Ayodele, 2010; Singh & Lal, 2013). The differences are pooled together with the help of an error function so that the network error can be produced. The most familiar error functions are: i.

Sum squared error which is used for the regression problems. In sum squared error the discrete errors of the output units are squared and then summed together. ii. Cross-entropy functions which are used for the maximum likelihood classification. In traditional modeling methodologies like for example linear modeling, it is probably to determine the configuration of the model algorithmically in order to minimize this error absolutely. The compromise made for the non-linear modeling power of the neural networks is that we under no circumstances be certain that the error couldn’t be minimized anymore (Jayaraman et al., 2010; Punitha et al., 2014). Error surface is the useful concept to be considered here. Every single of

Types of Machine Learning Algorithms

95

N weights and starting points of the network is engaged to be the dimension in space. The N+1th dimension is usually the network error. For any potential configuration of the weights, the error can be intrigued in the N+1th dimension thus creating an error surface. Finding the lowermost point in the many-dimensional surface is the main purpose of the network training. In the linear model having the sum squared error function, the error surface is a parabola. This means that the error surface is a bowl-shape with only a single minimum. It is thus easy to pinpoint the minimum (Kim & Park, 2009; Chen et al., 2011; Anuradha & Velmurugan, 2014). The error surfaces in the case of neural networks are more complex and are usually characterized by the number of unsupportive features such as: i. Local Minima; ii. Flat-spot and plateau; iii. Saddle-points; and iv. Long narrow ravine. Thus it is not likely to determine analytically where the overall minimum of error surface lies, so the training of neural network is an investigation of error surface. From the primarily arbitrary configuration of the weights and thresholds, the training algorithms try to find the global minimum step by step. Usually, the gradient of error surface is considered at the recent point and is used to make the downhill move. Ultimately the algorithm halts at the low point which might be the local minimum or the global minimum (Fisher, 1987; Lebowitz, 1987; Gennari et al., 1988).

4.4.4.3. Back Propagation Algorithm The best example of the neural network training is back propagation algorithm (Haykin, 1994; Patterson et al., 2007; Fausett, 1994). The recent second-order algorithms, for example, conjugate gradient descent and the Liebenberg-Marquardt (Bishop, 1995; Bishop et al., 1996; Shepherd, 1997) are considerably faster to be utilized in many problems. In some situations, the Backpropagation algorithm still has many advantages and can prove to be the easiest algorithm to comprehend. We will only introduce this algorithm; more advanced algorithms will be discussed later. In the backpropagation algorithm, the slope vector is calculated. The slope vector from the current point directs to the line with the sharpest descent. Now if we move a small distance along the slope vector the error will be decreased. A series of such moves will ultimately find the minimum of some kind. The tough portion is

96

Introduction To Algorithms

to choose how large the step should be (Michalski & Stepp, 1982; Stepp & Michalski, 1986). Large steps may meet quickly but these steps may surpass the solution or go down the wrong path. In a neural network, the example of this is the case where the algorithm proceeds very gradually along a steep and narrow valley, rebounding from one side through to the other. As compared to large steps, small steps may proceed in the right direction but these steps need a number of iterations. The step size is relative to the slope and the learning rate. The precise setting for the special constant learning rate is always application dependent and is normally chosen by the experiment. The learning rate might also be time-varying and gets smaller with the growth of algorithm (Mamassian et al., 2002). The algorithm is commonly improved by the insertion of the momentum term. This boosts the movement in a particular direction, thus if a number of steps are moved in that particular direction the algorithm gains speed. This speed gives algorithm the ability to escape local minimum sometimes and to move quickly over the flat spot and plateau (Weiss & Fleet, 2002; Purves & Lotto, 2003). The algorithm thus proceeds iteratively, across a number of times. On every single epoch, the cases succumb in an opportunity to the network and the target. Actual outputs are then compared and the error is calculated. This error in conjunction with error surface gradient is utilized to modify the weights, after which the process repeats. The primary network configuration is arbitrary and the training discontinues when a particular number of epochs pass by or the error makes it to a tolerable level. When the error halts to improve the training is also stopped (Yang & Purves, 2004; Rao, 2005; Doya, 2007).

4.4.4.4. Over-Learning and Generalization One of the major problems with the method defined above is that the backpropagation algorithm does not really diminish the error in which we are actually interested, which is the predictable error that the network will produce when the new cases succumb to it. The most necessary property of the network is the ability to simplify new cases. Actually, the network is skilled to diminish the error and is deficient in having a flawless and enormously large training set. This isn’t similar to the diminishing of error on real error surface which is the error of the fundamental and unfamiliar model (Dockens, 1979; Bishop, 1995; Abraham, 2004).

Types of Machine Learning Algorithms

97

The most significant demonstration of this difference is the over-learning problem. It is quite simple to validate this concept by means of a polynomial curve fitting instead of neural networks. An equation comprising only of constants and power of the variables is known as a polynomial. For example: y = 3 x 2 + 4x + 1 Different polynomials always have different shapes. The polynomials with the larger powers have gradually more eccentric shapes. For a given set of data, in order to describe the data, we may need to fit the polynomial curve. The data is possibly noisy, so we do not essentially anticipate the best possible model to pass precisely through all of the points (Long, 1980; Konopka, 2006; Shmailov, 2016). The low order polynomial might not be adequately supple to fit nearby the points. Comparatively the high order polynomial is essentially too supple, fitting the data precisely by implementing the highly eccentric shape which is actually not linked to the fundamental function (Seising & Tabacchi, 2013; Shmailov, 2016) (Figure 4.7).

Figure 4.7: Graph showing a polynomial sample with a high order. (Source: https://www.intechopen.com/books/new-advances-in-machine-learning/typesof-machine-learning-algorithms).

The neural networks have specifically the same problem. The one having additional weights simulates the more complex function and is thus susceptible to over-fitting. Whereas, the one having less weight might not be powerful enough to model the fundamental function. For example, the network having no hidden layers essentially models a more simple linear function. How can we then choose the precise complexity of network? The

98

Introduction To Algorithms

larger network will nearly always attain the lower error ultimately, but this may specify over-fitting instead of good modeling (Somorjai et al., 2003; Zanibbi & Blostein, 2012). The answer to the above question is to check the development contrary to an autonomous data set. Several cases are reserved and not really utilized for the training in the backpropagation algorithm. Instead, they are utilized to keep an autonomous check on the improvement of the algorithm (Lukasiak et al., 2007; Puterman, 2014). It is always the situation that the primary execution of the network on the training and independent selection sets is similar. As the training proceeds the error drops naturally and provided that the training is diminishing the error function, thus the selection error also drops (Ahangi et al., 2013; Aldape-Pérez et al., 2015). On the other hand, if there is a possibility that the selection error halts dropping or certainly starts to increase then this specifies that the particular network is beginning to overfit the given data and thus the training should stop. When the overfitting takes place in the course of training process as explained above, it is known as over-learning. In this situation, it is typically worthwhile to reduce the number of hidden layers because the network is powerful enough for the problem. Instead, if the network isn’t adequately powerful enough to model the fundamental function then over-learning isn’t expected to happen and neither the training nor the selection errors will fall to an acceptable level (Goodacre et al., 2007; Brereton, 2015). The problems related to the local minima and the conclusions on the size of the network to utilize suggest that utilizing a neural network typically involves experimenting with a large number of dissimilar networks. Training each of the networks multiple times and observing the performance of each network individually (Sutton, 1988; Sutton & Barto, 1998). Selection error is the main guide for better performance here. However, obeying the standard scientific principle which states that all another being equivalent, an easy and simple model is preferred at all times as compared to the complex model. A smaller network can be selected in preference to the larger network with the insignificant improvement in the selection error (Ambikairajah et al., 1993; Russell & Norvig, 2016). The difficulty with this method of repetitive research is that the main part in choosing the model is played by the selection set, which concludes that it is essentially a step of the training procedure. Its dependability as an autonomous guide of model’s performance is thus compromised. With the help of adequate experiments, one might discover a network that performs

Types of Machine Learning Algorithms

99

satisfactorily. To be sure of the final model’s performance, it is the usual exercise to retain the third set of the cases known as the test set. The ultimate model is then tested with test set data to confirm that the outcomes of the training and selection set are real and thus are not the items of the training procedure. In order to justify this role appropriately the test set must be used once, if it is used to modify and recapitulate the training procedure then it successfully becomes the selection data (Hattori, 1992; Forseth et al., 1995). This partition into many subsets is very unlucky because we commonly have a lesser amount of data than we would have desired ideally even for the single subset. We can obtain the solution to this problem by resampling. Using different partitions of the data available for the training, selection, and test set experiments can be organized (Gong & Haton, 1992; Wagh, 1994). Numbers of methods to this subset are: i. Random resampling; ii. Cross-validation; and iii. Bootstrap. If we make the decisions related to design grounded in experiments with the different subset examples, such as to use the best possible configuration of the neural network then the outcomes will be more trustworthy. We can use those experiments exclusively to: i.

Lead the decision as to use which of the network types and teach such networks from the very scratch with the help of new samples. ii. Preserve the best possible networks originated during the process of sampling. To summarize, a network design obeys the following steps: i. ii.

Selection of an initial configuration. Iteratively conducting a number of tests with each configuration and then retaining the best network found. If the training finds a local minimum then a number of tests are necessary with each configuration in order to evade being fooled. It is useful to resample. iii. On each test, if under-learning befalls then try adding extra neurons to the hidden layer. If this isn’t useful, try adding one more hidden layer. iv. If over-learning befalls then try removing the hidden units. v. Once an efficient configuration for the networks is determined,

100

Introduction To Algorithms

resample, and produce new networks with the help of that particular configuration.

4.4.4.5. Selection of Data All the above steps are based on a key supposition. Precisely, the training and the test data should be illustrative of the fundamental model. An old computer science proverb “garbage in, garbage out” couldn’t relate more appropriately than in the neural modeling. If the training data isn’t illustrative, then the worth of the model is compromised. The worst thing could be that it may be of no use. It is quite useful indicating the type of problems that can alter the training set. The training data is usually historical. If the conditions have altered, relationships which were held in the past could no longer exist. All the possibilities must be dealt with. The cases which are present can only help the neural network to learn. If the people with earnings over $80,000 annually are at bad credit threat, and the training data contains nobody over $40,000 annually then one cannot expect that neural network will make the right decision when it comes across one of the earlier undetected cases. Extrapolation is hazardous with any type, but some categories of the neural network might particularly make poor calculations in such conditions. A network absorbs the easiest characteristics efficiently. A classic demonstration of this fact is a project made to recognize tanks automatically. A network is educated on a hundred pictures with tanks on them and a hundred pictures which don’t. It accomplishes the flawless 100% score. When tested on the new data, the results are hopeless. The reason behind this is that the pictures with tanks on them are taken on dark and rainy days whereas the network absorbs to differentiate the differences between the pictures in complete light. To make this network work it would require training cases which includes all the weather and lighting situation under which the network is expected to function. Since a network reduces the error, the percentage of the forms of data is critical in the set. A network skilled on a set of data which includes 900 good 100 bad cases will bias its result towards good cases because this will allow the algorithm to reduce the overall error. If the depiction of the above cases in real population is different than the decision made by the network may be wrong. Diagnosing the disease is a good example for this. Imagine 90% of the patients usually tested are free of any disease. On the available data set, a network is skilled with a spilled of 90/10. This network is then

Types of Machine Learning Algorithms

101

utilized on patients complaining of particular problems where the possibility of the presence of any disease is 50/50. The network will respond carefully and be unsuccessful to identify the disease in some cases. In comparison, if the network is skilled on “complainants” data and then experimented on “routine” data, a large number of made-up positives may be raised by the network. In such conditions, the data set might be created to take an account of the circulation of data or the decisions made by the network be altered by the addition of loss matrix (Bishop, 1995). Frequently the best tactic is to guarantee the equal illustration of different cases and then to elucidate the decisions accordingly.

4.4.5. Self-Organized Map Self-Organizing Feature Map (SOFM) networks are utilized in a different way as compared to other networks. Other networks are planned for the supervised learning tasks whereas the SOFM networks are designed mainly for the unsupervised learning tasks (Haykin, 1994; Patterson et al., 2007; Fausett, 1994). Training data set in supervised learning comprises of the cases containing input variables organized with the related outputs whereas the training set in unsupervised learning consists of input variables only. This may appear weird at the first glance. What can a network learn without the outputs? The answer to this question is the SOFM network tries to absorb the configuration of the data. Kohonen (1997) explained that one possible use of SOFM is in the exploratory data examination. The SOFM network can pick up to identify the data clusters and can relate classes which are alike. The consumer of the network can figure out the understanding of data, which is then utilized to improve the network. As the classes of the data are acknowledged, they can then be labeled in order to make the network capable of the classification tasks. The SOFM networks can be used for classification when the output classes are available instantaneously. The benefit in this situation is their capability to highlight the resemblances between the classes. A second potential use of this network is in the novelty detection. The SOFM networks can acquire to identify the clusters in training data and then respond. If the new data come upon, the network is unsuccessful to identify it and this specifies novelty. The SOFM network consists of the following two layers: i. ii.

The input layer. An output layer with radial units (topological map layer). In

Introduction To Algorithms

102

topological map layer, the units are positioned in space i.e., in the two dimensions. With the help, an iterative algorithm the SOFM networks are skilled (Patterson et al., 2007). Beginning with the random set of the radial centers, the algorithm adjusts them progressively to reveal the clustering of the data. This algorithm, at one stage, relates with sub-sampling and the K-means algorithms which are used to allocate centers in the self-organize map (SOM) network. The SOFM algorithm can be utilized to allocate centers for these kinds of networks. The iterative training technique also organizes the network in a way that the units demonstrating centers close to each other in the input layer are also located close to each other on the topological layer. The topological layer of the network can be thought of as a crude 2-dimensional lattice, which must be crumpled and slanted into the N-dimensional input layer to preserve the original structure in the best possible way. Any effort to characterize the N-dimensional space into the two dimensions will outcome in the loss of detail. This method can be useful in letting the user think about the data which otherwise might be difficult to understand. The basic iterative SOFM algorithm goes through a number of periods, on every single epoch implementing the training case and employing the following algorithm: i.

Selection of the winning neuron. The neuron whose center is closest to the input situation is known as the winning neuron. ii. Adjusting the winning neuron so that it can resemble the input case. The iterative algorithm utilizes the time decaying learning rate, implemented to accomplish the weighted sum. It also confirms that the modifications become more delicate as the period’s pass. This guarantees that the centers snuggle down to a cooperative demonstration of the cases that helps the particular neuron to win. By adding the thought of neighborhood to the iterative algorithm the topological assembling property is accomplished. The set of neurons adjacent to the winning neuron is called a neighborhood. The neighborhood similar to the learning rate deteriorates over time. Initially, a big number of neurons are located in the neighborhood whereas with the passage of time the neighborhood will be empty. In the Kohonen or SOFM algorithm, the modification of neurons is practiced on all the fellows of the current neighborhood. The consequence of this neighborhood apprises is that primarily big areas of a network are pulled towards the training cases

Types of Machine Learning Algorithms

103

and are dragged considerably. The network cultivates a crude topological arrangement with the similar cases triggering the clusters of neurons inside the topological map. Both the learning rate and the neighborhood decreases with the passage of time so that the better differences within the areas of the map can be made, eventually resulting in the fine-tuning of an individual neuron. Very often the training is intentionally organized in two separate phases: i.

A comparatively short phase with usually the high learning rates and neighborhood. ii. A long phase having the low learning rates and zero neighborhoods. Once the network is able to identify configuration in the data, the network can be utilized as a visualization device to inspect the data. With the help of this network Win, Frequencies Datasheet can be observed to see if the discrete clusters have been made on the map. The Individual cases are implemented and topological map examined, to observe if some significance can be allotted to clusters. Once the clusters are recognized, the neurons present in the topological chart are labeled to specify their meaning. When the topological chart has been plotted in this way, then new circumstances can be given to network. The network can execute arrangement on the condition that winning neuron must be tagged with the class name. If not, then the network is considered as undecided. SOFM networks, when executing classification also utilizes the accept threshold. The triggering extent of the neuron in the SOFM network is its distance from the input, the accept threshold plays the role of the maximum documented distance. If the initiation of the winning neuron is much larger as compared to this distance then the SOFM network is considered as undecided. By tagging all the neurons and allocating the accept threshold suitably, a SOFM network can perform as the novelty detector. SOFM network as articulated by Kohonen (1997) are motivated by some of the known characteristics of the brain. The cerebral cortex is really a large smooth sheet with identified topological properties.

4.4.5.1. Grouping Data Via Self -Organized Map The first part of the SOM is the data. Given below are few examples of three-dimensional data normally used when testing with the SOM. Now here the colors are exemplified in three dimensions (green, blue, and red.) The aim of the SOMs is to transform the N-dimensional data into something that would be understood visually in a better way (Figure 4.8).

104

Introduction To Algorithms

Figure 4.8: Illustration of sample data. (Source: https://www.intechopen.com/ books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

In this situation, one would assume the ink blue and the grey color to end up close to each other on the map and the yellow color near to both red and green. The second part of the Self Organize Maps is the weight vector. Every vector consists of two components which are an illustration in the discussion below. The first component of the weight vector is the data and the second component is its natural position. The best thing about the color is that by displaying color the data of the weight vector can be presented. In this situation, the color represents the data, and the (x, y) place of the pixel represents the location. In this particular example, a 2D array of the weight vectors was utilized and would resemble the figure above. This image is a tilted view of the grid where the n-dimensional array for every weight prevails and every weight vector has its own exclusive position in the grid. The weight vectors do not essentially have to be settled in two dimensions. A large amount of work has already been done using the SOMs having only one dimension. The data fragment of the weight vector must have the same dimensions as compared to the sample vectors. The weights are occasionally mentioned to as the neurons since the SOMs are essentially the neural networks. The way that the SOMs organize themselves is usually by challenging for the demonstration of samples. The neurons are also permitted to modify themselves by adapting to develop more like the samples in expectations of winning the subsequent competition. This selection and learning procedure makes the weights to shape themselves into the map representing resemblances (Figure 4.9).

Types of Machine Learning Algorithms

105

Figure 4.9: The depiction of the 2D array weight of a vector. (Source: https:// www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

So with the sample and the weight vectors, how are the weight vectors ordered to represent the resemblances of sample vectors? This is achieved by implementing the simple algorithm presented below (Figure 4.10).

Figure 4.10: Illustration of a sample SOM algorithm. (Source: https://www. intechopen.com/books/new-advances-in-machine-learning/types-of-machinelearning-algorithms).

The first step involved in developing the SOM is the initialization of the weight vectors. After initialization, a sample vector can be selected randomly and then the map of the weight vectors can be searched to find out the weight which to a greater extent represents the particular sample. Each weight vector has its particular location and the neighboring weights

106

Introduction To Algorithms

which are near to it. The chosen weight is compensated by being capable of becoming more like the arbitrarily selected sample vector. The neighbors of that particular weight are also compensated by being capable of becoming more like the selected sample vector. After this step t is increased to a small amount because of a number of neighbors and the extent to which each weight can acquire decreases over with time. The whole procedure is then repeated about more than a thousand times. In the condition of colors the processes involved in implementing the simple algorithm are:

4.4.5.2. Initializing of the Weights Given below are the screenshots of three different methods with the help of which the weight vector map is initialized. The six intensities of blue, red, and green are displayed in java program below. The real values are usually floating for the weights, thus the weights have a bigger array than the 6 values displayed in the figure below.

Figure 4.11: Demonstration of weight values. (Source: https://www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

To initialize the weight vector a number of methods are available. The first method is to give the arbitrary values to each weight vector for its data.

Types of Machine Learning Algorithms

107

Displayed above on left is the screen of pixels having arbitrary blue, red, and green values. Calculating the SOMs according to Kohonen is very expensive computationally, thus some alternatives of initializing the weight exist so that the samples are not similar. In this method, less number of iterations is needed to yield a worthy map and is time-saving. In addition to the arbitrary method, two other methods are developed to initialize the weight. One method involves putting red, black, green, and blue color at all the four corners and then allowing them to fade slowly towards the center. The other method involves having blue, red, and green color equally distant from each other and also from the center.

4.4.5.3. Attainment of Best Matching Unit This step is very simple, go through all of the weights and compute the distance of the chosen sample vector from each weight. The weight having the shortest distance among all is the winner. If more than one weight vectors have the same distance, then among the weight vectors having the shortest distance the winning weight vector is selected arbitrarily. A number of diverse ways are available for defining what distance means mathematically. Euclidean distance is the most usual method used: n

∑x i =0

2 i

where, x[i]→data value at ith data fellow of the sample. n→ number of dimensions for the sample vectors. For colors, if we consider them as the 3D points and every part being an axis. Now if we have selected the green color with value (0, 6, 0) the light green color with value (3, 6, 3) will be much closer to our selected green color than red with value (6, 0, 0). Light Green = Sqrt((3–0)^2+(6–6)^2+(3–0)^2) = 4.24 Red = Sqrt((6–0)^2+(0–6)^2+(0–0)^2) = 8.49 So the light green color is the best possible matching unit. This process of computing the distances and associating them is executed over the complete map. The weight vector having the shortest distance is the winner. For speed optimization of this section, the square root isn’t calculated in the java program.

108

Introduction To Algorithms

4.4.5.4. Scale Neighbors There are essentially two parts involved in the scaling of the neighboring weights. The first part involves determining which of the weights are believed to be the neighbors and to what extent every weight vector can become similar to the sample vector. The neighbors of the winning weight vector can be decided with the help of different methods. Some of the methods use concentric squares while others utilize hexagons. One of the methods using the Gaussian function is discussed where every single point with the value beyond zero is thought off as a neighbor. As stated above in this chapter, the quantity of neighbors reduces over time. This is made to happen so that the samples can travel to that zone where they will possibly be and then they compete for the position. This procedure is like the course modification followed by the fine-tuning. The function implemented to reduce the radius of the effect doesn’t really matter provided that it decreases.

Figure 4.12: A graph demonstrating the determination of SOM neighbor. (Source: https://www.intechopen.com/books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

The Figure 4.12 shows the plot of the function used. With the passage of time, the base progresses towards the center, thus there are fewer neighbors as the time progresses. The preliminary radius is kept actually high which is some value close to the width or to the height of the map. The learning function is the second part involved in the scaling of neighbors. The winning weight vector is compensated with becoming similar to the sample vector. The neighbors also become similar to the

Types of Machine Learning Algorithms

109

sample vector. The characteristic of this learning method is, the distant the neighbor is from winning weight, the less it will learn. The proportion at which the weight can learn reduces and can be set to any desired range. The Gaussian function used will return the value in between 0 and 1, where every single neighbor is then altered utilizing the parametric equation. The new color will then be given as: Current color*(1.-t) + sample vector*t As the quantity of neighbors of the weight is decreased, the amount that the weight vector can learn is also decreased with the passage of time. On the very first iteration, the winning weight vector is converted into the sample vector as t has the full range from 0 to 1. Then with the passage of time, the winning weight vector to some extent becomes similar to the sample vector where the highest value of t reduces. The pace at which the weight vector can learn drops off linearly. To illustrate this, in the preceding plot, the amount which a weight vector can learn is equal to how elevated the bump is. With the passage of time, the height of bump will reduce. Once the weight is defined as the winner, the neighbors of that particular weight vector are found and every individual neighbor as well as the winning weight vectors changes to become similar to the sample vector.

4.4.5.5. Determination of SOMs’ Quality Another example of a SOM produced by the program implementing 500 iterations is given below. In the figure, you will observe that the similar color is grouped altogether again. Though, this isn’t the case at all times because it can be seen that some colors are surrounded by the colors which are not like them. It is easy to point out with the help of colors because we are familiar with them. If more abstract data is used, then it is very difficult to decide that two entities are similar since they are near to each other. A very simple technique is available for exhibiting where the similarities prevail and where they don’t. In order to calculate this, we have to go through all of the weights and then conclude how alike the neighbors are with each other. This is completed by computing the distance which the weight vectors travel between each weight and its neighbors. An average of the computed distances helps in assigning the color to that particular location. This process is placed in Screen.java with the name of public void update_bw() (Figure 4.13).

110

Introduction To Algorithms

Figure 4.13: Display of SOM Iterations. (Source: https://www.intechopen.com/ books/new-advances-in-machine-learning/types-of-machine-learning-algorithms).

If the calculated average distance is very large, then the adjacent weights are different and dark color is allotted to that particular location of weight. However, if the computed average distance is small then lighter color is allotted. The color is the same as the center of blobs, so it must be white as the same color is present in the adjacent area. In the areas between the blobs where similarities are present, it shouldn’t be white. It must be a light grey color. There must be black color in the areas where blobs are substantially adjacent to each other but are not alike (Figure 4.14).

Figure 4.14: A sample of weight allocation in Colors. (Source: https://www. intechopen.com/books/new-advances-in-machine-learning/types-of-machinelearning-algorithms).

Types of Machine Learning Algorithms

111

In the image shown above, the gorges of the black color display where the color might be substantially close to one another on the map. When we talk about the real values of weights then the colors are different from each other. The areas where a light grey color is present between the blobs, it represents an exact similarity. In the image shown above, black color is present in the right bottom which is surrounded by the colors which are not comparable to it. When observing the black and white similar SOM, it demonstrates that the black color isn’t comparable to the other color. The lines of black color represent no resemblance between the two colors. In the uppermost corner, a pink color is present and adjacent to it is light green in color. In reality, both of the colors aren’t close to each other, but the colored SOM shows them close to each other. With the average distances utilized to produce black and white SOM, we can essentially allocate each SOM a value which will determine how best the image has represented the likenesses of the samples.

4.4.5.6. Pros and Cons of Self-Organized Map (SOM) i.

The best thing about the SOM is that SOMs are quite selfexplanatory. If the SOMs are very close to each other and a grey is linking them, then SOMs are quite similar. If a black ravine exists between them, then SOMs are very different. Unlike Multidimensional Scaling, people can rapidly learn how to use SOMS in an efficient way. ii. Another advantage of SOMs is their efficient working. SOMs can classify the data very well and are easily evaluated for their quality. This can actually help to calculate the efficiency of a map and compute the similarities between the objects. Some of the Cons of SOMs are: i.

ii.

One major problem related to SOMs is getting the correct data. Unfortunately, in order to produce a map, you require a value for every single dimension of each of the member of samples. Sometimes this is just not probable and usually, it is very tiresome to obtain the data. This is the limiting characteristic of SOM usually referred to as the missing data. Each of the SOMs is different and discovers unlike similarities between the sample vectors. SOMs classify the sample data so that when the final product is ready, the samples are typically enclosed by the similar samples. However similar samples aren’t

Introduction To Algorithms

112

iii.

always close to each other. A large number of maps must be created in order to obtain one good map. The final problem associated with SOMs is the computational expense which is the main disadvantage. As the dimensions of data boost up, dimension reduction visualization methods become more significant and the time to calculate the dimensions also increases.

Types of Machine Learning Algorithms

113

REFERENCES 1.

Abraham, T. H., (2004). Nicolas Rashevsky’s mathematical biophysics. Journal of the History of Biology, 37(2), 333–385. 2. Acuna, E., & Rodriguez, C., (2004). The treatment of missing values and its effect on classifier accuracy. Classification, clustering, and data mining applications (pp. 639–647). Springer, Berlin, Heidelberg. 3. Ahangi, A., Karamnejad, M., Mohammadi, N., Ebrahimpour, R., & Bagheri, N., (2013). Multiple classifier systems for EEG signal classification with application to brain-computer interfaces. Neural Computing and Applications, 23(5), 1319–1327. 4. Aldape-Pérez, M., Yáñez-Márquez, C., Camacho-Nieto, O., LópezYáñez, I., & Argüelles-Cruz, A. J., (2015). Collaborative learning based on associative models: Application to pattern classification in medical datasets. Computers in human behavior, 51, 771–779. 5. Allix, N. M., (2000). The theory of multiple intelligences: A case of missing cognitive matter. Australian Journal of Education, 44(3), 272– 288. 6. Allix, N. M., (2003, April). Epistemology And Knowledge Management Concepts And Practices. Journal of Knowledge Management Practice . 7. Alpaydin, E., (2004). Introduction to Machine Learning. Massachusetts, USA: MIT Press. 8. Alpaydm, E., (1999). Combined 5× 2 cv F test for comparing supervised classification learning algorithms. Neural Computation, 11(8), 1885– 1892. 9. Ambikairajah, E., Keane, M., Kelly, A., Kilmartin, L., & Tattersall, G., (1993). Predictive models for speaker verification. Speech communication, 13(3–4), 417–425. 10. Anevski, D., Gill, R. D., & Zohren, S., (2013). Estimating a probability mass function with unknown labels. arXiv preprint arXiv:1312.1200. 11. Anil Mathur, G. P., (1999). Socialization influences on preparation for later life. Journal of Marketing Practice: Applied Marketing Science, 5 (6,7,8), 163–176. 12. Anuradha, C., & Velmurugan, T., (2014, December). A data mining based survey on the student performance evaluation system. In Computational Intelligence and Computing Research (ICCIC), 2014 IEEE International Conference on (pp. 1–4). IEEE.

114

Introduction To Algorithms

13. Ashby, W. R., (1960). Design of a Brain, The Origin of Adaptive Behaviour. John Wiley and Son. 14. Ayodele, T. O., (2010). Machine learning overview. In New Advances in Machine Learning (Vol. 1, pp. 19-40). InTech. 15. Bader-El-Den, M., & Poli, R., (2007, October). Generating SAT localsearch heuristics using a GP hyper-heuristic framework. In International Conference on Artificial Evolution (Evolution Artificielle) (pp. 37–49). Springer, Berlin, Heidelberg. 16. Bateson, G., (1960). Minimal requirements for a theory of schizophrenia. AMA Archives of general psychiatry, 2(5), 477–491. 17. Batista, G., (2003). An Analysis of Four Missing Data Treatment Methods for Supervised Learning. Applied Artificial Intelligence, 17, 519–533. 18. Bilmes, J. A., (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. International Computer Science Institute, 4(510), 126. 19. Bishop, C. M., (1995). Neural Networks for Pattern Recognition (Vol. 1, pp. 1-19). Oxford, England: Oxford University Press. 20. Bishop, C. M., Tang, M., Cannon, R. M., & Carter, W. C. (2006). Continuum modelling and representations of interfaces and their transitions in materials. Materials Science and Engineering: A, 422(12), 102-114. 21. Block H, D., (1961). The Perceptron: A Model of Brain Functioning. 34 (1), 123–135. 22. Block, H. D., Knight Jr, B. W., & Rosenblatt, F., (1962). Analysis of a four-layer series-coupled perceptron. II. Reviews of Modern Physics, 34(1), 135. 23. Brereton, R. G., (2015). Pattern recognition in chemometrics. Chemometrics and Intelligent Laboratory Systems, 149, 90–96. 24. Burke, E. K., Hyde, M. R., & Kendall, G., (2006). Evolving bin packing heuristics with genetic programming. In Parallel Problem Solving from Nature-PPSN IX (pp. 860–869). Springer, Berlin, Heidelberg. 25. Campbell, D. T., (1976). On the conflicts between biological and social evolution and between psychology and moral tradition. Zygon®, 11(3), 167–208.

Types of Machine Learning Algorithms

115

26. Carling, P. A. (1992). The nature of the fluid boundary layer and the selection of parameters for benthic ecology. Freshwater Biology, 28(2), 273-284. 27. Caudill, M., & Butler, C., (1993). Understanding Neural Networks: Computer Explorations (No. 006.3 C3). 28. Chali, Y. S. R., (2009). Complex Question Answering: Unsupervised Learning Approaches and Experiments. Journal of Artificial Intelligent Research, 1–47. 29. Chen, Y. S., Qin, Y. S., Xiang, Y. G., Zhong, J. X., & Jiao, X. L., (2011). Intrusion detection system based on immune algorithm and support vector machine in a wireless sensor network. In Information and Automation (pp. 372–376). Springer, Berlin, Heidelberg. 30. Michie, D. (1968). “Memo” functions and machine learning. Nature, 218(5136), 19-22. 31. Dempster, A. P., Laird, N. M., & Rubin, D. B., (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the royal statistical society. Series B (methodological), 1–38. 32. Dockens, W. S., (1979). Induction/catastrophe theory: A behavioral ecological approach to cognition in human individuals. Systems Research and Behavioral Science, 24(2), 94–111. 33. Doya, K. (Ed.)., (2007). Bayesian brain: Probabilistic approaches to neural coding. MIT press. 34. Durbin, R., & Rumelhart, D. E., (1989). Product units: A computationally powerful and biologically plausible extension to backpropagation networks. Neural Computation, 1(1), 133–142. 35. Dutra da Silva, R., Robson, W., & Pedrini Schwartz, H., (2011). Image segmentation based on wavelet feature descriptor and dimensionality reduction applied to remote sensing. Chilean J. Stat, 2. 36. Dutton, J. M., & Starbuck, W. H., (1971). Computer simulation models of human behavior: A history of an intellectual technology. IEEE Transactions on Systems, Man, and Cybernetics, (2), 128–171. 37. Elliott, S. W., & Anderson, J. R. (1995). Effect of memory decay on predictions from changing categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(4), 815. 38. Farhangfar, A., Kurgan, L., & Dy, J., (2008). Impact of the imputation of missing values on classification error for discrete data. Pattern Recognition, 41(12), 3692–3705.

116

Introduction To Algorithms

39. Fausett, L. V., & Elwasif, W. (1994, June). Predicting performance from test scores using backpropagation and counterpropagation. In  Neural Networks, 1994. IEEE World Congress on Computational Intelligence., 1994 IEEE International Conference on (Vol. 5, pp. 3398-3402). IEEE. 40. Fisher, D. H., (1987). Knowledge acquisition via incremental conceptual clustering. Machine learning, 2(2), 139–172. 41. Forsyth, M. E., Hochberg, M., Cook, G., Renals, S., Robinson, T., Schechtman, R., & Doherty-Sneddon, G., (1995). Semi-continuous hidden Markov models for speaker verification. In Proc. ARPA Spoken Language Technology Workshop (Vol. 1, pp. 2171–2174). University of Twente, Enschede. 42. Forsyth, R. S., (1990). The strange story of the Perceptron. Artificial Intelligence Review, 4 (2), 147–155. 43. Franc, V., & Hlaváč, V., (2005, August). Simple solvers for large quadratic programming tasks. In Joint Pattern Recognition Symposium (pp. 75–84). Springer, Berlin, Heidelberg. 44. Freund, Y., & Schapire, R. E., (1999). Large margin classification using the perceptron algorithm. Machine learning, 37(3), 277–296. 45. Friedberg, R. M. (1958). A learning machine: Part I. IBM Journal of Research and Development, 2(1), 2-13. 46. Fu, S. S., & Lee, M. K., (2005). IT-Based Knowledge Sharing and Organizational Trust: The Development and Initial Test of a Comprehensive Model. ECIS 2005 Proceedings, 56, 1-8. 47. Fukunaga, A. S., (2008). Automated discovery of local search heuristics for satisfiability testing. Evolutionary Computation, 16(1), 31–61. 48. Gennari, J. H., Langley, P., & Fisher, D., (1988). Models of incremental concept formation (No. UCI-ICS-TR-88–16). California University Irvine Department of Information and Computer Science. 49. Getoor, L., & Taskar, B. (Eds.)., (2007). Introduction to statistical relational learning. MIT press. 50. Ghahramani, Z., (2008). Unsupervised learning algorithms are designed to extract structure from data. 178, pp. 1–8. IOS Press. 51. GILLIES, D. (1994). A Rapprochement Between Deductive and Inductive Logic. Interest Group in Pure and Applied Logics. Logic Journal, 2(2), 149-149. 52. Gong, Y., & Haton, J. P., (1992). Non {linear vectorial interpolation

Types of Machine Learning Algorithms

53.

54.

55.

56.

57.

58.

59.

60.

61.

62.

117

for speaker recognition,” in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 2,(San Francisco, California, USA), pp. II173 {II176, March 1992. González, L., Angulo, C., Velasco, F., & Catala, A., (2006). Dual unification of bi-class support vector machine formulations. Pattern recognition, 39(7), 1325–1332. Goodacre, R., Broadhurst, D., Smilde, A. K., Kristal, B. S., Baker, J. D., Beger, R., ... & Ebbels, T., (2007). Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics, 3(3), 231– 241. Gregan-Paxton, J., Hoeffler, S., & Zhao, M., (2005). When categorization is ambiguous: Factors that facilitate the use of a multiple category inference strategy. Journal of Consumer Psychology, 15(2), 127–140. Gross, G. N., Lømo, T., & Sveen, O., (1969). Participation of inhibitory and excitatory interneurons in the control of hippocampal-cortical output, Per Anderson, The Interneuron. Grzymala-Busse, J. W., & Hu, M., (2000, October). A comparison of several approaches to missing attribute values in data mining. In International Conference on Rough Sets and Current Trends in Computing (pp. 378–385). Springer, Berlin, Heidelberg. Grzymala-Busse, J. W., Goodwin, L. K., Grzymala-Busse, W. J., & Zheng, X., (2005, August). Handling missing attribute values in preterm birth data sets. In International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing (pp. 342–351). Springer, Berlin, Heidelberg. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H., (2009). The WEKA data mining software: an update. ACM SIGKDD explorations newsletter, 11(1), 10–18. Hammes, M. & Wieland, R., (2012). A screening tool to stress while working. In G. Athanassiou, S. Schreiber-Costa & O. Sträter (ed.), Psychology of Occupational Safety and Health – Successful Design of Safe and Good Work-Research and Implementation in Practice (pp. 331–334). Kröning: Asanger. Hastie, T., Tibshirani, R., & Friedman, J., (2009). Unsupervised learning. The elements of statistical learning (pp. 485–585). Springer, New York, NY. Hattori, H., (1992). Text {independent speaker recognition using

118

63. 64. 65. 66.

67. 68.

69.

70.

71.

72. 73.

74.

Introduction To Algorithms

neural networks,” in Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing, vol. 2,(San Francisco, California, USA), pp. II153 {II156. Haykin, S., (1994). Neural Networks: A Comprehensive Foundation. New York: Macmillan Publishing. Haykin, S., & Network, N., (2004). A comprehensive foundation. Neural networks, 2(2004), 41. Herlihy, B. (1998). Targeting 50+ mining the wealth of an established generation. Direct Marketing, 61(7), 18–20. Higgins, A., Bahler, L., & Porter, J. (1996). Voice identification using nonparametric density matching. Automatic Speech and Speaker Recognition, 355, 211–233. Hodge, V. A., (2004). A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22 (2), 85–126. Holland, J., (1980). Adaptive Algorithms for Discovering and Using General Patterns in Growing Knowledge Bases Policy Analysis and Information Systems. 4 (3), 245-266. Honghai, F., Guoshun, C., Cheng, Y., Bingru, Y., & Yumei, C., (2005, September). A SVM regression-based approach to filling in missing values. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (pp. 581–587). Springer, Berlin, Heidelberg. Hopfield, J. J., (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8), 2554–2558. Hornik, K., Stinchcombe, M., & White, H., (1989). Multilayer feedforward networks are universal approximators. Neural networks, 2(5), 359–366. Hornik, K., Buchta, C., & Zeileis, A. (2009). Open-source machine learning: R meets Weka. Computational Statistics, 24(2), 225-232. Jaime G., & Carbonell, R. S., (1983). Machine Learning: A Historical and Methodological Analysis. Association for the Advancement of Artificial Intelligence, 4 (3), 1–10. Jayaraman, P. P., Zaslavsky, A., & Delsing, J., (2010, June). Intelligent processing of k-nearest neighbors queries using mobile data collectors in a location-aware 3D wireless sensor network. In International Conference on Industrial, Engineering and Other Applications of

Types of Machine Learning Algorithms

75.

76.

77.

78. 79. 80.

81. 82.

83.

84.

85.

119

Applied Intelligent Systems (pp. 260–270). Springer, Berlin, Heidelberg. Kandjani, H., Bernus, P., & Nielsen, S., (2013, January). Enterprise architecture cybernetics and the edge of chaos: Sustaining enterprises as complex systems in complex business environments. In System Sciences (HICSS), 2013 46th Hawaii International Conference on (pp. 3858–3867). IEEE. Keller, R. E., & Poli, R., (2007, September). Linear genetic programming of parsimonious metaheuristics. In Evolutionary Computation, 2007. CEC 2007. IEEE Congress on (pp. 4508–4515). IEEE. Kim, M. H., & Park, M. G., (2009). Bayesian statistical modeling of system energy saving effectiveness for MAC protocols of wireless sensor networks. In Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (pp. 233–245). Springer, Berlin, Heidelberg. Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(13), 1-6. Konopka, A. K., (2006). Systems biology: principles, methods, and concepts. CRC Press. Lakshminarayan, K., Harp, S. A., & Samad, T., (1999). Imputation of missing data in industrial databases. Applied Intelligence, 11(3), 259– 275. Lebowitz, M., (1987). Experiments with incremental concept formation: Unimem. Machine learning, 2(2), 103–138. LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D., (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551. Li, D., Deogun, J., Spaulding, W., & Shuart, B., (2004, June). Towards missing data imputation: a study of fuzzy k-means clustering method. In International Conference on Rough Sets and Current Trends in Computing (pp. 573–579). Springer, Berlin, Heidelberg. Long, G. E., (1980). Surface approximation: a deterministic approach to modeling spatially variable systems. Ecological Modelling, 8, 333– 343. López, J. E. N., Castro, G. M., Saez, P. L, & Muiña, F. E. G., (2002). An integrated model of creation and transformation of knowledge [CD-ROM] In Memory of the XXII Symposium on the Management of Technological Innovation, 2002, November 6–8. of the Nucleus of

120

86.

87.

88. 89.

90. 91. 92. 93. 94. 95.

96. 97.

98.

Introduction To Algorithms

Policy and Technological Management, University of São Paulo. López, P. V., (2001). The information society in Latin America and the Caribbean: ICTs and a new institutional framework [CD-Rom]. Report of the 9th Latin-Ibero-American Seminar on Technological Management Innovation in the Knowledge Economy. Luengo, J., García, S., & Herrera, F., (2012). On the choice of the best imputation methods for missing values considering three groups of classification methods. Knowledge and Information Systems, 32(1), 77–108. Luis Gonz, L. A., (2005). Unified dual for bi-class SVM approaches. Pattern Recognition , 38(10), 1772–1774. Lukasiak, B. M., Zomer, S., Brereton, R. G., Faria, R., & Duncan, J. C., (2007). Pattern recognition and feature selection for the discrimination between grades of commercial plastics. Chemometrics and Intelligent Laboratory Systems, 87(1), 18–25. Lula, P., (2000). Selected applications of artificial neural networks using STATISTICA Neural Networks. StatSoft, Krakow, Poland. Mamassian, P., Landy, M., & Maloney, L. T., (2002). Bayesian modeling of visual perception. Probabilistic Models of the Brain, 13–36. McCulloch, W. S., (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophysics, 115–133. Michalski, R. S., & Stepp, R., (1982). Revealing conceptual structure in data by inductive inference. Minsky, M., & Papert, S. A., (2017). Perceptrons: an Introduction to Computational Geometry. MIT Press. Mitchell, T. M., (2006). The Discipline of Machine Learning. Machine Learning Department technical report CMU-ML-06–108, Carnegie Mellon University. Mitra, S., Datta, S., Perkins, T., & Michailidis, G., (2008). Introduction to Machine Learning and Bioinformatics. CRC Press. Mohammad, T. Z., & Mahmoud, A. M., (2014). Clustering of Slow Learners Behavior for Discovery of Optimal Patterns of Learning. LITERATURES, 5(11). Mooney, R. J., (2000). Learning Language in Logic. In L. N. Science, Learning for Semantic Interpretation: Scaling Up without Dumbing Down (pp. 219–234). Springer Berlin/Heidelberg.

Types of Machine Learning Algorithms

121

99. Mostow, D. J. (1982). Transforming declarative advice into effective procedures: A heuristic search example. Machine Learning: An Artificial Intelligence Approach, 1, 10-17. 100. Neymark, Y., Batalova, Z., & Obraztsova, N., (1970). Pattern Recognition and Computer Systems. Engineering Cybernetics, 8, 97. 101. Nilsson, N. J., (1982). Principles of Artificial Intelligence (Symbolic Computation/Artificial Intelligence). Springer. 102. Noguchi, S., & Nagasawa, K., (2014). New concepts, that is, the information processing capacity and a percent information processing capacity are introduced. Methodologies of Pattern Recognition, 437. 103. Novikoff, A. B., (1963). On convergence proofs for perceptions. Stanford Research Inst Menlo Park Ca. 104. Olga, R., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., & Berg, A. C., (2015). Large scale visual recognition challenge. ImageNet http:// arxiv. org/abs/1409.0575. 105. Oltean, M., (2005). Evolving Evolutionary Algorithms Using Linear Genetic Programming. 13 (3), 387–410. 106. Oltean, M., (2007). Evolving evolutionary algorithms with patterns. Soft Computing, 11(6), 503–518. 107. Oltean, M., & Dumitrescu, D., (2004, June). Evolving TSP heuristics using multi-expression programming. In International Conference on Computational Science (pp. 670–673). Springer, Berlin, Heidelberg. 108. Oltean, M., & Groşan, C., (2003, September). Evolving evolutionary algorithms using multi-expression programming. In European Conference on Artificial Life (pp. 651–658). Springer, Berlin, Heidelberg. 109. Orlitsky, A., Santhanam, N., Viswanathan, K., & Zhang, J., (2005). Convergence of profile based estimators. Proceedings of International Symposium on Information Theory. Proceedings. International Symposium on, pp. 1843 – 1847. Adelaide, Australia: IEEE. 110. Orlitsky, A., Santhanam, N., Viswanathan, K., & Zhang, J., (2006, March). Theoretical and experimental results on modeling low probabilities. In Information Theory Workshop, 2006. ITW’06 Punta del Este. IEEE (pp. 242–246). IEEE. 111. Otair, M. A., & Salameh, W. A., (2004). Improved Backpropagation Neural Networks using a Modified Nonlinear Function. In Proceedings of the IASTED International Conference (pp. 442–447).

122

Introduction To Algorithms

112. Patterson, K., Nestor, P. J., & Rogers, T. T. (2007). Where do you know what you know? The representation of semantic knowledge in the human brain. Nature Reviews Neuroscience, 8(12), 976. 113. Pickering, A., (2002). Cybernetics and the mangle: Ashby, Beer, and Pask. Social studies of science, 32(3), 413–437. 114. Poli, R., Woodward, J., & Burke, E. K., (2007, September). A histogrammatching approach to the evolution of bin-packing strategies. In Evolutionary Computation, 2007. CEC 2007. IEEE Congress on (pp. 3500–3507). IEEE. 115. Pollack, J. B., (1989). Connectionism: Past, present, and future. Artificial Intelligence Review, 3(1), 3–20. 116. Punitha, S. C., Thangaiah, P. R. J., & Punithavalli, M., (2014). Performance analysis of clustering using partitioning and hierarchical clustering techniques. International Journal of Database Theory and Application, 7(6), 233–240. 117. Purves, D., & Lotto, R. B., (2003). Why We See What We Do: An Empirical Theory of Vision. Sinauer Associates. 118. Puterman, M. L., (2014). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons. 119. R. S. Michalski, T. J., (1983). Learning from Observation: Conceptual Clustering. TIOGA Publishing Co. 120. Rajesh P. N. Rao, B. A., (2002). Probabilistic Models of the Brain. MIT Press. 121. Rao, R. P., (2005). Bayesian inference and attentional modulation in the visual cortex. Neuroreport, 16(16), 1843–1848. 122. Rashevsky, N., (1948). Mathematical Biophysics: PhysicoMathematical Foundations of Biology. Chicago: Univ. of Chicago Press. 123. Rebentrost, P., Mohseni, M., & Lloyd, S., (2014). Quantum support vector machine for big data classification. Physical Review Letters, 113(13), 130503. 124. Rebentrost, P., Serban, I., Schulte-Herbrüggen, T., & Wilhelm, F. K. (2009). Optimal control of a qubit coupled to a non-Markovian environment. Physical review letters, 102(9), 090401. 125. Rebentrost, P., & Wilhelm, F. K. (2009). Optimal control of a leaking qubit. Physical Review B, 79(6), 060507.

Types of Machine Learning Algorithms

123

126. Ripley, B., (1996). Pattern Recognition and Neural Networks. Cambridge University Press. 127. Rosenblatt, F., (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65 (6), 386–408. 128. Rosenblatt, F., (1961). Principles of Neurodynamics Unclassified— Armed Services Technical Information Agency. Spartan, Washington, DC. 129. Rumelhart, D. E., Hinton, G. E., & Williams, R. J., (1985). Learning internal representations by error propagation (No. ICS-8506). California Univ San Diego La Jolla Inst for Cognitive Science. 130. Rumelhart, D. E., Hinton, G. E., & Williams, R. J., (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533. 131. Russell, S. J., (2003). Artificial Intelligence: A Modern Approach (2nd Edition ed.). Upper Saddle River, NJ, NJ, USA: Prentice Hall. 132. Russell, S. J., & Norvig, P., (2016). Artificial Intelligence: A Modern Approach. Malaysia; Pearson Education Limited. 133. Ryszard S. & Michalski, J. G., (1955). Machine Learning: An Artificial Intelligence Approach (Volume I). Morgan Kaufmann. 134. Ryszard S. Michalski, J. G., (1955). Machine Learning: An Artificial Intelligence Approach, vol. 3, pp. 15-35.. 135. Sakamoto, Y., Jones, M., & Love, B. C., (2008). Putting the psychology back into psychological models: Mechanistic versus rational approaches. Memory & Cognition, 36(6), 1057–1065. 136. Sanchez, R., (1997). Strategic management at the point of inflection: Systems, complexity and competence theory. Long Range Planning, 30(6), 939–946. 137. Schaffalitzky, F., & Zisserman, A., (2004). Automated scene matching in movies. CIVR 2004. Proceedings of the Challenge of Image and Video Retrieval, London, LNCS, 2383. 138. Schwenker, F., & Trentin, E., (2014). Pattern classification and clustering: A review of partially supervised learning approaches. Pattern Recognition Letters, 37, 4–14. 139. Seising, R., & Tabacchi, M. E., (2013, June). A very brief history of soft computing: Fuzzy Sets, Artificial Neural Networks, and Evolutionary Computation. In IFSA World Congress and NAFIPS Annual Meeting

124

Introduction To Algorithms

(IFSA/NAFIPS), 2013 Joint (pp. 739–744). IEEE. 140. Selfridge, O. G., (1959). Pandemonium: a paradigm for learning. The Mechanization of Thought Processes. H.M.S.O., London. London. 141. Shmailov, M. M., (2016). Breaking Through the Iron Curtain. In Intellectual Pursuits of Nicolas Rashevsky (pp. 93–131). Birkhäuser, Cham. 142. Shmailov, M. M., (2016). Scientific Experiment: Attempts to Converse Across Disciplinary Boundaries Using the Method of Approximation. In Intellectual Pursuits of Nicolas Rashevsky (pp. 65–92). Birkhäuser, Cham. 143. Singh, S., & Lal, S. P., (2013, December). Educational courseware evaluation using machine learning techniques. In e-Learning, e-Management and e-Services (IC3e), 2013 IEEE Conference on (pp. 73–78). IEEE. 144. Sleeman, D. H. (1983). Inferring student models for intelligent computer-aided instruction. In Machine Learning, Volume I (pp. 483510). 145. Somorjai, R. L., Dolenko, B., & Baumgartner, R., (2003). Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics, 19(12), 1484–1491. 146. Stepp, R. E., & Michalski, R. S., (1986). Conceptual clustering: Inventing goal-oriented classifications of structured objects, Morgan Kaufman Publishers, vol. 2, pp. 3-11. 147. Stewart, N., & Brown, G. D., (2004). Sequence effects in the categorization of tones varying in frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(2), 416. 148. Sutton, R. S., (1988). Learning to predict by the methods of temporal differences. Machine learning, 3(1), 9–44. 149. Sutton, R. S., & Barto, A. G., (1998). Introduction to Reinforcement Learning (Vol. 135). Cambridge: MIT Press. 150. Tapas Kanungo, D. M., (2002). A local search approximation algorithm for k-means clustering. Proceedings of the Eighteenth Annual Symposium on Computational Geometry (pp. 10–18). Barcelona, Spain: ACM Press. 151. Tavares, J., Machado, P., Cardoso, A., Pereira, F. B., & Costa, E., (2004, April). On the evolution of evolutionary algorithms. In European

Types of Machine Learning Algorithms

125

Conference on Genetic Programming (pp. 389–398). Springer, Berlin, Heidelberg. 152. Teather, L. A., (2006). Pathophysiological effects of inflammatory mediators and stress on distinct memory systems. In Nutrients, Stress, and Medical Disorders (pp. 377–386). Humana Press. 153. Timothy Jason Shepard, P. J., (1998). Decision Fusion Using a MultiLinear Classifier. In Proceedings of the International Conference on Multisource-Multisensor Information Fusion. 154. Tom, M., (1997). Machine Learning. Machine Learning, Tom Mitchell, McGraw Hill, 1997: McGraw Hill. 155. Trevor Hastie, R. T., (2001). The Elements of Statistical Learning. New York, NY, USA: Springer Science and Business Media. 156. Turnbull, S., (2002). The science of corporate governance. Corporate Governance: An International Review, 10(4), 261–277. 157. Vancouver, J. B., (1996). Living systems theory as a paradigm for organizational behavior: Understanding humans, organizations, and social processes. Systems Research and Behavioral Science, 41(3), 165–204. 158. Wagh, S. P., (1994). Intonation knowledge-based speaker recognition using neural networks,.” MTech. project report, Indian Institute of Technology, Department of Electrical Engineering. 159. Weiss, Y., & Fleet, D. J. (2002). Velocity likelihoods in biological and machine vision. Probabilistic models of the brain: Perception and neural function, 81–100. 160. Widrow, B. W., (2007). Adaptive Inverse Control: A Signal Processing Approach. Wiley-IEEE Press. 161. Xu, L., & Jordan, M. I., (1996). On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 8(1), 129–151. 162. Y. Chali, S. R., (2009). Complex Question Answering: Unsupervised Learning Approaches and Experiments. Journal of Artificial Intelligent Research, 1–47. 163. Yang, Z., & Purves, D., (2004). The statistical structure of natural light patterns determines perceived light intensity. Proceedings of the National Academy of Sciences of the United States of America, 101(23), 8745–8750. 164. Yu, L. L., (2004, October). Efficient Feature Selection via Analysis of

126

Introduction To Algorithms

Relevance and Redundancy. JMLR, 1205–1224. 165. Yusupov, T., (2007). The efficient market hypothesis through the eyes of an artificial technical analyst (Doctoral dissertation, ChristianAlbrechts Universität Kiel). 166. Zanibbi, R., & Blostein, D., (2012). Recognition and retrieval of mathematical expressions. International Journal on Document Analysis and Recognition (IJDAR), 15(4), 331–357. 167. Zeiler, M. D., & Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional. Neural Netw, 1, 1-12. 168. Zhang, S. Z., (2002). Data Preparation for Data Mining. Applied Artificial Intelligence. 17, 375 – 381. 169. Zoltan-Csaba, M., Pangercic, D., Blodow, N., & Beetz, M., (2011). Combined 2D-3D categorization and classification for multimodal perception systems. Int. J. Robotics Res. Arch, 30(11).

CHAPTER

5

APPROXIMATION ALGORITHMS

CONTENTS 5.1. Introduction..................................................................................... 128 5.2. Approximation Strategies................................................................. 132 5.3. The Greedy Method......................................................................... 135 5.4. Sequential Algorithms...................................................................... 141 5.5. Randomization................................................................................ 144 5.6. A Tour Of Approximation Classes..................................................... 146 5.7. Brief Introduction To PCPs............................................................... 149 5.8. Promising Application Areas For Approximation and Randomized Algorithms......................................................... 150 5.9. Tricks Of The Trade........................................................................... 151 References.............................................................................................. 153

128

Introduction To Algorithms

5.1. INTRODUCTION Most interesting everyday optimization problems are extremely challenging from a computational viewpoint. In fact, quite often, discovering a nearoptimal or even an optimal solution to an optimization problem of largescale may necessitate computational resources beyond what is essentially available. There is a significant body of literature discovering the computational properties of various optimization problems by seeing how the computational strains of a solution technique grow with the extent of the problem case to be solved (Aho et al., 1976; 1979; Alon & Spencer, 2000). A key distinction is made amongst problems that need computational resources that develop polynomially with problem extent versus those for which the necessary resources grow exponentially. The former class of problems is termed efficiently solvable, however, problems in the latter class are deemed intractable since the exponential growth in necessary computational resources reduces all but the smallest cases of such problems unsolvable (Cook & Rohe, 1999; Chazelle et al., 2001; Carlson et al., 200). It has been concluded that a large number of common optimization problems are categorized as NP-hard. It is widely assumed—though not yet verified (Clay Mathematics Institute, 2003)—that NP-hard problems are obstinate, which reflects that there is not any effective algorithm (i.e., one that measures polynomially) that is guaranteed to discover an optimal solution for this type of problems. NP-hard optimization tasks’ instances are the minimum bin packing problem, the minimum traveling salesman problem, and the minimum graph coloring problem. As a result of the character of NP-hard problems, an advancement that leads to a better appreciation of the computational properties, structure, and means of solving one of them, approximately or exactly, also leads to improved algorithms for resolving hundreds of other diverse but related NP-hard problems. A number of computational problems, in regions as diverse as computer-aided design and finance, operations research, economics, biology, have been revealed to be NP-hard (Aho & Hopcrosft, 1974; Aho et al., 1979, 1991). A natural query is whether near-optimal (i.e., approximate) solutions can probably be found efficiently for hard optimization problems like these. Heuristic local search techniques, such as simulated annealing and tabu search (see Chapters 6 and 7), are usually quite effective at finding approximate solutions. However, these techniques do not come with rigorous assurances concerning the class of the absolute solution or the requisite maximum runtime. In this chapter, we will discourse a more theoretical

Approximation Algorithms

129

methodology to this issue involving alleged “approximation algorithms,” which are efficient algorithms that can be verified to produce solutions of a definite quality. We will also discuss categories of problems for which no effective approximation algorithms exist, hence leaving an important part for the quite common heuristic local search methods (Shaw et al., 1998; Dotu et al., 2003). The design of decent approximation algorithms is an extremely active area of research in which new methods and techniques are found. It is quite possible that these methods will become of increasing significance in tackling large everyday optimization problems (Feller, 1971; Hochbaum, 1996; Cormen et al., 2001). In the early 1970s and late 1960s, a precise idea of approximation was projected in the context of bin packing and multiprocessor scheduling (Graham, 1966; Garey et al., 1972; 1976; Johnson, 1974). Approximation algorithms, in general, have two properties. First, they provide a reasonable solution to a problem case in polynomial time. In most circumstances, it is not difficult to develop a procedure that discovers some feasible solution (Kozen, 1992). Though, we are concerned with having some assured class of the solution that is the second aspect describing approximation algorithms. The class of an approximation algorithm remains the maximum “distance” between the optimal solutions and its solutions, assessed over all the possible causes of the problem. Casually, an algorithm approximately resolves an optimization problem if it continually returns a possible solution whose measure is near to optimal, for instance within a factor confined by a constant or by a gradually increasing function of the input size. Assuming constant α, an algorithm A is an α-approximation algorithm for a particular minimization problem Π if its answer is as a maximum α time the optimum, in view of all the possible cases of problem Π. This chapter focuses on NP-hard optimization problems and their design of approximation algorithms. We will show in what way standard algorithm design methods such as local search and greedy methods have been used to develop good approximation algorithms. We will also show exactly how a randomization is a potent tool for scheming approximation algorithms. Randomized algorithms are fascinating because in general such methods are easier to implement and analyze, and quicker than deterministic algorithms (Motwani & Raghavan, 1995). A randomized algorithm is basically an algorithm that performs a few of its choices arbitrarily; it “flips a coin” to choose what to do at some phases. As a result of its random component,

130

Introduction To Algorithms

various executions of a randomized algorithm can result in different runtime and solutions, even when seeing the same case of a problem. We will demonstrate how one can associate randomization with approximation methods in order to approximate NP-hard optimization problems efficiently. In this case, the runtime of the approximation algorithm, the approximation solution, and the approximation ratio may be random variables. Challenged with an optimization problem, the aim is to generate a randomized approximation algorithm using runtime provably confined by a polynomial and whose possible solution is near to the optimal solution, in probability. Note that these guarantees stand for every case of the problem being solved. The only arbitrariness in the performance warranty of the randomized approximation algorithm derives from the algorithm itself & not from the instances. Since we do not see efficient algorithms to discover optimal solutions for NP-hard problems, a crucial question is whether we can proficiently compute decent approximations that are near to optimal. It would be very exciting (and practical) if one could see from exponential to polynomial time intricacy by relaxing the check on optimality, especially if we assure at most a relatively small error (Qi, 1988; Vangheluwe et al., 2003; Xu, 2005) (Figure 5.1).

Figure 5.1: Schematic Illustration of a mechanism for approximation algorithms. (Source: http://faculty.ycp.edu/~dbabcock/PastCourses/cs360/lectures/ lecture29.html).

Approximation Algorithms

131

Decent approximation algorithms have been suggested for some significant problems in combinatorial optimization. The so-called APX intricacy class comprises the problems that permit a polynomial-time approximation algorithm by a performance ratio confined by a constant. For some problems, we can plan even better approximation algorithms. More precisely we can contemplate a group of approximation algorithms that permits us to get as near to the optimum as we want, as long as we are ready to trade quality with time (Reinelt, 1994; Indrani, 2003). This special group of algorithms is termed an approximation scheme (AS) & the so-called PTAS category is the category of optimization problems that permit on behalf of a polynomial time approximation scheme that gauges polynomially in the extent of the input. In some instances, we can devise approximation systems that gauge polynomially, both in the magnitude of the approximation error and in the size of the input. We refer to the category of problems that permit this type of fully polynomial time approximation schemes by way of FPTAS (Faigle et al., 1989; Boyd & Pulley blank, 1990; Gomes & Shmoys, 2002). For some NP-hard problems, however, the approximations that have been achieved so far are quite poor, & in some instances, no one has ever been capable of devising approximation algorithms in the optimum constant factor (Chen & Epley, 1970; Hochbaum & Shmoys, 1987). Initially, it was not obvious if these weak outcomes were due to our deficiency of ability in devising decent approximation algorithms for this type problem or to some intrinsic structural characteristic of the problems that disregards them from having decent approximations. We will see that actually there are restrictions to an approximation which are inherent to some categories of problems (Graham, 1969; Graham & Pollak, 1971; McCrary et al., 2000). For example, in some instances, there is a poorer bound on the approximation constant factor, and in other instances, we can probably demonstrate that there are not any approximations within some constant factor of the optimum. Essentially, there is an extensive range of scenarios starting from NP-hard optimization problems that permit approximations to any essential degree, to problems not permitting approximations at all. We will deliver a brief overview of the proof techniques used to develop non-approximability results (Ryser, 1951; andersen & Hilton, 1983; Pulleyblank, 1989) (Figure 5.2).

132

Introduction To Algorithms

Figure 5.2: Approximation route for an approximation algorithm problem. (Source: http://fliphtml5.com/czsc/chmn/basic).

We believe that the finest way to understand the notions behind randomization and approximation is to study cases of algorithms with these characteristics, through examples. Thus in each segment, we will first present the intuitive concept, then highlight its salient points through well-selected instances of prototypical problems (Banderier et al., 2003; Podsakoff et al., 2009; Bennett et al., 2015). Our aim is far from trying to deliver a comprehensive analysis of approximation algorithms or the ideal approximation algorithms for the introduced problems. Instead, we describe the various design and evaluation methods for randomized and approximation algorithms, using obvious examples that allow for comparatively simple and intuitive descriptions. For some problems discoursed in the chapter, there are approximations with enhanced performance guarantees but needing more sophisticated proof methods that are beyond the range of this introductory tutorial. In such instances, we will guide the reader to the related literature results (Spyke, 1998; Diening et al., 2004; Becchetti et al., 2006).

5.2. APPROXIMATION STRATEGIES 5.2.1. Optimization Problems We will describe optimization problems in an orthodox way (Aho et al., 1979; 1981; Ausiello et al., 1999). There are three defining features of each optimization problem: the criterion of a possible solution to the problem, the configuration of the input instance, & the measure function used to decide which possible solutions are deliberated to be optimal. It will be obvious from the problem title whether we desire a possible solution with a maximum or minimum measure. To explain, the minimum vertex cover problem can be defined in the following manner (Colbourn, 1984; Ansótegui et al., 2004; Leahu & Gomes, 2004).

Approximation Algorithms

133

The minimum Vertex Cover case is illustrated below: Case: An undirected graph G = (V,E). Solution: A subset S ⊆ V so that for every {u,v} ∈ E, either u ∈ S or v ∈ S. Measure: |S|.

We use the following scheme for items related to a case I.

i. Sol(I) is the set of possible solutions to I. ii. mI: Sol(I) → R is the measure function concomitant with I. iii. Opt(I) ⊆ Sol(I) is the possible solutions with optimal measure (either maximum or minimum). Hence, we may completely stipulate an optimization problem Π by providing a group of tuples {(I, Sol(I), mI, Opt(I))} over all possible cases I. It is important to remember that Sol(I) and I can be over entirely different domains. In the above case, the group of I is all pointless graphs, while Sol(I) is all possible subdivisions of vertices in a graph (Chazelle, 2000; Chazelle & Lvov, 2001; Vershynin, 2009).

5.2.2. Approximation and Performance Crudely speaking, an algorithm approximately resolves an optimization problem if it returns a possible solution at all times whose measure is near to optimal. This intuition is made accurate below. Consider Π an optimization problem. We assume that an algorithm (A) feasibly solves Π if specified a case I ∈ Π, A(I) ∈ Sol(I); i. e, A returns a possible solution to me. Let A feasibly resolve Π. Then we describe the approximation ratio α (A) of A to be the lowest possible ratio concerning the measure of A (I) & the extent of an optimal solution. Formally,

α ( A) = min I ∈∏

mI ( A( I )) mI (Opt ( I ))

This ratio is each time at least 1 for minimization problems. For maximization problems, it is always at extreme 1, respectively.

5.2.3. Complexity Background An optimization problem having measure 0–1 valued is defined as a decision problem. That is, solving a case I of a decision problem relates to answering a yes/no query about I (where yes relates to a measure of 1, and no relates to

Introduction To Algorithms

134

a measure of 0). We may, therefore, denote a decision problem as a subgroup S of the group of all possible cases: members of S denote instances with measure 1. Casually, P (polynomial time) is regarded as the category of decision problems Π that have a consistent algorithm AΠ such that each instance I ∈ Π is resolved by AΠ in a polynomial (|I|k for some constant k) a total number of steps on any “rational” model of computation. Rational models include single-tape & multi-tape Turing machines, pointer machines, random access machines, etc. (Lovász, 1975; Gurevich, 1990; Belanger & Wang, 1993). While P is meant to signify a category of problems that can be proficiently solved, NP (non-deterministic polynomial time) is a category of decision problems Π, which can be efficiently checked. More correctly, NP is the category of decision problems Π that have a consistent decision problem Π0 in P and constant k satisfying: I ∈ Π if and only if there exists C ∈ {0, 1} |I|k such that (I, C) ∈ Π’

In other words, one can conclude if a case I am in an NP problem that can be proficiently solved if one is also given a certain short string C that is of length polynomial in me. For instance, deliberate the NP problem of defining if a graph G having a path P that travels across all nodes exactly one time (this is called as the Hamiltonian path problem) (Johnson, 1973; Ho, 1982; Blass & Gurevich, 1990). If one is given G with an explanation of P, it is quite easy to confirm that P is certainly such a path by testing that: i. ii. iii.

P has all nodes in G. No node seems more than one time in P. Any two contiguous nodes in P have an advantage between them in G. However, it is not identified how to find this kind of path P given merely a graph G, & this is the major difference between NP and P. Actually, the Hamiltonian path problem does not only exist in NP but is present in NP-hard also, see the Introduction (Aharoni et al., 1985; Garey & Johnson, 2002). Notice that although a short proof is always present if I ∈ Π, it must not be the instance that short proofs are present for instances not in Π. Therefore, while P problems are deliberated to be those that are efficiently decidable & NP problems are those deliberated to be efficiently verifiable by a short proof (Nemhauser & Ullmann, 1969; Hopper & Turton, 2001; Chazelle, 2004). We will also contemplate the optimization counterparts to NP and P, which are NPO and PO, respectively. Informally, PO is the category of

Approximation Algorithms

135

optimization problems that have a polynomial time algorithm which always yields an optimal solution to every case of the problem, while NPO is the category of optimization problems in which polynomial time computable is the measure function, and an algorithm can decide whether or not a probable solution is possible in polynomial time (Chazelle & Liu, 2001; Röglin & Vöcking, 2007; Röglin & Teng, 2009). Here, we will focus on approximating solutions to the “toughest” of NPO problems, those problems in which the consistent decision problem is NP-hard. Amusingly, some NPO problems of this kind can be approximated very finely, whereas others can barely be approximated at all (Jiménez et al., 2001; Cueto et al., 2003; Aistleitner, 2011).

5.3. THE GREEDY METHOD Greedy approximation algorithms are intended with a simple philosophy in attention: repeatedly make choices that develop one nearer and nearer to a possible solution for the problem. These choices would be optimal according to a flawed but effortlessly computable heuristic. In particular, this heuristic tends to be as opportunistic as conceivable in the short run. That is why such algorithms are termed greedy—a better name could be “short-sighted.” For example, suppose my aim is to find the shortest path from my home to the theater (Klein & Young, 2010; Ausiello et al., 2012). If I assumed that the walkthrough Forbes Avenue is almost the same distance as the walkthrough Fifth Avenue, now if I am nearer to Forbes than Fifth Avenue, it would be sensible to walk towards Forbes & take that route (Wang, 1995; Khuller, 1998; Toth et al., 2017). Obviously, the success of this strategy relies upon the correctness of my conviction that the Forbes path is definitely just as decent as the Fifth path. We will illustrate that for some problems, picking a solution conferring to an opportunistic, imperfect heuristic reaches a non-trivial approximation algorithm (Karp, 1975; Paz & Moram, 1977; Mossel et al., 2005).

5.3.1. Greedy Vertex Cover In the preliminaries, the minimum vertex cover problem was described. Alternatives of the problem come up in various areas of optimization research. We will define a simple greedy algorithm, which is a 2-approximation for the problem; i. e, the vertex cover cardinality resumed by our algorithm is no larger than two times the cardinality of the least cover (Khot, 2002; Khot et al., 2007; Khot & Vishnoni, 2015). The Greedy-VC algorithm is as below.

136

Introduction To Algorithms

Firstly, let S be an empty group. Choose a random edge {u,v}. Add u and v to S, & remove u & v from the graph. Repeat until no edges persist in the graph. Yield S as the vertex cover. Proof: Firstly, we claim S as resumed by Greedy-VC is definitely a vertex cover. Suppose not; then there occurs an edge e that was not protected by any vertex in S. Since we only take out vertices from the graph which are in S, an edge e will remain in the graph once Greedy-VC had concluded, which is a contradiction (Kaufman, 1974; Liu, 1976; Durand et al., 2005). Let S∗ is a minimum vertex cover. We will now indicate that S∗ contains no less than |S|/2 vertices. It will trail that |S∗| ≥ |S|/2, therefore our algorithm takes a |S|/|S∗| ≤ 2 approximation ratio. Since the edges we picked in Greedy-VC do not share endpoints at all, it follows that:

i. S|/2 is the total number of edges we picked; and ii. S∗ must have picked at least one vertex from every edge we picked. It follows that |S∗| ≥ |S|/2.

Occasionally when one verifies that an algorithm has a definite approximation ratio, the analysis is fairly “loose,” and might not reflect the best probable ratio that can be achieved. It reflects that Greedy-VC is certainly not better than a 2-approximation. Specifically, there is an infinite group of Vertex Cover cases where Greedy-VC provably picks exactly double the number of vertices required to cover the graph, specifically in the case of comprehensive bipartite graphs (Book & Siekmann, 1986; Hermann & Pichler, 2008). One final remark must be noted on Vertex Cover. Though the above algorithm is actually quite simple, no superior approximation algorithms are known! Actually, it is widely assumed that minimum vertex cover cannot be approximated better than 2 − ɛ for some ɛ > 0 unless P = NP (Hermann & Kolaitis, 1994; Khot & Regev, 2003).

Approximation Algorithms

137

Figure 5.3: A sketch of a complete bipartite graph with n nodes colored red and n nodes colored blue. (Source: https://link.springer.com/chapter/10.1007/0-387-28356-0_18).

A graph for which its vertices can be allotted one of two colors is termed as bipartite (say, blue or red), in such a manner that all edges have different colored endpoints. When applying Greedy-VC on these cases (for any normal number n), the algorithm will pick all 2n vertices.

5.3.2. Greedy MAX-SAT The problem MAX-SAT has been very well-considered; variants of it arise in several areas of discrete optimization. To introduce it needs a bit of terminology. We will deal exclusively with Boolean variables (i.e., those which are either false or true), which we will represent by x1, x2, etc. A literal is explained as either negation of a variable or a variable (e.g., x7, –x11 are literals). A clause is explained as the OR of few literals (e.g., (–x1 ∨x7 ∨–x11) is a clause). We assume that a Boolean formula exists in CNF (conjunctive normal form) if it is given as an AND of clauses (e.g., (–x1 ∨x7 ∨–x11)∧(x5 ∨–x2 ∨–x3) is in CNF). Lastly, the MAX-SAT problem is to discover a consignment to the variables of the Boolean formula in CNF so that the maximum total of clauses are fixed to true, or are satisfied. Correctly: MAX-SAT problem case is illustrated below:

138

Introduction To Algorithms

Instance: A Boolean formula F in conjunctive normal form CNF. Solution: An assignment a, that is a function from every variable in F to {true or false}. Measure: The number of clauses in F which are set to true (being satisfied) while the variables in F are assigned rendering to a. What might be a normal greedy strategy for approximately resolving MAXSAT? One approach is to choose a variable that satisfies several clauses if it is set to a definite value. Instinctively, if a variable occurs invalid in several clauses, putting the variable to false will gratify several clauses; hence this approach should approximately resolve the problem well. Let n(li, F) represent the total number of clauses in F in which the literal li appears. Greedy-MAXSAT: Choose a literal li having maximum n(li, F) value. Set the analogous variable of li so that all clauses having li are satisfied, producing a reduced F. Repeat until no variables stay in F.

It is easy to appreciate that Greedy-MAXSAT goes in polynomial time (coarsely quadratic time, contingent with the computational model picked for analysis). It is also a “decent” approximation for the MAX-SAT problem.

5.3.3. Greedy MAX-CUT Our next example illustrates how local search (specifically, hill-climbing) may be used in designing approximation algorithms. Hill-climbing is naturally a greedy approach: when one has a possible solution x, one tends to improve it by picking some feasible y which is “close” to x, but then has a better measure (higher or lower, depending on maximization or minimization). Repeated attempts at improvement frequently produce “locally” optimal solutions which have a good measure comparative to a universally optimal solution (i.e., a member of Opt(I)). We explain local search by proposing an approximation algorithm meant for the NP-complete MAX-CUT issue which discovers a locally optimal substantial assignment. It is important to remember that not all local search approaches try to discover a local optimum—for instance, simulated annealing tries to escape from local optima hoping to find a global optimum (Ghalil, 1974; Kirkpatrick et al., 1983; Černý, 1985). MAX-CUT problem case is illustrated below: Case: An undirected graph G = (V,E). Solution: A cut of the graph, that is, a pair (S, T) such that S ⊆ V & T = V − S.

Approximation Algorithms

139

Measure: The cut size, that is the number of edges intersecting the cut, i.e., |{{u,v} ∈ E | u ∈ S,v ∈ T}|.

Our local search algorithm constantly improves the current possible solution by altering one vertex’s position in the cut, till no more improvement can be attained. We will show that the cut size is at least m/2 at such a local maximum. Local-Cut: Start with a random cut of V. For each vertex, conclude if taking it to the other part of the partition upturns the size of the cut. If so, change it. Repeat up until no such movements are probable. First, note that this algorithm reprises at most m times, as every movement of a vertex upturns the size of the cut by no less than 1, and a cut can be as a maximum m in size. Local-Cut is a1/2 approximation algorithm for MAX-CUT as demonstrated below: Proof. Suppose (S, T) be the cut yielded by the algorithm, & consider a vertex v. After the algorithm ends, observe that the total number of edges contiguous with v that cross (S, T) is more than the total number of contiguous edges that do not cross, else v would have been moved. Suppose deg(v) be the degree of (v). Then our observation suggests that however, deg(v)/2 limits out of v cross the cut yielded by the algorithm. Let m∗ be the number of edges intersecting the cut returned. Each edge takes two endpoints, hence the sum/counts each intersecting edge at most twice, i.e.

∑ (deg(v) / 2) ≤ 2m

*

v∈V

However, note ∑ v∈V deg(v) = 2m : when adding up all degrees of vertices, each edge gets counted precisely twice, once for every endpoint. We conclude that:

= m



v∈V

(deg(v) / 2) ≤ 2m*

The algorithm has the following approximation ratio

m* 1 ≥ . m 2

It seems that MAX-CUT concedes much-improved approximation ratios than 1/2; an alleged relaxation of the issue to a semi-certain linear

Introduction To Algorithms

140

program produces a 0.8786 approximation (Goemans & Williamson, 1995). However, MAX-CUT cannot be approximated randomly as well, like several optimization problems (1 − ɛ, for all ɛ> 0) except P = NP. That is to state, it is implausible that MAX-CUT exists in the PTAS complexity class.

5.3.4. Greedy Knapsack The knapsack problem & its special cases have been widely studied in operations research. The idea behind it is typical: you have a knapsack having capacity C, & a group of items 1,..., n. Each item has a certain cost ci of carrying it, together with a profit pi that you would gain by carrying it. The problem is then to discover a subset of objects with the cost at most C, devising maximum profit (Edmonds, 1965; Holland, 1992; Halperin, 2002). Maximum Integer Knapsack case is illustrated below: Case: A capacity C ∈ N, & a number of objects n ∈ N, with consistent costs & profits ci, pi ∈ N for all i = 1, ..., n. Solution: A subset S ⊆ {1, ..., n} so that Pj∈S cj ≤ C. Measure: The total profit ∑j∈S pj.

Maximum Integer Knapsack, as framed above, is NP-hard. There is also a “fractional” form of this problem (we name it Maximum Fraction Knapsack), that can be resolved in polynomial time. In this form, rather than having to choose the entire item, one is permitted to pick fractions of items, similar to 1/8 of the 1st item, 1/2 of the 2nd item, and so on. The corresponding profit & cost incurred from the objects will be also fractional (1/8 of the profit & cost of the 1st, 1/2 of the profit & cost of the 2nd, & so on) (Ibarra & Kim, 1975; Miller, 1976; Geman & Geman, 1987). One greedy strategy for resolving these two problems is to box items with the biggest profit-to-cost ratio first, hoping to get several small-cost high-profit objects in the knapsack. It turns out that such algorithm will not provide any constant approximation guarantee, rather a tiny variant on this strategy will provide a 2-approximation for Integer Knapsack, & a precise algorithm for Fraction Knapsack (Adleman, 1980; Guibas et al., 1983; Lenstra et al., 1990). The algorithms for Integer Knapsack & Fraction Knapsack are, respectively: i.

Greedy-IKS: Pick items with the biggest profit-to-cost ratio first, till the total cost of items picked is greater than C. Let j be the last object is chosen, & S be the group of items picked before j. Return either {j} or S, contingent with which one is more beneficial.

Approximation Algorithms

141

ii.

Greedy-FKS: Pick items as in Greedy-IKS. When the item j marks the cost of the existing solution greater than C, improve the fraction of j so that the resultant cost of the solution is precisely C. We omit a proof of the succeeding. A full treatment can be seen in Ausiello et al. (1999). Greedy-KS is a 12-approximation for Maximum Integer Knapsack as illustrated below: Proof. Fix a case of the problem. Suppose P = ∑i∈S pi is the total profit of objects in S, & j be the last item picked (as specified in the algorithm). We will demonstrate that P +pj is equal to or greater than the profit of the optimal Integer Knapsack solution. It trails that one of the S or {j} has no less than half the yield (profit) of the optimal solution (Alkalai & Geer, 1996; LaForge & Turner, 2006; LaForge et al., 2006). Suppose SI∗ is an optimal Integer Knapsack solution of the given case, with total profit PI∗. Similarly, let SF∗ & PF∗ relate to the optimal Fraction Knapsack solution. Note that PF∗ ≤ PI∗. By the exploration of the algorithm aimed at Fraction Knapsack, P pj, in which ∈ (0,1] is the fraction picked for object j in the algorithm. Therefore P + pj ≥ P + pj PI∗

Actually, this algorithm can be drawn-out to acquire a PTAS (polynomial time approximation scheme) for Maximum Integer Knapsack, (observe Ausiello et al., 1999). A PTAS has the characteristic that, for any stable ∈ > 0 provided, it yields a (1 + ∈)-approximate solution. Added, in the input size, the runtime is polynomial, provided that is constant. This allows us to identify a runtime that possesses 1/ in the exponent. It is typical to observe a PTAS as a group of successively better (then also slower) approximation algorithms, individually running with a consecutively smaller ∈ > 0. This is instinctively why they are named an approximation strategy, as it is meant to propose that a range of algorithms are used. A PTAS is quite influential; such a scheme can approximately resolve a problem with ratios subjectively close to 1. Nevertheless, we will observe that several problems probably do not possess a PTAS, unless P = NP (Goemans & Williamson, 1995; Jain & Vazirani, 2001; Festa & Resende, 2002).

5.4. SEQUENTIAL ALGORITHMS Sequential algorithms are employed for approximations on problems in which a feasible solution is a splitting of the case into subsets. A sequential

142

Introduction To Algorithms

algorithm “sorts” the objects of the case in some manner, and chooses partitions for the case based on this ordering (Wallace et al., 2004; Zhu & Wilhelm, 2006; Wang, 2008).

5.4.1. Sequential Bin Packing We first consider the issue of Minimum Bin Packing that is similar in regard to the knapsack problems. Minimum Bin Packing case is illustrated below: Case: A set of objects S = {r1,... ,rn}, where ri ∈ (0,1] for all i = 1,... ,n.

Solution: Splitting of S into bins B1,... ,BM so that ∑rj∈Bi rj ≤ 1 for all i = 1,... ,M. Measure: M.

An evident algorithm for Minimum Bin Packing stays an online strategy. Initially, let j = 1 & have a bin B1 available. As one runs across the input (r1,r2, etc.), go for packing the new object ri into the last bin employed, Bj. If ri does not suit in Bj, make another bin Bj + 1 & put ai in it. This algorithm is “online” since it works on the input in a stable order, and hence adding new items to the case while the algorithm is working does not alter the outcome (Herr, 1980; Smith, 1986; Stock & Watson, 2001). Last-Bin is a 2-approximation to Minimum Bin Packing as illustrated below: Proof. Suppose R is the sum of all objects, so R = ∑ri∈S ri. Suppose m is the total number of bins employed by the algorithm, & let m∗ be the lowest number of bins conceivable for the given case. Observe that m∗ ≥ R, since the total number of bins required is at least the total mass of all items (each bin embraces 1 unit). Now, given any couple of bins, Bi and Bi+1 yielded by the algorithm, the totality of items from ‘S’ in Bi & Bi+1 is at least 1; or else, we would have kept the items of Bi+1 in Bi in its place. This indicates that m ≤ 2R. Therefore m ≤ 2R ≤ 2m∗, & the algorithm is a 2-approximation (Price, 1973; Maurer, 1985; Berry & Howls, 2012). An interesting workout for the reader is to build a series of examples indicating that this approximation bound, similar to the one for GreedyVC, is constructed. As one might assume, there exist algorithms that provide better approximations than the above. For instance, we do not even deliberate the previous bins B1,..., Bj−1 while trying to pack an ai, just the last one is considered (Arora et al., 2001). Motivated by this thought, consider the following alteration to Last-Bin. Choose each item ai in declining order of size, putting ai in the first accessible bin out of B1,..., Bj. (So a new bin is simply created if ai cannot be fitted in

Approximation Algorithms

143

any of the former j bins.) Call this novel algorithm First-Bin. An improved approximation bound can be derived, through an elaborate analysis of cases.

5.4.2. Sequential Job Scheduling One of the key issues in scheduling theory is exactly how to allocate jobs to multiple machines such that all the jobs are accomplished efficiently. Here, we will consider job accomplishment in the shortest extent of time possible. For the purposes of simplicity and abstraction, we will accept the machines are identical in dealing out power for each job. Minimum Job Scheduling is illustrated below: Case: An integer k & a multi-set T = {t1,... ,tn} of times, ti ∈ Q for all i = 1,... ,n (that is, the ti are fractions).

Solution: An allocation of jobs to machines, that is, a function a from {1,... ,n} to {1,... ,k}. Measure: The accomplishment time for all machines, supposing they run in parallel: max{∑i:a(i)=j ti | j ∈ {1,... ,k}}.

The algorithm we suggest for Job Scheduling is similarly online: when reading a novel job with time ti, allocate it to the machine j which currently has the least aggregate of work; i.e., the j with minimum ∑i:a(i)=j ti. Sequential Jobs is a 2-approximation meant for Minimum Job Scheduling as illustrated below:

Proof. Let j be a machine having maximum completion time, & let i be the catalog of the last job allocated to j by the algorithm. Let si,j be the amount of all times for jobs preceding i that are allocated to j. (This may be assumed as the time which job i begins on machine j). The algorithm allocated i to the machine having the least extent of work, therefore all other machines j’ at the moment have larger ∑i:a(i)=j’ ti.. Hence, that is, si,j is less 1/k of the overall time of all jobs (remember k is the total number of machines).

1 n = B t ≤ m* ∑ i =1 i k Note , the accomplishment time for an optimal solution, since the sum relates to the case where each machine takes exactly the equal fraction of time to complete. Hence the accomplishment time for machine j is S i,j + ti ≤ m∗ + m∗ = 2m∗ So the maximum accomplishment time is at most double that of an optimal solution. This is not the finest one can do: Minimum Job Scheduling

144

Introduction To Algorithms

also possesses a PTAS (Papadimitriou & Steiglitz, 1982; Vazirani, 1983).

5.5. RANDOMIZATION Randomness is a powerful source for algorithmic design. Upon the supposition that one has access to impartial coins that may be flipped & their values (heads or tails) taken out, a wide array of novel mathematics can be employed to support the analysis of an algorithm. It is often the instance that a simple randomized algorithm would have the same performance guarantees by means of a complicated deterministic (i.e., non-randomized) technique. One of the most fascinating discoveries in the zone of algorithm design is that by adding randomness into a computational process may sometimes lead to a substantial speedup over purely deterministic techniques. This may be intuitively described by the subsequent set of observations. A randomized algorithm can be observed as a probability distribution upon a set of deterministic algorithms. The conduct of a randomized algorithm can fluctuate on a given input, dependent upon the random selections made by the algorithm; therefore when we contemplate a randomized algorithm, we are indirectly considering a randomly selected algorithm from a group of algorithms. If a substantial portion of these deterministic algorithms accomplishes well on the given input, at that point a strategy of resuming the randomized algorithm after a definite point in runtime will result in a speed-up (Nemhauser & Wolsey, 1988; Gomes et al., 1998). Some randomized algorithms are capable of efficiently solving problems for which no effective deterministic algorithm is known, for example, polynomial identity testing (Motwani & Raghavan, 1995). Randomization is also a vital component in the prevalent simulated annealing method for resolving optimization problems (Kirkpatrick et al., 1983). At length, the problem of defining if a specified number is prime (a major problem in new cryptography) was only efficiently resolvable using randomization (Goldwasser & Kilian, 1986; Rabin, 1980; Solovay & Strassen, 1977). Very lately, a deterministic algorithm was discovered for primality (Agrawal et al., 2002).

5.5.1. Random MAX-CUT Solution We saw earlier a greedy approach for MAXCUT that produces a 2-approximation. Using randomization, we can provide an extremely small approximation algorithm that partakes the same performance in

Approximation Algorithms

145

approximation, & runs in expected polynomial time. Random-Cut: Select a random cut (i.e., a random splitting of the vertices into two groups). If there are less than m/2 edges intersecting this cut, repeat. Random-Cut remains a ½ approximation algorithm for MAX-CUT which runs in expected polynomial time as demonstrated below: Proof. Suppose X is a random variable signifying the number of edges intersecting a cut. For i = 1,..., m, Xi will be a pointer variable which is 1 if the itch edge intersects the cut, and 0 otherwise. Then X = ∑ i =1 X i , so by m

linearity of m probability. E[ X ] = ∑ i =1 E[ X i ] m

Now for any edge {u, v}, the probability it intersects a randomly picked cut is 1/2. (Why? We randomly placed u & v in one of two probable partitions, so u will be in the same partition equally as v with probability 1/2.) Hence, E[Xi] = 1/2 for all i, so E[X] = m/2. This only shows that by selecting a random cut, we anticipate getting at least m/2 edges intersecting. We want a randomized algorithm which always returns a good cut, & its running time is an arbitrary variable whose expectancy is polynomial. Let us calculate the probability that X ≥ m/2 after a random cut is chosen. In the worst instance, when X ≥ m/2 all the probability is based on m, and when X < m/2 all the probability is based on m/2−1. This makes the expectancy of X as high as possible, whereas making the likelihood of gaining an at least-m/2 cut small. Formally, m/2 = E[X] ≤ (1 − Pr[X ≥ m/2])(m/2 − 1) + Pr[X ≥ m/2]m Resolving for Pr[X ≥ m/2], it is as a minimum 2/(m+2). It follows that the estimated number of repetitions in the above algorithm is as a maximum (m+2)/2; therefore the algorithm shots in expected polynomial time, & always yields a cut of size no less than m/2. We remark that, had we basically specified our approximation by way of “pick a random cut & stop,” we would state that the algorithm goes in linear time, & has an estimated approximation ratio of 1/2.

5.5.2. Random MAX-SAT Solution Earlier, we studied a greedy method for MAX-SAT which was guaranteed to gratify half of the clauses. Here we will study MAX-Ak-SAT, the limitation of MAX-SAT to CNF principles with as a minimum k literal per clause. Our algorithm is similar to the one for MAXCUT: Choose an arbitrary assignment to the variables. It is easy to indicate, using an analogous

Introduction To Algorithms

146

analysis to the above notion, that the estimated approximation ratio of this technique is at least 1− 1/2k. More specifically, if m is the number of clauses in a formulary, the expected number of clauses gratified by an arbitrary assignment is m − m/2k. Let c be a random clause having k literals. The probability that every one of its literals was fixed to a value that marks them false is as a maximum 1/2k since there is a possibility of 1/2 for each literal & there are as a minimum k of them. Thus the probability that c is gratified is at least 1−1/2k. Using a linearity of probability argument (such as in the MAX-CUT analysis) we conclude that as a minimum m − m/2k clauses are estimated to be satisfied.

5.6. A TOUR OF APPROXIMATION CLASSES We will now take a stride back from our algorithmic debates, and concisely define a few of the common intricacy classes linked with NP optimization problems.

5.6.1. PTAS and FPTAS PTAS and FPTAS are categories of optimization problems that few believe are nearer to the proper description of what is efficiently solvable, instead of merely P. This is for the reason that problems in these two classes can be approximated with constant ratios subjectively close to 1. However, with PTAS, while the approximation ratio gets nearer to 1, the runtime of the analogous approximation algorithm may increase exponentially with the ratio. More formally, PTAS is the category of NPO problems Π which have an approximation scheme. That is, assumed ɛ > 0, there is a polynomial time algorithm A so that If Π is a maximization issue, A is a(1 + ɛ) approximation, that is, the ratio reaches 1 from the right. ii. If Π is a minimization issue, it is a(1 − ɛ ) approximation (the ratio reaches 1 from the left). As we mentioned, one disadvantage of a PTAS is that the algorithm (1 + ɛ) could be exponential in 1/ɛ. The class FPTAS is basically PTAS but with the extra condition that the runtime is polynomial in n & 1/ɛ for the approximation algorithm. i.

Approximation Algorithms

147

5.6.2. A Few Known Results for PTAS and FPTAS It is known that few NP-hard optimization problems can’t be approximated subjectively well unless P = NP. One instance is a problem we observed at earlier, Minimum Bin Packing. This is a rare instance in which there is a modest proof that unless P = NP, the problem is not approximable. Minimum Bin Packing is not in PTAS unless P = NP. In fact, there is no 3/2 − ɛ approximation for any ɛ > 0, unless P = NP: To prove the outcome, we employ a reduction as of the Set Partition decision problem. Set Partitioning asks if an assumed set of natural numbers could be split into two sets which have an equal sum. Set Partition: Case: A multi-set S = {r1,... ,rn}, in which ri ∈ N for all i = 1,... ,n.

Solution: A splitting of S into sets S1 & S2; i.e., S1 ∪ S2 = S & S1 ∩ S2 = ∅. Measure: m(S) = 1 if ∑ri∈S1 ri = ∑∑ri∈S2 rj, & m(S) = 0 otherwise.

Proof. Suppose S = {r1,... ,rn} is a Set Partition case. Decrease to Minimum Bin Packing by letting (half the sum of elements in S), & considering a bin packing case of items S’ = {r1/C,... ,rn/C}.

If S can be divided into two sets of the identical sum, then the minimum quantity of bins necessary for the analogous S’ is 2. Conversely, if S cannot be divided in this manner, the minimum number of bins required for S0 is at least 3, since every possible partitioning produces a set with a total greater than C. Hence, if there existed a poly-time (3/2 −ɛ)-approximation algorithm A, it might be used to resolve Set Partition: i.

If A (given S & C) yields a solution using as a maximum (3/2− ɛ)2 = 3−2 bins, then there is a Set Partition for S. ii. If A yields a solution using as a minimum (3/2 − ɛ)3 = 9/2 − 3 = 4.5 − 3 bins, then there isn’t any Set Partition for S. However for any ɛ ∈ (0, 3/2), 3 − 2 < 4.5 − 3

Consequently, this polynomial time algorithm differentiates between that S that may be partitioned & those that cannot, hence P = NP.

A similar result holds for issues such as MAX-CUT, MAX-SAT, & Minimum Vertex Cover. However, unlike the outcome for Bin Packing, the evidence for these appear to need the outline of probabilistically checkable proofs.

Introduction To Algorithms

148

5.6.3. APX APX is a (presumably) larger category than PTAS; the approximation promises for problems in it are severely weaker. An NP optimization issue Π is in APX only if there exists a polynomial time algorithm A & constant c so that A stays a c-approximation to Π.

5.6.4. A Few Known Results for APX It is easy to observe that PTAS ⊆ APX ⊆ NPO. When one sees new intricacy classes & their inclusions, one of the primary questions to be requested is: How probable is it that these inclusions might be made into equalities? Unluckily, it is highly unlikely. The following relationship can be revealed between the three approximation categories we have seen. We can assume that PTAS = APX⇐⇒ APX =NPO ⇐⇒ P = NP. Thus, if all NP optimization problems can be approximated inside a constant factor, at that point P = NP. Further, if all problems which have constant approximations may be subjectively approximated, still P = NP. Another way of putting this is: if NP problems are difficult to solve, then few of them are difficult to approximate as well. Moreover, there is a “hierarchy” of successively difficult-to-approximate problems. One of the directions specified follows from a theorem of the prior section: earlier, we observed a constant factor approximation for Minimum Bin Packing. However, it does not possess a PTAS unless P = NP. This shows the course PTAS = APX ⇒ P = NP. One example of a problem which cannot be in APX until P = NP is the well-identified Minimum Traveling Salesman problem. Minimum Traveling Salesman is described below: Case: A set C = {c1,... ,cn} of cities, & a distance function d : C × C → N.

Solution: A path through the cities, that is, a permutation π : {1,... ,n} → {1,... ,n}. Measure: The cost of visiting cities relating to the path, i.e. n −1

∑ d (cπ i =1

(i )

, cπ (i +1) )

It is important to observe that when the spaces in the problem instances constantly obey a Euclidean metric then Minimum Traveling Salesperson possess a PTAS (Arora, 1998). Thus, we can say that it is the simplification of possible distances in the aforementioned problem that makes it hard to

Approximation Algorithms

149

approximate. This is often the issue with approximability: a small limitation on an inapproximable problem may suddenly make it a highly approximable one.

5.7. BRIEF INTRODUCTION TO PCPS In the 1990s, the effort in probabilistically checkable proofs (PCPs) remained the major breakthrough in demonstrating hardness results, and possibly in theoretical computer science altogether. In essence, PCPs simply look at a little bit of a proposed proof, by randomness, but manage to arrest all of NP. As the number of bits checked by them is so small (a constant), while an efficient PCP occurs for a given problem, it infers the difficulty of approximately solving the similar problem as well, inside some constant factor. The notion of a PCP ascended from a series of contemplations on proofchecking via randomness. We know NP signifies the class of problems which have “short proofs” we can prove effective. As long as NP is concerned, entirely all of the verification completed is deterministic. When a proof is incorrect or correct, a polynomial time verifier replies “yes” or “no” with 100% sureness. However, what ensues when we relax the idea of total correctness to involve probability? Suppose we allow the proof verifier to toss impartial coins, and have a one-sided error. To be exact, now a randomized verifier only agrees to a correct proof having probability at least 1/2, yet still rejects any unfitting proof it reads. (We call it a probabilistically checkable proof system, that is, a PCP.) This slight alteration of what it means to substantiate a proof leads to an incredible characterization of NP: all of the NP decision problems may be verified by a PCP from the above type, which only tosses O(log n) coins and only checks a constant (O(1)) figure of bits of any particular proof! The result involves the production of highly complex errorcorrecting codes. We shall not debate it on a formal level now but will cite the aforementioned in the notation of a theorem.

150

Introduction To Algorithms

5.8. PROMISING APPLICATION AREAS FOR APPROXIMATION AND RANDOMIZED ALGORITHMS 5.8.1. Randomized Backtracking and Backdoors Backtracking is one of the first and most natural methods employed for solving combinatorial problems. Generally, backtracking deterministically may take exponential time. Recent work has established that many realworld problems could be solved quite rapidly, once the selections made in backtracking are randomized. Particularly, problems in practice have a tendency to have minor substructures within them. These substructures have the tendency that once they are solved appropriately, the entire problem can be solved. The presence of these so-called “backdoors” (Williams et al., 2003) to problems mark them very rational to a solution using randomization. Coarsely speaking, search heuristics will mark the backdoor substructure first in the search, with a substantial probability. Therefore, by repeatedly resuming the backtracking mechanism after a definite (polynomial) length of time, the total runtime that backtracking requires discovering a solution is decreased tremendously.

5.8.2. Approximations to Guide Complete Backtrack Search A promising method for solving combinatorial problems by complete (exact) methods draws on latest results on some of the finest approximation algorithms centered on linear programming (LP) relaxations (Chvatal, 1979, 1983, Dantzig, 2016) & so-called randomized rounding methods, as well as on outcomes that revealed the extreme inconsistency or “unpredictability” in the complete search procedures’ runtime, often explained by professed heavy-tailed cost distributions (Gomes et al., 2000). Gomes and Shmoys (2002) suggest a complete randomized backtrack search technique that tightly combines constraint satisfaction problem (CSP) propagation methods with randomized LP-based approximations (Shmoys, 1995). They use as a standard domain a virtuously combinatorial problem, the quasi-group (or Latin square) completion problem (QCP). Each instance involves an n by n matrix having n2 cells. A complete quasi-group contains a coloring of each cell using one of n colors such that there is no repetitive color in any column or row. Given an incomplete coloring of the ‘n’ through n cells, defining whether there is a valid accomplishment into a full quasi-group in an NP-complete problem (Colbourn, 1984). The underlying structure of

Approximation Algorithms

151

this standard is similar to that originated in a series of practical applications, such as fiber optics routing, experimental design, and timetabling problems (Laywine & Mullen, 1998; Kumar et al., 1999). Gomes and Shmoys compare their outcomes for the hybrid CSP/LP strategy steered through the LP randomized rounding approximation using a CSP strategy and with an LP strategy. The results indicate that the hybrid approach considerably improves over the pure approaches on hard instances. This proposes that LP randomized rounding approximation gives powerful heuristic regulation to the CSP search.

5.8.3. Average Case Complexity and Approximation While “worst case” complexity partakes a very rich theory, it frequently feels too restrictive to be pertinent to practice. Maybe NP-hard problems are hard just for some esoteric sets of cases that will hardly ever ascend. To this end, researchers have suggested theories of “average case” complexity, that attempt to probabilistically explore problems based on randomly selected instances over distributions; for an overview to this line of work, cf. (Gurevich, 1991; Nowakowski & Skarbek, 2006). Lately, an exciting thread of theoretical research has explained the connections between the averageinstance complexity of problems & their approximation hardness (Feige, 2002; Wilkinson, 2003; Beier et al., 2007). For instance, it is presented that if random 3-SAT is difficult to solve in polynomial time (given reasonable definitions of “random” and “hard”), then NP-hard optimization problems, for example, Minimum Bisection is difficult to approximate in the worst instance. Conversely, this implies that better approximation algorithms for some problems might lead to the average-instance tractability of others. A natural research query is: does a PTAS suggest average-instance tractability or vice versa? We suspect that some proclamation of this form might be the instance. In our defense, the latest paper illustrates that Random Maximum Integer Knapsack is precisely solvable in expected polynomial time (Beier & Vocking, 2003; 2004; 2006).

5.9. TRICKS OF THE TRADE One major initial incentive for the learning of approximation algorithms was to deliver a new theoretical avenue for coping and analyzing with hard problems. Faced with a brand-new fascinating optimization problem, how could one apply the techniques deliberated here? One possible scheme continues as follows:

Introduction To Algorithms

152

i.

First, try to substantiate your problem is NP-hard, otherwise, find proof that it is not! Possibly the problem admits an exciting exact algorithm, without the requirement for approximation. ii. Often, a very intuitive and natural idea is the base of an approximation algorithm. How good is a randomly picked possible solution for the problem? (What is the anticipated value of a random solution?) What about a greedy strategy? Can you define a region such that local search does fine?. iii. Seek for a problem (name it Π) that is similar to yours in some sense, and use a present approximation algorithm for Π to get an approximation for your problem. iv. Try to ascertain it cannot be approximated finely, by reducing few hard-to-approximate problems to your problem. The first, third, and fourth points essentially pivot on one’s resourcefulness: one’s persistence to scour the literature (and colleagues) for problems related to the one at hand, in addition to one’s ability to see the relationships and reductions which indicate that a problem is indeed analogous. This chapter has been mostly concerned with the second point. To answer the queries of that point, it is critical to proving limits on optimal solutions, regarding feasible solutions that one’s methods obtain. Regarding minimization (maximization) problems, one will have to prove lower limits (respectively, upper limits) on some optimal resolution for the problem. Devising lower (or upper) limits can simplify the proof greatly: one only needs to indicate that an algorithm yields a solution with value as a maximum c time the lower limits to indicate that the algorithm is a c-approximation. We have proven upper and lower bounds repeatedly (explicitly or implicitly) in our verifications for approximation algorithms during this chapter—it may be informative for the reader to analyze each approximation proof & discover where we have done it. For instance, the greedy vertex cover algorithm (for choosing a maximal matching) works for the reason that even an optimal vertex cover secures, as a minimum, one of the vertices in each verge of the matching. The number of edges in a matching is a lower limit on the total number of nodes in an optimal vertex cover, and hence the total nodes in the matching (that is twofold the number of edges) are, as a maximum, twofold the number of nodes in an optimal cover.

Approximation Algorithms

153

REFERENCES 1.

Adleman, L. M., (1980, October). On distinguishing prime numbers from composite numbers. In Foundations of Computer Science, 21st Annual Symposium on (pp. 387–406). IEEE. 2. Agrawal, M., Kayal, N., & Saxena, N. (2002). PRIMES is in P, IIT Kanpur. Preprint of August, 8, 2. 3. Aharoni, R., Erdös, P., & Linial, N., (1985, December). Dual integer linear programs and the relationship between their optima. In Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing (pp. 476–483). ACM. 4. Aho, A. V., & Hopcroft, J. E., (1974). The design and analysis of computer algorithms. Pearson Education India. 5. Aho, A. V., Dahbura, A. T., Lee, D., & Uyar, M. U., (1991). An optimization technique for protocol conformance test generation based on UIO sequences and rural Chinese postman tours. IEEE Transactions on Communications, 39(11), 1604–1615. 6. Aho, A. V., Hopcroft, J. E., & Ullman, J. D., (1976). On finding lowest common ancestors in trees. SIAM Journal on Computing, 5(1), 115– 132. 7. Aho, A. V., Hopcroft, J. E., & Ullman, J. D., (1979). Computers and intractability: a guide to NP-completeness, Freeman Publishing, vol. 1, pp. 10-16. San Francisco. 8. Aho, A. V., Hopcroft, J. E., & Ullman, J. D. (1983). Data Structures and algorithms. Addison. W esley Publishing Company, INc, 1, 983. 9. Aho, A. V., Sagiv, Y., Szymanski, T. G., & Ullman, J. D., (1981). Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM Journal on Computing, 10(3), 405–421. 10. Aistleitner, C., (2011). Covering numbers, dyadic chaining, and discrepancy. Journal of Complexity, 27(6), 531–540. 11. Alkalai, L., & Geer, D., (1996). Space-Qualified 3D Packaging Approach for Deep Space Missions: New Millennium Program, Deep Space 1 Micro-Electronics Systems Technologies. Viewgraph Presentation. Pasadena, CA: Jet Propulsion Laboratory, 1. 12. Alon, N., & Spencer, J., (2000). The probabilistic method. With an appendix on the life and work of Paul Erdos. Wiley-Intersci. Ser. Discrete Math. Optim., Wiley-Interscience, New York.

154

Introduction To Algorithms

13. Andersen, L. D., & Hilton, A. J. W., (1983). Thank Evans!. Proceedings of the London Mathematical Society, 3(3), 507–522. 14. Ansótegui, C., del Val, A., Dotú, I., Fernández, C., & Manya, F., (2004, July). Modeling choices in quasigroup completion: SAT vs. CSP. In AAAI (pp. 137–142). 15. Arora, N. S., Blumofe, R. D., & Plaxton, C. G., (2001). Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems, 34(2), 115–144. 16. Arora, S., (1998). Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems. Journal of the ACM (JACM), 45(5), 753–782. 17. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., MarchettiSpaccamela, A., & Protasi, M., (1999). Complexity and approximation· Springer. Berlin, Heidelberg, New York. 18. Ausiello, G., Crescenzi, P., Gambosi, G., Kann, V., MarchettiSpaccamela, A., & Protasi, M., (2012). Complexity and approximation: Combinatorial optimization problems and their approximability properties. Springer Science & Business Media. 19. Ausiello, G., Marchetti-Spaccamela, A., Crescenzi, P., Gambosi, G., Protasi, M., & Kann, V., (1999). Heuristic methods. In Complexity and Approximation (pp. 321–351). Springer, Berlin, Heidelberg. 20. Banderier, C., Beier, R., & Mehlhorn, K., (2003, August). Smoothed analysis of three combinatorial problems. In International Symposium on Mathematical Foundations of Computer Science (pp. 198–207). Springer, Berlin, Heidelberg. 21. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Schäfer, G., & Vredeveld, T., (2006). Average-case and smoothed competitive analysis of the multilevel feedback algorithm. Mathematics of Operations Research, 31(1), 85–108. 22. Beier, R., & Vöcking, B., (2003, June). Random knapsack in expected polynomial time. In Proceedings of the thirty-fifth annual ACM symposium on Theory of computing (pp. 232–241). ACM. 23. Beier, R., & Vöcking, B., (2004, January). Probabilistic analysis of knapsack core algorithms. In Proceedings of the fifteenth annual ACMSIAM symposium on Discrete Algorithms (pp. 468–477). Society for Industrial and Applied Mathematics. 24. Beier, R., & Vöcking, B., (2006). Typical Properties of Winners and

Approximation Algorithms

25.

26.

27.

28. 29.

30.

31.

32. 33.

34. 35.

36.

155

Losers [0.2 ex] in Discrete Optimization. SIAM Journal on Computing, 35(4), 855–881. Beier, R., Röglin, H., & Vöcking, B., (2007, June). The smoothed number of Pareto optimal solutions in bicriteria integer optimization. In International Conference on Integer Programming and Combinatorial Optimization (pp. 53–67). Springer, Berlin, Heidelberg. Belanger, J., & Wang, J., (1993, May). Isomorphisms of NP-complete problems on random instances. In Structure in Complexity Theory Conference, 1993., Proceedings of the Eighth Annual (pp. 65–74). IEEE. Bennett, E. M., Cramer, W., Begossi, A., Cundill, G., Díaz, S., Egoh, B. N., & Lebel, L., (2015). Linking biodiversity, ecosystem services, and human well-being: three challenges for designing research for sustainability. Current Opinion in Environmental Sustainability, 14, 76–85. Berry, M. V., & Howls, C. J., (2012). Integrals with coalescing saddles. Chapter, 36, 775–793. Blass, A., & Gurevich, Y., (1990, October). On the reduction theory for average case complexity. In International Workshop on Computer Science Logic (pp. 17–30). Springer, Berlin, Heidelberg. Book, R. V., & Siekmann, J. H., (1986). On unification: Equational theories are not bounded. Journal of Symbolic Computation, 2(4), 317–324. Boyd, S. C., & Pulleyblank, W. R., (1990). Optimizing over the subtour polytope of the traveling salesman problem. Mathematical programming, 49(1–3), 163–187. Carlson, J. A., Jaffe, A., & Wiles, A. (Eds.)., (2006). The millennium prize problems. American Mathematical Soc. Černý, V., (1985). Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of optimization theory and applications, 45(1), 41–51. Chazelle, B., (2000). The discrepancy method: randomness and complexity. Cambridge University Press, Vol. 1, pp. 1-19. Chazelle, B. (2004). The discrepancy method in computational geometry. In In Handbook of Discrete and Computational Geometry, Vol. 1, pp. 1-10. Chazelle, B., & Liu, D., (2001, July). Lower bounds for intersection

156

37. 38.

39.

40. 41. 42. 43. 44. 45.

46. 47.

48.

Introduction To Algorithms

searching and fractional cascading in a higher dimension. In Proceedings of the thirty-third annual ACM symposium on Theory of computing (pp. 322–329). ACM. Chazelle, B., & Lvov, A., (2001). A trace bound for the hereditary discrepancy. Discrete & Computational Geometry, 26(2), 221–231. Chazelle, B., Rubinfeld, R., & Trevisan, L., (2001, July). Approximating the minimum spanning tree weight in sublinear time. In International Colloquium on Automata, Languages, and Programming (pp. 190– 200). Springer, Berlin, Heidelberg. Chen, Y. E., & Epley, D. L., (1970). Determination of Schedules for Multiprocessor Systems With Limited Memory. In Proceedings of the Symposium on Information Processing, April 28–30, 1969: Celebrating the Centennial Year of Purdue University (Vol. 1, p. 110). Engineering Experiment Station, Purdue University. Chvatal, V., (1979). A greedy heuristic for the set-covering problem. Mathematics of operations research, 4(3), 233–235. Chvatal, V. (1983). Linear programming. A Series of Books in the Mathematical Sciences, New York: Freeman, vol. 1, pp. 10-15. Colbourn, C. J., (1984). The complexity of completing partial Latin squares. Discrete Applied Mathematics, 8(1), 25–30. Cook, W., & Rohe, A., (1999). Computing minimum-weight perfect matchings. INFORMS Journal on Computing, 11(2), 138–148. Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C., (2001). Introduction to Algorithms, Sect. 22.5. Cueto, E., Sukumar, N., Calvo, B., Martínez, M. A., Cegonino, J., & Doblaré, M., (2003). Overview and recent advances in natural neighbor Galerkin methods. Archives of Computational Methods in Engineering, 10(4), 307–384. Dantzig, G., (2016). Linear Programming and Extensions. Princeton university press. Diening, L., Hästö, P., & Nekvinda, A., (2004). Open problems in variable exponent Lebesgue and Sobolev spaces. FSDONA04 Proceedings, 38–58. Dotú, I., Del Val, A., & Cebrián, M., (2003, September). Redundant modeling for the quasigroup completion problem. In International Conference on Principles and Practice of Constraint Programming (pp. 288–302). Springer, Berlin, Heidelberg.

Approximation Algorithms

157

49. Durand, A., Hermann, M., & Kolaitis, P. G., (2005). Subtractive reductions and complete problems for counting complexity classes. Theoretical Computer Science, 340(3), 496–513. 50. Edmonds, J., (1965). Maximum matching and a polyhedron with 0, 1-vertices. Journal of Research of the National Bureau of Standards B, 69(125–130), 55–56. 51. Faigle, U., Kern, W., & Turán, G., (1989). On the performance of online algorithms for partition problems. Acta Cybern., 9(2), 107–119. 52. Feige, U., (2002, May). Relations between average-case complexity and approximation complexity. In Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing (pp. 534–543). ACM. 53. Feller, W., (1971). An Introduction to Probability Theory and Its Applications, Volume II (Vol. 2). Wiley, New York. 54. Festa, P., & Resende, M. G., (2002). GRASP: An annotated bibliography. In Essays and surveys in metaheuristics (pp. 325–367). Springer, Boston, MA. 55. Galil, Z., (1974). On some direct encodings of nondeterministic Turing machines operating in polynomial time into P-complete problems. ACM SIGACT News, 6(1), 19–24. 56. Garey, M. R., & Johnson, D. S., (1976). Approximation algorithms for combinatorial problems: an annotated bibliography. Algorithms and Complexity: New Directions and Recent Results, 41–52. 57. Garey, M. R., & Johnson, D. S., (2002). Computers and Intractability (Vol. 29). New York: wh freeman. 58. Garey, M. R., Graham, R. L., & Ullman, J. D., (1972, May). Worstcase analysis of memory allocation algorithms. In Proceedings of the Fourth Annual ACM Symposium on Theory of Computing (pp. 143– 150). ACM. 59. Geman, S., & Geman, D., (1987). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In Readings in Computer Vision (pp. 564–584). 60. Goemans, M. X., & Williamson, D. P., (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6), 1115– 1145. 61. Goldwasser, S., & Kilian, J., (1986, November). Almost all primes can

158

62.

63. 64.

65.

66. 67. 68. 69.

70.

71.

72.

73.

Introduction To Algorithms

be quickly certified. In Proceedings of the Eighteenth Annual ACM Symposium on Theory of Computing (pp. 316–329). ACM. Gomes, C. P., & Shmoys, D. B., (2002, March). The promise of LP to boost CSP techniques for combinatorial problems. In Proc., Fourth International Workshop on Integration of AI and OR techniques in Constraint Programming for Combinatorial Optimization Problems (CP-AI-OR’02), Le Croisic, France (pp. 25–27). Gomes, C. P., Selman, B., & Kautz, H., (1998). Boosting combinatorial search through randomization. AAAI/IAAI, 98, 431–437. Gomes, C. P., Selman, B., Crato, N., & Kautz, H., (2000). Heavytailed phenomena in satisfiability and constraint satisfaction problems. Journal of Automated Reasoning, 24(1–2), 67–100. Gomes, C., & Shmoys, D., (2002, September). Completing quasigroups or latin squares: A structured graph coloring problem. In Proceedings of the Computational Symposium on Graph Coloring and Generalizations (pp. 22–39). Graham, R. L., (1966). Bounds for certain multiprocessing anomalies. Bell Labs Technical Journal, 45(9), 1563–1581. Graham, R. L., (1969). Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17(2), 416–429. Graham, R. L., & Pollak, H. O., (1971). On the addressing problem for loop switching. Bell Labs Technical Journal, 50(8), 2495–2519. Guibas, L., Ramshaw, L., & Stolfi, J. (1983, November). A kinetic framework for computational geometry. In Foundations of Computer Science,1983., 24th Annual Symposium on (vol. 1, pp. 100-111). IEEE. Gurevich, Y., (1990, October). Matrix decomposition problem is complete for the average case. In Foundations of Computer Science, 1990. Proceedings., 31st Annual Symposium on (pp. 802–811). IEEE. Gurevich, Y., (1991, July). Average case complexity. In International Colloquium on Automata, Languages, and Programming (pp. 615– 628). Springer, Berlin, Heidelberg. Halperin, E., (2002). Improved approximation algorithms for the vertex cover problem in graphs and hypergraphs. SIAM Journal on Computing, 31(5), 1608–1623. Hermann, M., & Kolaitis, P. G., (1994, June). The complexity of counting problems in equational matching. In International Conference on Automated Deduction (pp. 560–574). Springer, Berlin, Heidelberg.

Approximation Algorithms

159

74. Hermann, M., & Pichler, R., (2008, June). The complexity of counting the optimal solutions. In International Computing and Combinatorics Conference (pp. 149–159). Springer, Berlin, Heidelberg. 75. Herr, D. G., (1980). On the history of the use of geometry in the general linear model. The American Statistician, 34(1), 43–47. 76. Ho, A. C., (1982). Worst case analysis of a class of set covering heuristics. Mathematical Programming, 23(1), 170–180. 77. Hochbaum, D. S., (1996). Approximation Algorithms for NP-Hard Problems. PWS Publishing Co. 78. Hochbaum, D. S., & Shmoys, D. B., (1987). Using dual approximation algorithms for scheduling problems theoretical and practical results. Journal of the ACM (JACM), 34(1), 144–162. 79. Holland, J. H., (1992). Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence. MIT press. 80. Hopper, E. B. C. H., & Turton, B. C., (2001). An empirical investigation of meta-heuristic and heuristic algorithms for a 2D packing problem. European Journal of Operational Research, 128(1), 34–57. 81. Ibarra, O. H., & Kim, C. E., (1975). Fast approximation algorithms for the knapsack and sum of subset problems. Journal of the ACM (JACM), 22(4), 463–468. 82. Indrani, A. V., (2003). Some issues concerning Computer Algebra in AToM3. Technical Report, School of Computer Science, McGill University, Montreal, QC. 83. Jain, K., & Vazirani, V. V., (2001). Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and Lagrangian relaxation. Journal of the ACM (JACM), 48(2), 274–296. 84. Jiménez, P., Thomas, F., & Torras, C., (2001). 3D collision detection: a survey. Computers & Graphics, 25(2), 269–285. 85. Johnson, D. S., (1973, April). Approximation algorithms for combinatorial problems. In Proceedings of the Fifth Annual ACM Symposium on Theory of Computing (pp. 38–49). ACM. 86. Karp, R. M., (1975, February). The fast approximate solution of hard combinatorial problems. In Proc. 6th South-Eastern Conf. Combinatorics, Graph Theory and Computing (Florida Atlantic U. 1975) (pp. 15–31).

160

Introduction To Algorithms

87. Kaufman, M. T., (1974). An almost-optimal algorithm for the assembly line scheduling problem. IEEE Transactions on Computers, 100(11), 1169–1174. 88. Khot, S., (2002, May). On the power of unique 2-prover 1-round games. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing (pp. 767–775). ACM. 89. Khot, S. A., & Vishnoi, N. K., (2015). The Unique Games Conjecture, Integrality Gap for Cut Problems and Embeddability of Negative-Type Metrics into ℓ 1. Journal of the ACM (JACM), 62(1), 8. 90. Khot, S., & Regev, O., (2003, July). Vertex cover might be hard to approximate to within 2-/spl epsiv. In Computational Complexity, 2003. Proceedings. 18th IEEE Annual Conference on (pp. 379–386). IEEE. 91. Khot, S., Kindler, G., Mossel, E., & O’Donnell, R., (2007). Optimal inapproximability results for MAX-CUT and other 2-variable CSPs?. SIAM Journal on Computing, 37(1), 319–357. 92. Khuller, S. Approximation Algorithms for Finding Highly Connected Subgraphs. Vertex, 2, 2. 93. Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P., (1983). Optimization by simulated annealing. science, 220(4598), 671–680. 94. Klein, P. N., & Young, N. E., (2010, February). Approximation algorithms for NP-hard optimization problems. In Algorithms and theory of computation handbook (pp. 34–34). Chapman & Hall/CRC. 95. Kozen, D. C., (1992). Counting Problems and# P. In The Design and Analysis of Algorithms (pp. 138–143). Springer, New York, NY. 96. Kumar, S. R., Russell, A., & Sundaram, R., (1999). Approximating Latin square extensions. Algorithmica, 24(2), 128–138. 97. LaForge, L. A., & Turner, J. W., (2006). Multi-processors by the numbers: mathematical foundations of spaceflight grid computing. In Aerospace Conference, 2006 IEEE (pp. 19). 98. LaForge, L. E., Moreland, J. R., & Fadali, M. S., (2006). Spaceflight multi-processors with fault tolerance and connectivity tuned from sparse to dense. In Aerospace Conference, 2006 IEEE (pp. 23). 99. Laywine, C. F., & Mullen, G. L., (1998). Discrete mathematics using Latin squares (Vol. 49). John Wiley & Sons. 100. Leahu, L., & Gomes, C. P., (2004, September). Quality of LP-based approximations for highly combinatorial problems. In International

Approximation Algorithms

161

Conference on Principles and Practice of Constraint Programming (pp. 377–392). Springer, Berlin, Heidelberg. 101. Lenstra, J. K., Shmoys, D. B., & Tardos, E., (1990). Approximation algorithms for scheduling unrelated parallel machines. Mathematical programming, 46(1–3), 259–271. 102. Liu, C. L., (1976, October). Deterministic Job Scheduling in Computing Systems. In Performance (pp. 241–254). 103. Lovász, L., (1975). On the ratio of optimal integral and fractional covers. Discrete mathematics, 13(4), 383–390. 104. Maurer, S. B., (1985). The Lessons of Williamstown. In New Directions in Two-Year College Mathematics (pp. 255–270). Springer, New York, NY. 105. McCrary, S. V., Anderson, C. B., Jakovljevic, J., Khan, T., McCullough, L. B., Wray, N. P., & Brody, B. A., (2000). A national survey of policies on disclosure of conflicts of interest in biomedical research. New England Journal of Medicine, 343(22), 1621–1626. 106. Miller, G. L., (1976). Riemann’s hypothesis and tests for primality. Journal of computer and system sciences, 13(3), 300–317. 107. Mossel, E., O’Donnell, R., & Oleszkiewicz, K., (2005, October). Noise stability of functions with low influences: invariance and optimality. In Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on (pp. 21–30). IEEE. 108. Motwani, R., & Raghavan, P., (1995). Randomized Algorithms. Cambridge International Series on Parallel Computation. 109. Nemhauser, G. L., & Ullmann, Z., (1969). Discrete dynamic programming and capital allocation. Management Science, 15(9), 494–505. 110. Nemhauser, G. L., & Wolsey, L. A., (1988). Integer and Combinatorial Optimization. Interscience Series in Discrete Mathematics and Optimization. ed: John Wiley & Sons. 111. Nowakowski, A., & Skarbek, W., (2006, April). Fast computation of thresholding hysteresis for edge detection. In Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments IV (Vol. 6159, p. 615948). International Society for Optics and Photonics. 112. Papadimitriou, C. H., & Steiglitz, K., (1982). Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall: Englewood

162

Introduction To Algorithms

Cliffs NJ. Cited Figure, 2. 113. Paz, A., & Moran, S., (1977, July). Non-deterministic polynomial optimization problems and their approximation. In International Colloquium on Automata, Languages, and Programming (pp. 370– 379). Springer, Berlin, Heidelberg. 114. Podsakoff, N. P., Whiting, S. W., Podsakoff, P. M., & Blume, B. D. (2009). Individual-and organizational-level consequences of organizational citizenship behaviors: A meta-analysis. Journal of Applied Psychology, 94(1), 122. 115. Price, G. B., (1973). Telescoping Sums and the Summation of Sequences. The Two-Year College Mathematics Journal, 4(2), 16–29. 116. Pulleyblank, W. R., (1989). Chapter V Polyhedral combinatorics. Handbooks in operations research and management science, 1, 371– 446. 117. Qi, L., (1988). Directed submodularity, ditroids, and directed submodular flows. Mathematical Programming, 42(1–3), 579–599. 118. Rabin, M. O., (1980). A probabilistic algorithm for testing primality. Journal of number theory, 12(1), 128–138. Schrijver, A. (2003). Combinatorial optimization: polyhedra and efficiency (Vol. 24). Springer Science & Business Media. 119. Shmoys, D. B., (1995). Computing near-optimal solutions to combinatorial optimization problems. Combinatorial Optimization, 20, 355–397. 120. Solovay, R., & Strassen, V., (1977). A fast Monte-Carlo test for primality. SIAM Journal on Computing, 6(1), 84–85. Gomes, C. P., & Williams, R. (2005). Approximation algorithms. In Search methodologies (pp. 557–585). Springer, Boston, MA. 121. Vazirani, V. V., (2013). Approximation Algorithms. Springer Science & Business Media, vol. 1, pp. 5-18. 122. Williams, R., Gomes, C. P., & Selman, B., (2003, August). Backdoors to typical case complexity. In IJCAI (Vol. 3, pp. 1173–1178). 123. Reinelt, G., (1994). The Traveling Salesman: Computational Solutions for TSP Applications. Springer-Verlag, vol 1, pp. 1-21. 124. Röglin, H., & Teng, S. H., (2009, October). Smoothed analysis of multiobjective optimization. In Foundations of Computer Science, 2009. FOCS’09. 50th Annual IEEE Symposium on (pp. 681–690). IEEE.

Approximation Algorithms

163

125. Röglin, H., & Vöcking, B., (2007). Smoothed analysis of integer programming. Mathematical programming, 110(1), 21–56. 126. Ryser, H. J., (1951). A combinatorial theorem with an application to Latin rectangles. Proceedings of the American Mathematical Society, 2(4), 550–552. 127. Shaw, P., Stergiou, K., & Walsh, T., (1998, April). Arc consistency and quasigroup completion. In Proceedings of the ECAI-98 Workshop on Non-Binary Constraints (Vol. 2). 128. Smith, R. S., (1986). Rolle over Lagrange—Another shot at the mean value theorem. The College Mathematics Journal, 17(5), 403–406. 129. Spyke, N. P., (1998). Public Participation in Environmental Decision Making at the New Millennium: Structuring New Spheres of Public Influence. BC Envtl. Aff. L. Rev., 26, 263. 130. Stock, J. H., & Watson, M. W., (2001). Vector autoregressions. Journal of Economic Perspectives, 15(4), 101–115. 131. Toth, C. D., O’Rourke, J., & Goodman, J. E., (2017). Handbook of Discrete and Computational Geometry. Chapman and Hall/CRC. 132. Vangheluwe, H., Sridharan, B., & Indrani, A. V., (2003). An algorithm to implement a canonical representation of algebraic expressions and equations in AToM3. Technical Report, School of Computer Science, McGill University, Montreal, QC. 133. Vershynin, R., (2009). Beyond Hirsch conjecture: Walks on random polytopes and smoothed complexity of the simplex method. SIAM Journal on Computing, 39(2), 646–678. 134. Wallace, L., Keil, M., & Rai, A., (2004). Understanding software project risk: a cluster analysis. Information & Management, 42(1), 115–125. 135. Wang, J., (1995, May). Average-case completeness of a word problem for groups. In Proceedings of the twenty-seventh annual ACM symposium on Theory of computing (pp. 325–334). ACM. 136. Wang, Y., (2008). Topology control for wireless sensor networks. In Wireless sensor networks and applications (pp. 113–147). Springer, Boston, MA. 137. Wilkinson, M. H., (2003, August). Gaussian-weighted moving-window robust automatic threshold selection. In International Conference on Computer Analysis of Images and Patterns (pp. 369–376). Springer, Berlin, Heidelberg.

164

Introduction To Algorithms

138. Xu, W., (2005). Steven”: The Design and Implementation of the µModelica Compiler (Doctoral dissertation, MSc. Thesis. School of Computer Science, McGill University, Montreal, QC). 139. Zhu, X., & Wilhelm, W. E., (2006). Scheduling and lot sizing with the sequence-dependent setup: A literature review. IIE Transactions, 38(11), 987–1007.

CHAPTER

6

COMPARATIVE INVESTIGATION OF EXACT ALGORITHMS FOR 2D STRIP PACKING PROBLEM CONTENTS 6.1. Introduction .................................................................................... 166 6.2. The Upper Bound ........................................................................... 167 6.3. Lower Bounds For 2SP..................................................................... 168 6.4. A Greedy Heuristic For Solving The 2D Knapsack Problems............. 173 6.5. The Branch And Price Algorithms..................................................... 175 6.6. The Dichotomous Algorithm............................................................ 176 6.7. Computational Results..................................................................... 177 References.............................................................................................. 187

166

Introduction To Algorithms

6.1. INTRODUCTION The 2D strip packing problem (i.e., 2SP) is a renowned problem involving combinatorial optimization. It has numerous commercial applications which include the cutting of textile materials or paper rolls. Additionally, some approximation and exact algorithms are applied to solving problems involving Bin Packing. However, one phase of these algorithms aim at solving strip packing problems (Chung et al., 1982; Berkey & Wang, 1987). Consider a collection of n rectangular components. Each component i has height hi and width wi (where, i ∈ {1,..., n}). The 2SP involves packing of all components in a strip with W width and infinite height. Considering zero loss of generality, we can assume the dimensions of the components and the strips as integers. The main goal is to minimalize the total packing height of the components without overlapping. The location of the items is expected to be fixed, i.e., no rotation (Vanderbeck, 1994; 1999; 2000). This kind of problem is categorized under nondeterministic polynomial time hardness (i.e., NP-hardness) (Garey & Johnson, 1979; Martello et al., 2003). Therefore, most of the literature considering the 2SP problems deals with approximation as well as exact algorithms. Application of approximation algorithms for solving the strip packing problems has been discussed by several researchers (Kenyon & Remila, 1996; Fernandez de la Vega & Zissimopoulos, 1998; Lesh et al., 2003; Zhang et al., 2006). On the other hand, Beltran et al. (2004), Bortfeldt (2006) and Gomes and Oliveira (2006) employed meta-heuristics for solving strip packing problems. Hopper and Turton (2001) presented a brief introduction of the meta-heuristic algorithms for 2D strip packing applications. A little literature is available on the study of exact algorithms for solving 2SP problems. For instance, Hifi (1998) discussed the packing/cutting problem involving the guillotine cut. He proposed two distinct algorithms involving branch and bound procedures in conjunction with dichotomous search. Later, Martello et al. (2003) presented the use of a unique lower bound in branch and bound algorithms to resolve the problems without guillotine restraint. Recently, some literature has been published on the introduction of the Strip Packing problem involving guillotine constraints. For instance, Cui et al. (2006) presented a recurring branch and bound algorithm for obtaining an approximate solution. Bekrar et al. (2006) proposed a novel lower bound and branch algorithm for the guillotine problems. Cintra et al. (2006) employed the column generation approach and dynamic programming for solving different variants of the problem.

Comparative Investigation of Exact Algorithms for 2D Strip Packing....

167

This chapter deals with the investigation of the 2D strip packing problems involving non-guillotine pattern. Three different types of exact algorithms, including branch and price method, branch and bound algorithm and dichotomous search, are explored in this chapter. The subsequent sections of the chapter include details regarding the use of the lower and upper bound in branch and bound methods (De la Vega & Zissimopoulos, 1991). The role of the column generation has also been discussed in the following sections of the chapter. A particular branching scheme to attain an optimal solution has been discussed in the branch and price method (Clautiaux et al., 2006; 2007; 2008; 2009). Branch and bound algorithm is the primary exact method which has been discussed in the below section (Bekrar et al. 2006). This kind of algorithm is typically suitable for the guillotine case. However, the use of the branch and bound algorithm for solving the non-guillotine case has been illustrated in the below section. The upper and lower bound along with the concerned branching schemes are briefly discussed in the following sections.

6.2. THE UPPER BOUND Several researchers have discussed approximation algorithms for solving 2D Strip Packing Problems, with or without involving guillotine constraint (Kenyon & Remila, 1996; Fernandez de la Vega & Zissimopoulos, 1998; Lesh et al., 2003; Zhang et al., 2006). This section discusses a certain type of upper bound exact algorithms (e.g., Shelf Heuristic Filling) for solving the bin and strip packing problems with guillotine constraint. This kind of heuristic was formerly suggested by Ben Messaoud et al. (2003). Bekrar et al. (2006) proposed some approaches (e.g., BSHF) to improve the strip packing algorithms. BSHF (the best Shelf Heuristic Filling) is a simplification of the well-recognized 2D level algorithm (Monaci & Toth, 2006; Hadjiconstantinou & Iori, 2007) (Figure 6.1).

Figure 6.1: Available rectangles and associated available points,

168

Introduction To Algorithms

(https://pdfs.semanticscholar.org/8067/fc5f7d641d6c19ce6f7748563b9433cff369.pdf).

The basic concept of this type of algorithm deals with the exploitation of the unused area in each shelf. This algorithm makes it feasible to pack components over each other in the same shelf (i.e., which is typically not feasible in the level algorithms. In Shelf Heuristic Filling (SHF) algorithm, the packing of items is carried out in the available rectangles having available points. Such points can be present at the top-left or the bottom-right corner of an already packed item (Lai & Chan, 1997; Fekete & Schepers, 2004; Baldacci & Boschetti, 2007). Items are typically sorted in a decreasing order of height. The packing of the first item (i.e., the tallest one) is normally carried out in the first available rectangle (i.e., the lowest). The item present in the leftmost corner initializes a shelf having a height equal to the item height. After packing of every item, the updating of the available rectangles takes place (Frenk & Galambos, 1987; Coppersmith & Raghavan, 1989). The updating entails the dimension reductions in the available rectangles which exhibits the overlapping with the packed components. The packing of items generates two fresh available rectangles. This process is repeated until the packing of the last item (Coffman et al., 1980; Remila, 1996). Guillotine version of the upper bound exact algorithms presents different rules for sorting items. The Best Fit rule (most suitable approach) for packing items is typically employed with a possibility to update the available rectangles’ list. This kind of algorithms offer excellent results for large instances in a few seconds, and the average wastage rate is approximately 9% (Csirik & Van Vliet, 1993; Lodi et al., 1999; 2002).

6.3. LOWER BOUNDS FOR 2SP This section deals with the use of the lower bound in the exact algorithms. There are three major variants of lower bounds employed for the strip packing problems which include the continuous lower bound (i.e., Lc), first lower bound (Lmmv1) by Martello et al., (2003) and second lower bound (Lmmv2) by Martello et al., (2003). The exact algorithms presented in this chapter involve a new lower bound which is denoted by LBKCS.

Comparative Investigation of Exact Algorithms for 2D Strip Packing....

169

6.3.1. Continuous Lower Bound (Lc) Splitting of each item (i.e., j) into wjhj units results in attainment of a lower bound Lc which known as continuous lower bound as expressed below:

 n   ∑ wi hi   Lc =  i =1  W   



where, L0 = max {Lc, maxi=1...n hi}. The absolute worst performance ratio for this bound is typically equal to 0.5, as proved by Martello et al. (2003).

6.3.2. First Lower Bound Lmmv1 by Martello et al. (2003) Martello et al. (2003) presented the first lower bound which is an extension of the algorithm suggested by Martello and Toth (1990) for 1D bin packing problems. The basic concept is to disintegrate the set of components into three subsets with respect to their dimensions. Assume that J is the set of items. where, α ∈ [1, W/2] i.

J1 = { j ∈ J: wj > W-α } (i.e., the subset of significantly large components). ii. J2 = { j ∈ J : W -α≥ wj > W/2} (i.e., the subset of moderate-sized components/items). iii. J3 = { j ∈ J : W/2 ≥ wj ≥ α } (i.e., the subset of small-sized components/items). On the other hand, the components fulfilling the condition, wj