MIT OpenCourseWare http://ocw.mit.edu

6.854J / 18.415J Advanced Algorithms Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.415/6.854 Advanced Algorithms

September 2008

Lecturer: Michel X. Goemans

1

Introduction

Today we introduce priority queues, a data structure that supports the operations needed by several greedy graph algorithms. A priority queue maintains a set S of elements, where each element s ∈ S has an associated key k(s). It supports the following operations:

• insert: add a new element s with key k(s) to S.

• decrease-key: decrease the key of an element s of S to a given smaller value.

• delete-min: find the element s∗ of S with minimum key k(s∗ ), remove it from S, and return it.

A heap is a data structure implementing a priority queue efficiently.

The two algorithms covered today can both be implemented efficiently on top of a priority queue: Dijkstra's shortest path algorithm and Prim's minimum spanning tree algorithm.

2

Dijkstra's Shortest Path Algorithm

In the single-source shortest path problem, we are given a directed graph G = (V, E), a source s ∈ V and a length function l : E → R+ , and for every vertex v ∈ V we want to compute the distance ds (v), i.e. the length of a shortest path from s to v.

Dijkstra's algorithm solves this problem greedily using a priority queue. Initialize the queue with S = V , k(s) = 0 and, for every other vertex v of S, k(v) = +∞. Then repeat until S is empty:

• Extract (delete-min) the element u of S with minimum key; for each neighbor v ∈ S of u, decrease the key of v to min{k(v), k(u) + l((u, v))}.

If this value is smaller than the current key of v, the update is a decrease-key on the priority queue; the neighbors of u are read off the adjacency structure of G together with their current keys k(v).

The invariant of the algorithm, which can be proved by induction on the order in which vertices leave S, is that when u is removed from S, its key k(u) equals the value of ds (u). In other words, upon termination the key k(v) of every vertex v equals the length of the shortest path from s to v, so the distances ds (v) can be read off directly, which establishes the correctness of the algorithm.
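The algorithm can be sketched in Python. The `heapq` module serves as the priority queue; since it has no decrease-key, the sketch pushes duplicate entries and skips stale ones on extraction (lazy deletion). The adjacency-list representation `adj` is an assumption of this sketch, not something fixed by the notes.

```python
import heapq

def dijkstra(adj, s):
    """Single-source shortest paths; adj[v] = list of (neighbor, length)."""
    dist = {v: float('inf') for v in adj}
    dist[s] = 0
    heap = [(0, s)]                      # priority queue keyed by tentative distance
    done = set()
    while heap:
        d, u = heapq.heappop(heap)       # delete-min
        if u in done:
            continue                     # stale entry (lazy decrease-key)
        done.add(u)
        for v, l in adj[u]:
            if d + l < dist[v]:
                dist[v] = d + l          # decrease-key, implemented by re-pushing
                heapq.heappush(heap, (dist[v], v))
    return dist
```

With this lazy-deletion scheme the heap may hold up to |E| entries, matching the O((|V| + |E|) log |V|) binary-heap bound.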

3

Prim's Minimum Spanning Tree Algorithm

In the minimum spanning tree problem, we are given an undirected connected graph G = (V, E) and a weight function w : E → R, and we want to find a spanning tree of G of minimum total weight.

We present Prim's greedy algorithm; it has the same structure as Dijkstra's algorithm and can likewise be implemented on top of a priority queue.

Initialize a set S of vertices of G, each with a key k(v), and a tree T = ∅: S = V , k(s) = 0 for some arbitrary vertex s, and k(v) = +∞ for every other vertex. The tree T always spans the set V \ S of vertices extracted so far, with one edge added for every extracted vertex v ≠ s. Until S is empty, repeat:

• Extract (delete-min) the element u of S with minimum key; if u ≠ s, add to T the edge that realized k(u).

• For each neighbor v ∈ S of u, decrease the key k(v) to min{k(v), w((u, v))}.

The proof of correctness, again by induction, shows that T is at every step contained in some minimum spanning tree of G, so upon termination T is a minimum spanning tree.
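Prim's algorithm admits an almost identical sketch; here the heap keys are edge weights rather than path lengths. As before, the representation `adj` (undirected edges listed in both directions) is an assumption of the sketch.

```python
import heapq

def prim(adj, root):
    """Minimum spanning tree; adj[v] = list of (neighbor, weight).
    Returns the tree as a list of (u, v, weight) edges."""
    in_tree = {root}
    tree = []
    heap = [(w, root, v) for v, w in adj[root]]   # candidate edges leaving the tree
    heapq.heapify(heap)
    while heap:
        w, u, v = heapq.heappop(heap)    # cheapest edge leaving the tree
        if v in in_tree:
            continue                     # stale entry: v was added meanwhile
        in_tree.add(v)
        tree.append((u, v, w))
        for x, wx in adj[v]:
            if x not in in_tree:
                heapq.heappush(heap, (wx, v, x))
    return tree
```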

The running time of both algorithms depends on the implementation of the priority queue. Observe that in both of these algorithms we perform, beyond the updates of T and k(u):

• |V | insert and |V | delete-min operations (one per vertex), and

• at most one decrease-key operation per edge, for a total of at most |E| decrease-key operations.

With binary heaps, each operation takes O(log |V |) time, for a total running time of O((|V | + |E|) log |V |). With Fibonacci heaps, delete-min takes O(log |V |) amortized time while insert and decrease-key take O(1) amortized time, which improves the total running time to O(|E| + |V | log |V |).

A heap stores the elements of S at the nodes of a balanced binary tree T , with the property that the key of every node is at most the keys of its children. Since there are at most 2^i nodes at depth i, a heap on n elements has height h = O(log n), and each of the operations above can be implemented in O(log n) time by sifting an element up or down the tree. Fibonacci heaps achieve the amortized bounds quoted above with a lazier structure.

4

Maximum Flow

In the maximum flow problem, we are given a directed graph G = (V, E), a source s ∈ V , a sink t ∈ V and capacities u : E → R+ , and we want to find a flow f from s to t of maximum value |f |. For a flow f , the residual capacity of an arc is uf (v, w) = u(v, w) − f (v, w), and the set of arcs with positive residual capacity is Ef = {e : uf (e) > 0}; correspondingly, the residual arcs are the arcs of E along which additional flow can still be pushed. Before we can state when a flow f is maximum in G, we define the residual graph and the notion of an augmenting path.

Consider the following definitions.

Definition 1 The residual graph Gf of the network G with respect to f is the graph Gf = (V, Ef ) together with the residual capacities uf .

The residual graph tells us how much additional flow can be pushed along each arc: if we want to send a further flow on top of f , it must fit within the residual capacities.

Definition 2 An augmenting path in G with respect to f is a directed path from the source s to the sink t in the residual graph Gf . If there is an augmenting path in the residual network, then f is not a maximum flow, and we can push additional flow along it; this is shown in the following lemma.

Lemma 1 If the residual graph Gf of G with respect to f contains an augmenting path P , then f is not a maximum flow.

Proof:

We construct from f a flow f ′ of strictly larger value. Since P is a path of Gf , i.e. P ⊂ Ef , every arc of P has positive residual capacity, so the amount of flow that can be pushed along P is

ε(P ) = min{uf (v, w) : (v, w) ∈ P } > 0.

Define f ′ by

f ′ (v, w) = f (v, w) + ε(P ) if (v, w) ∈ P ,
f ′ (v, w) = f (v, w) − ε(P ) if (w, v) ∈ P ,
f ′ (v, w) = f (v, w) otherwise.

One checks that f ′ satisfies skew-symmetry, flow conservation and the capacity constraints of G, so f ′ is a flow. Moreover,

|f ′ | = |f | + ε(P ) > |f |,

so f is not a maximum flow. □

Lemma 1 and its converse allow us to prove the max-flow min-cut theorem. The converse is proved by exhibiting an s − t cut whose capacity equals the value of f .

Lemma 2 If the residual graph Gf contains no augmenting path, then f is a maximum flow, and its value equals the capacity of some s − t cut.

Proof:

Let S be the set of vertices v ∈ V that are reachable from s in Gf , and write (S : V \ S) for the set of arcs of G = (V, E) going from S to V \ S. Clearly s ∈ S, and since Gf has no augmenting path, i.e. there exists no directed path from the source s to the sink t in Gf , we have t ∉ S.

Consider any arc (v, w) ∈ (S : V \ S). Then uf (v, w) = 0, since otherwise w would also be reachable from s. By definition uf (v, w) = u(v, w) − f (v, w), so f (v, w) = u(v, w) for all (v, w) ∈ (S : V \ S). Since the net flow across any s − t cut equals the value of the flow,

|f | = Σ(v,w)∈(S:V \S) f (v, w) = Σ(v,w)∈(S:V \S) u(v, w) = u(S : V \ S).

The value of any flow is at most the capacity of any s − t cut, so f is a maximum flow and, moreover,

max |f | = min u(S : V \ S),

where the maximum is over flows f and the minimum over s − t cuts (S : V \ S). □

We summarize all of the results in the following theorem.

Theorem 3 (Max-Flow Min-Cut) Let G be a network and f a flow in G. Then the following statements are equivalent:

1. f is a maximum flow;

2. Gf contains no augmenting path;

3. |f | = u(S : V \ S) for some s − t cut (S : V \ S).

Proof:

We prove the equivalence of the statements by showing that

(1) ⇒ (2) ⇒ (3) ⇒ (1).

• (1) ⇒ (2): This implication is the contrapositive of the statement of Lemma 1.

• (2) ⇒ (3): This implication follows from the proof of Lemma 2.

• (3) ⇒ (1): This implication follows from weak duality: the value of any flow is at most the capacity of any s − t cut, so a flow whose value equals the capacity of some cut is maximum. □

5

The Ford-Fulkerson Algorithm

We have seen that a flow f is maximum if and only if the residual graph Gf contains no augmenting path. This suggests the following algorithm: start with the zero flow and, as long as the residual graph contains an augmenting path, push as much flow as possible along it.

Ford-Fulkerson(G)

Start with a zero flow, f = 0

While Gf has an augmenting path P :

    push ε(P ) units of flow along the path P ; this gives |f | ← |f | + ε(P )

Whether and how fast the algorithm terminates depends on the capacities and on how the augmenting path is chosen in each iteration; note that each iteration strictly increases the value of the flow, and that if the algorithm terminates, the resulting flow is maximum by the max-flow min-cut theorem. For the capacities, we consider three cases.

Case 1.

Assume that the capacity function u of G is integer valued. Then, by induction on the iterations of the algorithm, the flow f remains integral throughout: we start with f = 0, every residual capacity is then an integer in every iteration, and hence the amount ε(P ) pushed along an augmenting path is a positive integer, i.e.

ε(P ) ≥ 1.

Since every iteration increases the value of the flow by at least 1, the algorithm terminates, and by the max-flow min-cut theorem the Ford-Fulkerson algorithm then indeed finds a maximum flow; this also shows that for integral capacities there always exists an integral maximum flow.

Unfortunately, the only bound on the number of iterations we obtain from this argument depends on the magnitudes of the capacities. The value of any flow satisfies

|f | ≤ |N (s)|U ≤ nU,

where N (s) denotes the set of neighbors of the source s and

U = max{u(s, w) : w ∈ N (s)},

so the number of iterations is at most nU . This bound is not polynomial in the size of the input: there are example networks on which Ford-Fulkerson, with a poor choice of augmenting paths, really performs on the order of nU iterations; if the capacities are encoded in binary with L bits, this is O(2^L), i.e. exponential in the size of the input.

Case 2.

Assume that the capacity function u of G takes rational values.

Using a common denominator, we can multiply all the capacities by a single integer and reduce to the integral case, so the algorithm still terminates with a maximum flow. However, the bound on the number of iterations obtained this way involves the common denominator and can be even larger than before.

Case 3.

Assume that the capacity function u of G is real valued.

In the previous cases we assumed u(E) ⊂ Q+ . If some capacities are irrational, Ford-Fulkerson need not terminate at all: the value of |f | still increases in every iteration and converges to a limit, but there is an example of a network for which the Ford-Fulkerson algorithm does not terminate for a bad choice of augmenting paths, and the limit of |f | does not even equal the maximum flow value.
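For integral capacities, the basic algorithm can be sketched as follows; a depth-first search stands in for the unspecified choice of augmenting path, and the nested-dictionary representation `cap` is an assumption of this sketch.

```python
def ford_fulkerson(n, cap, s, t):
    """Max flow by repeated augmentation; cap[v][w] = capacity (dict of dicts)."""
    # Residual capacities uf = u - f, stored directly.
    uf = {v: dict(cap.get(v, {})) for v in range(n)}
    for v in range(n):
        for w in cap.get(v, {}):
            uf[w].setdefault(v, 0)       # reverse arcs start with 0 residual capacity
    flow = 0

    def augmenting_path():
        # Depth-first search for an s-t path with positive residual capacity.
        stack, prev = [s], {s: None}
        while stack:
            v = stack.pop()
            if v == t:
                path = []
                while prev[v] is not None:
                    path.append((prev[v], v))
                    v = prev[v]
                return path[::-1]
            for w, c in uf[v].items():
                if c > 0 and w not in prev:
                    prev[w] = v
                    stack.append(w)
        return None                      # no augmenting path: flow is maximum

    while (P := augmenting_path()) is not None:
        eps = min(uf[v][w] for v, w in P)   # bottleneck capacity ε(P)
        for v, w in P:
            uf[v][w] -= eps
            uf[w][v] += eps              # pushing flow creates residual capacity back
        flow += eps
    return flow
```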

6

Beyond the Ford-Fulkerson Algorithm

The drawback of the Ford-Fulkerson algorithm as stated is that its number of iterations, and hence its running time, may depend on the magnitudes of the capacities rather than only on the size of the graph. To make this discussion precise, we define what it means for an algorithm to run in polynomial time and in strongly polynomial time. For an instance I of a computational problem, denote by size(I) the number of bits needed to encode I, and by number(I) the number of numbers in the encoding of I. For example, for an instance consisting of an n × n matrix of integers, number(I) would be n^2 + n (n^2 for the entries and n for the dimensions), while size(I) is the sum of the number of bits needed to represent the entries of the matrix. Note that representing an integer m in binary takes O(log m) bits, while representing it in unary takes O(m) bits.

An algorithm A applied to an instance I is polynomial if:

• the number of operations performed by A is at most polynomial in size(I), and

• the size of the numbers appearing during the execution of A is at most polynomial in size(I).

Stronger requirements define strongly polynomial algorithms:

• the number of operations performed by A is at most polynomial in number(I), and

• the size of the numbers appearing during the execution of A is at most polynomial in size(I).

Notice the key difference between the two notions: in a strongly polynomial algorithm, the number of operations performed depends only on the number of numbers in the instance, not on their magnitudes. The second condition is still necessary: for example, Gaussian elimination for solving a linear system performs O(n^3) arithmetic operations, but one also has to show that the sizes of the numbers appearing during the algorithm remain polynomially bounded in the size of the input before concluding that it is strongly polynomial.

With this terminology, the analysis above does not even show that the Ford-Fulkerson algorithm is polynomial. We now describe two specific choices of the augmenting path, both due to Edmonds and Karp, that lead to a polynomial number of iterations.

6.1

Fattest Augmenting Path

The first rule is to augment, in each iteration, along the fattest augmenting path: the augmenting path P such that ε(P ) is maximized. One can show the following:

• The number of iterations is then O(m log U ), where U is a bound on the capacities. This is polynomial, but not strongly polynomial, as it depends on the magnitudes of the capacities.

• Each iteration, i.e. finding the fattest augmenting path, can be performed by a variant of Dijkstra's algorithm in O(m + n log n) time.

This gives a maximum flow algorithm running in O((m+ n log n)m log U ) time. We will analyze this choice of augmenting path in detail in the next lecture, where the bound on the number of iterations is proved using flow decompositions.

6.2

Shortest Augmenting Path

The second rule is to augment, in each iteration, along the shortest augmenting path: the augmenting path P such that the number of arcs on P is minimized. For this rule:

• Each iteration amounts to finding a shortest s − t path in the residual graph, which can be done in O(m) time with breadth-first-search.

• One can show that the number of iterations is O(nm), so the total running time is O(nm^2).

Since this bound depends only on n and m and not on the capacities, augmenting along a shortest augmenting path makes the algorithm strongly polynomial.

Next time we will prove the bound for the fattest augmenting path rule, and we will discuss applications of maximum flows, including maximum cardinality matching in bipartite graphs.


18.415/6.854 Advanced Algorithms

September 10, 2008

Lecture 3 Lecturer: Michel X. Goemans

1

Introduction

Today we continue our discussion of maximum ﬂows by introducing the fattest path augmenting algorithm, an improvement over the Ford-Fulkerson algorithm, to solve the max ﬂow problem. We also discuss the minimum cost circulation problem.

2

Maximum Flow

In a maximum ﬂow problem, the goal is to ﬁnd the greatest rate (ﬂow) at which material can be sent from a source s to a sink t. Several problems can be modeled as a max-ﬂow problem, including bipartite matching, which will be discussed today. We will also discuss ﬂow decomposition and the fattest augmenting path algorithm.

2.1

Maximum Cardinality Matching in Bipartite Graphs

A bipartite graph is a graph G = (V, E) whose vertex set V can be partitioned into two disjoint sets, A and B, such that every edge connects a vertex in A to one in B. A matching M is a subset of E such that the endpoints of all the edges in M are distinct. In other words, two edges in M cannot share a vertex. We are interested in solving the following problem: Given an undirected bipartite graph G = (V, E) where V = A ∪ B, ﬁnd a matching M of maximum cardinality. We can formulate this maximum cardinality matching problem as a max-ﬂow problem. To do that, consider the network shown in Figure 1.

Figure 1: The ﬁgure on the left represents a matching in a bipartite graph. The ﬁgure on the right shows how the bipartite graph can be converted into a max-ﬂow network by imposing a capacity of 1 on arcs out of s and into t.


The network is constructed as follows: We orient each edge in G from A to B and assign them a capacity of 1 (any capacity greater than 1 works too). We also add two new vertices, s and t, and arcs from s to every vertex in A, and from every vertex in B to t. All the new arcs are given unit capacity.

Theorem 1 Let G = (V, E) be a bipartite graph with vertex partition V = A ∪ B, and let G′ = (V ′ , E ′ ) be the capacitated network constructed as above. If M is a matching in G, then there is an integer-valued flow f in G′ with value |f | = |M |. Conversely, if f is an integer-valued flow in G′ , then there is a matching M in G with cardinality |M | = |f |.

Proof: Given M , define a flow f in G′ as follows: if (u, v) ∈ M , then set f (s, u) = f (u, v) = f (v, t) = 1 and f (u, s) = f (v, u) = f (t, v) = −1. For all other edges (u, v) ∈ E ′ , let f (u, v) = 0. Each edge (u, v) ∈ M corresponds to 1 unit of flow in G′ that traverses the path s → u → v → t. The paths in M have distinct vertices, aside from s and t. The net flow across the cut (A ∪ s : B ∪ t) is equal to |M |. We know that the net flow across any cut is the same, and equals the value of the flow. Thus, we can conclude that |M | = |f |.

To prove the converse, let f be an integer-valued flow in G′ . By flow conservation and the choice of capacities, the net flow in each arc must be −1, 0 or 1. Let M be the set of edges (u, v), with u ∈ A, v ∈ B, for which f (u, v) = 1. It is easy to see, by flow conservation again, that M is indeed a matching and, using the same argument as before, that |M | = |f |. □

Since all the capacities of this maximum flow problem are integer valued, we know that there always exists an integer-valued maximum flow, and therefore the theorem shows that this maximum flow formulation correctly models the maximum cardinality bipartite matching.
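On the unit-capacity network of Theorem 1, each Ford-Fulkerson augmentation is exactly a search for an alternating path, which yields the classical matching algorithm. A sketch, where the representation `adj` (mapping each A-vertex to its list of B-neighbors) is an assumption:

```python
def max_bipartite_matching(adj):
    """adj[a] = list of B-vertices adjacent to a. Returns a dict match[b] = a.

    Equivalent to Ford-Fulkerson on the unit-capacity network: each call to
    try_augment looks for an augmenting s-t path, i.e. an alternating path."""
    match = {}                           # current matching, keyed by B-vertex

    def try_augment(a, seen):
        for b in adj[a]:
            if b in seen:
                continue
            seen.add(b)
            # b is free, or the A-vertex matched to b can be re-matched elsewhere
            if b not in match or try_augment(match[b], seen):
                match[b] = a
                return True
        return False

    for a in adj:                        # one augmentation attempt per A-vertex
        try_augment(a, set())
    return match
```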

2.2

Flow Decomposition

In a (raw) s-t flow, we have the following building blocks:

• Unit flow on an s-t directed path.

• Unit flow on a directed cycle.

Any (raw) s-t flow can be written as a linear combination of these building blocks.

Theorem 2 Any (raw) s-t flow r can be decomposed into at most m flows along either paths from s to t or cycles, where m is the number of edges in the network. More precisely, it can be decomposed into at most |{e : r(e) > 0}| ≤ m paths and cycles.

Proof: By tracing the flow on an edge e backward and forward, we either get an s-t path T or a cycle T with r(e) > 0 for all e ∈ T . Denote the minimum flow on T by Δ(T ):

Δ(T ) = min{r(e) : e ∈ T }.

We want to decrease the flow on T so that at least one edge goes to 0 (by subtracting out Δ(T )), and keep doing that until there are no more edges with non-zero flow. More precisely, the following algorithm extracts at most m paths and cycles.

(i) While there is a directed cycle C with positive flow:

(a) Decrease the flow on this cycle by Δ(C).

(b) Add this cycle as an element of the flow decomposition.

(ii) (The set of arcs with positive flow now forms an acyclic graph.) While there is a path P from s to t with positive flow:

(a) Decrease the ﬂow on this path by Δ(P ). (b) Add this path as an element of the ﬂow decomposition. Each time we decrease the ﬂow on a path or a cycle T , we zero out the ﬂow on some edge. When we do this, the new raw ﬂow is rnew (e) = r(e) − Δ(T ) if e ∈ T , or r(e) otherwise. Since there are |{e : r(e) > 0}| ≤ m edges with positive ﬂow in the graph, there will be at most that number of decreases in the ﬂow, and consequently, at most that number of paths or cycles in the ﬂow decomposition. �
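The extraction procedure in the proof can be sketched in Python. For simplicity the sketch assumes that no flow enters s or leaves t (any such flow lies on cycles through s or t and could be cancelled first); the dictionary representation `r[(v, w)]` is likewise an assumption.

```python
def decompose(r, s, t):
    """Decompose a raw s-t flow r[(v, w)] > 0 into (amount, path-or-cycle) pieces."""
    r = {e: x for e, x in r.items() if x > 0}    # local copy of the positive part
    out = {}
    for (v, w) in r:
        out.setdefault(v, []).append(w)

    def next_edge(v):
        # Some outgoing edge of v that still carries flow, or None.
        while out.get(v):
            w = out[v][-1]
            if r.get((v, w), 0) > 0:
                return w
            out[v].pop()                 # edge exhausted, discard it
        return None

    pieces = []
    while True:
        if next_edge(s) is not None:
            start = s                    # extract s-t paths (or cycles met on the way)
        else:
            live = [e for e in r if r[e] > 0]
            if not live:
                break                    # nothing left: decomposition complete
            start = live[0][0]           # remaining flow is a circulation: trace cycles
        path, seen = [start], {start}
        while True:
            v = path[-1]
            if v == t:
                break                    # an s-t path
            w = next_edge(v)             # conservation guarantees a continuation
            path.append(w)
            if w in seen:
                path = path[path.index(w):]   # close off the cycle
                break
            seen.add(w)
        edges = list(zip(path, path[1:]))
        delta = min(r[e] for e in edges)      # Δ(T): zeroes out at least one edge
        for e in edges:
            r[e] -= delta
        pieces.append((delta, path))
    return pieces
```

Each extraction zeroes the flow on some edge, so at most |{e : r(e) > 0}| pieces are produced, matching the bound in Theorem 2.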

2.3

Fattest Augmenting Path Algorithm (Edmonds-Karp ’72)

Flow decomposition is a key tool in the analysis of network flow algorithms, as we will illustrate now. As we saw in the last lecture, the Ford-Fulkerson algorithm for finding a maximum flow in a network may take exponential time, or even not terminate at all, if the augmenting path is not chosen appropriately. We proposed two specific choices of augmenting paths, both due to Edmonds and Karp, that provide a polynomial running time. One was the shortest augmenting path, the other was the fattest augmenting path or maximum-capacity augmenting path: the augmenting path that increases the flow the most. This is the variant we analyze now. For an augmenting s-t path P in Gf , define

ε(P ) = min{uf (v, w) : (v, w) ∈ P },

where the uf are the residual capacities. The minimum residual capacity ε(P ) (the bottleneck) is the maximum flow that can be pushed along the path P . We wish to find the fattest augmenting path P such that ε(P ) is maximized. The fattest augmenting path P can be efficiently found with Dijkstra's algorithm in O(m + n log n) time. (Actually, it can be found in O(m) time under the condition that we have the capacities sorted beforehand; see the forthcoming problem set.)

Theorem 3 Assuming that capacities are integral and bounded by U , the optimal flow for a network can be found in O(m log(mU )) = O(m log(nU )) iterations of augmenting along the fattest path.

Proof: Start with a zero flow, f = 0. Consider a maximum flow f ∗ . Its value is at most the value of any cut, which is bounded by mU : |f ∗ | ≤ mU . Consider the flow f ∗ − f (that is, f ∗ (e) − f (e) for all edges e) in the residual graph Gf with residual capacities uf = u − f . We can decompose f ∗ − f into ≤ m flows using flow decomposition. As a result, at least one of these paths carries a flow of value at least (1/m)(|f ∗ | − |f |). Suppose now that we push ε(P ) units of flow along the fattest path in the residual graph Gf and obtain a new flow f new of value |f new | = |f | + ε(P ). Since the fattest path provides the greatest increase in flow value, we must have that ε(P ) ≥ (1/m)(|f ∗ | − |f |). Thus we have the following inequality:

|f new | ≥ |f | + (1/m)(|f ∗ | − |f |),


which implies

|f ∗ | − |f new | = |f ∗ | − |f | + |f | − |f new | ≤ (1 − 1/m)(|f ∗ | − |f |).

After k iterations, we get a flow fˆ such that

|f ∗ | − |fˆ| ≤ (1 − 1/m)^k mU.

Eventually |f ∗ | − |fˆ| < 1, which implies f ∗ = fˆ since, for integral capacities, all intermediate flows will be integral. Since (1 − 1/m)^m ≤ 1/e for all m ≥ 2, the number of iterations required for the difference to go below 1 is k = m log(mU ). □

Combining the results mentioned above we have the following corollary.

Corollary 4 We can find a maximum flow in an integer-capacitated network with maximum capacity U in O((m + n log n)m log(nU )) time (see footnote 2).
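The fattest-path rule can be sketched as follows: a Dijkstra-like search that maximizes the bottleneck residual capacity (using a max-heap via negated keys) replaces the usual shortest-path search. The dense-matrix representation of the residual capacities and the edge-list input are simplifications assumed by this sketch.

```python
import heapq

def fattest_path_maxflow(n, cap, s, t):
    """Max flow by augmenting along fattest paths; cap = list of (v, w, capacity)."""
    uf = [[0] * n for _ in range(n)]     # residual capacities
    for v, w, c in cap:
        uf[v][w] += c
    total = 0
    while True:
        # Dijkstra-like search maximizing the bottleneck residual capacity.
        bottleneck = [0] * n
        bottleneck[s] = float('inf')
        prev = [None] * n
        heap = [(-bottleneck[s], s)]
        while heap:
            b, v = heapq.heappop(heap)
            b = -b
            if b < bottleneck[v]:
                continue                 # stale entry
            for w in range(n):
                nb = min(b, uf[v][w])    # bottleneck if the path is extended to w
                if nb > bottleneck[w]:
                    bottleneck[w] = nb
                    prev[w] = v
                    heapq.heappush(heap, (-nb, w))
        eps = bottleneck[t]              # ε(P) of the fattest augmenting path
        if eps == 0:
            return total                 # no augmenting path remains
        v = t
        while v != s:                    # augment along the recorded path
            u = prev[v]
            uf[u][v] -= eps
            uf[v][u] += eps
            v = u
        total += eps
```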

3

Minimum Cost Circulation Problem (MCCP)

A circulation is simply a flow where the net flow into every vertex (there are no sources or sinks) is zero. Notice that we can easily transform an s − t flow to a circulation by adding one arc from t to s (with infinite capacity) which carries a flow equal to the s − t flow value.

Definition 1 A circulation f satisfies

(i) Skew-Symmetry: ∀ (v, w) ∈ E, f (v, w) = −f (w, v).

(ii) Flow Conservation: ∀ v ∈ V , Σw f (v, w) = 0.

(iii) Capacity Constraints: ∀ (v, w) ∈ E, f (v, w) ≤ u(v, w).

Definition 2 A cost function c : E → R assigns a cost per unit flow to each edge. We assume the cost function satisfies skew symmetry: c(v, w) = −c(w, v). For a set of edges C (e.g. a cycle), we denote the total cost of C by:

c(C) = Σ(v,w)∈C c(v, w).

Definition 3 The goal of the Minimum Cost Circulation Problem (MCCP) is to find a circulation f of minimum cost c(f ), where

c(f ) = Σ(v,w) c(v, w)f (v, w).

The MCCP is a special case of a Linear Programming (LP) problem (an optimization problem with linear constraints and a linear objective function). But while no strongly polynomial time algorithms are known for linear programming, we will be able to find one for MCCP.

(Footnote 2, to Corollary 4: using the O(m)-time fattest-path search with pre-sorted capacities, we can find the maximum flow in O(m^2 log(nU )) time.)


3.1

Vertex Potentials

Before we can solve MCCP, it is necessary to introduce the concept of vertex potentials, or simply potentials.

Definition 4 A vertex potential is a function p : V → R that assigns each vertex a potential. The vertex potential defines a reduced cost function cp such that cp (v, w) = c(v, w) + p(v) − p(w).

Proposition 5 The function cp satisfies the following properties:

(i) Skew-Symmetry: cp (v, w) = −cp (w, v).

(ii) Cycle Equivalence: for a cycle C, c(C) = cp (C); i.e., the reduced cost function agrees with the cost function.

(iii) Circulation Equivalence: for all circulations, the reduced cost function agrees with the cost function, c(f ) = cp (f ).

Proof: The first property is trivial. The second property follows since all the potential terms cancel out. And we'll prove the third property. By definition,

cp (f ) = Σ(v,w) (c(v, w) + p(v) − p(w)) f (v, w) = c(f ) + Σv p(v) Σw:(v,w)∈E f (v, w) − Σw p(w) Σv:(w,v)∈E f (v, w).

Now by flow conservation, the inner sums are all zero. Hence cp (f ) = c(f ). (The third property also follows easily from flow decomposition, as the decomposition of a circulation only contains cycles and thus the cost and the reduced cost of a circulation are the same because of (ii).) □

3.2

Klein’s Cycle-Cancelling Algorithm

We present a pseudo-algorithm for removing negative-cost cycles. While there exists a negative-cost cycle C in Gf , push a flow ε along the cycle C, where ε is the minimum residual capacity on C:

ε = min{uf (v, w) : (v, w) ∈ C}.

Of course, this doesn’t lead to a straight-forward implementation, since we haven’t speciﬁed which negative-cost cycle to select or how to ﬁnd them. We should also consider whether the algorithm is eﬃcient and whether it will terminate. We’ll answer these questions in the next lecture. However, we will show now that if it terminates, then the circulation output is of minimum cost.

3.3

Optimality Conditions

We now present a theorem that specifies the conditions required for f to be a minimum cost circulation.

Theorem 6 (Optimality Condition) Let f be a circulation. The following are equivalent:

(i) f is of minimum cost.

(ii) There exists no negative-cost cycle in the residual graph Gf .

(iii) There exists a potential function p such that for all (v, w) ∈ Ef , cp (v, w) ≥ 0. Proof: To show that (i) implies (ii), we’ll prove the contrapositive. Suppose there exists a negative cost cycle C in the residual graph Gf where f is the optimal circulation. Denote by C � the reverse cycle (i.e. following the arcs in the reverse order). We deﬁne a new circulation f � for any edge e as follows. If e ∈ C, f � (e) = f (e) + ε. And if e ∈ C � , then f � (e) = f (e) − ε. Otherwise, let f � (e) = f (e). Then we compute the cost of this new ﬂow as c(f � )

=

c(f ) + (ε)(c(C)) + (−ε)(−c(C))

=

0

≥ 0 by (iii). Note that in the second to last step, we utilized the skew-symmetry of the cost of reverse arcs (with ﬂows of opposite parity). But since f ∗ is supposed to be strictly better than f , we have a contradiction. �

References

[EK72] Jack Edmonds and Richard M. Karp, Theoretical improvements in algorithmic efficiency for network flow problems, Journal of the ACM 19 (2): 248–264, 1972.

[Klein67] Morton Klein, A primal method for minimal cost flows with applications to the assignment and transportation problems, Management Science 14: 205–220, 1967.



18.415/6.854 Advanced Algorithms

September 15, 2008

Goldberg-Tarjan Min-Cost Circulation Algorithm Lecturer: Michel X. Goemans

1

Introduction

In this lecture we shall study Klein's cycle cancelling algorithm for finding the circulation of minimum cost in greater detail. We will pay particular attention to the choice of cycle to cancel and we will rigorously prove two bounds on the number of iterations required, the first of which depends on the magnitude of the cost and is valid only for integer-valued costs, and the second of which is strongly polynomial and works even for irrational costs. Recall from last time that for a given circulation f , the following are equivalent:

i. f is of minimum cost

ii. There is no negative cost cycle in the residual graph Gf

iii. There exist potentials p : V → R such that the reduced costs cp (v, w) = c(v, w) + p(v) − p(w) ≥ 0 for all (v, w) ∈ Ef , where Ef = {e : uf (e) > 0}.

2

Klein’s cycle cancelling algorithm

Algorithm 1 Kleins-Cycle-Cancel(G)

Let f be any circulation (e.g., f = 0)
while there exists a negative cost cycle Γ in Gf do
    Push ε(Γ) = min{uf (v, w) : (v, w) ∈ Γ} along Γ

end while

It is important to note that the Ford-Fulkerson algorithm for the maximum flow problem is a special case of Klein's cycle cancelling algorithm, by defining zero costs for all edges in the original graph and by adding an extra edge from the sink to the source with cost −1.

2.1

Choice of cycle Γ

As in the Ford-Fulkerson algorithm, the question is which negative-cost cycle to choose. 1. (Weintraub 1972). One idea is to try choosing the maximum improvement cycle, where the diﬀerence in cost is as large as possible. One can show that the number of iterations is polynomial for rational costs, but ﬁnding such a cycle is NP-hard. For irrational costs, one can show that this algorithm may never terminate (Queyranne 1980) even for the maximum ﬂow problem (the fattest augmenting path algorithm of Edmonds and Karp), although the solution converges to a minimum cost ﬂow.


2. (Goldberg-Tarjan 1986). Alternatively, we can choose the cycle of minimum mean cost, defined as follows:

µ(f ) = min{c(Γ)/|Γ| : directed cycles Γ in Gf },

where c(Γ) = Σ(v,w)∈Γ c(v, w) and |Γ| is the number of edges in the cycle. Notice that there exists a negative cost cycle in Gf if and only if µ(f ) is negative.

To see that we can indeed find the minimum mean-cost cycle efficiently, suppose we replace the costs c with c′ such that c′ (v, w) = c(v, w) + δ for each edge (v, w). Then µ′ (f ) = µ(f ) + δ, so if δ = −µ(f ) then we would have µ′ (f ) = 0. In particular,

µ(f ) = − inf{δ : there is no negative cost cycle in Gf with respect to costs c + δ}.

For any δ, we can decide if there is a negative cost cycle by using the Bellman-Ford algorithm. Now, perform binary search to find the smallest δ for which no such cycle exists. In the next problem set we will show a result by Karp, which finds the cycle of minimum mean cost in O(nm) time by using a variant of Bellman-Ford.
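The binary-search approach can be sketched directly (Karp's O(nm) algorithm would avoid the search). The tolerance handling and the assumption that the graph contains at least one directed cycle are choices of this sketch.

```python
def min_mean_cycle_value(n, edges, tol=1e-9):
    """Approximate µ = min over directed cycles Γ of c(Γ)/|Γ|, by binary search.

    edges: list of (v, w, cost). Uses the Bellman-Ford negative-cycle test on
    shifted costs c + δ; assumes the graph contains at least one cycle."""
    def has_negative_cycle(delta):
        dist = [0.0] * n                 # virtual source: distance 0 everywhere
        for _ in range(n):
            updated = False
            for v, w, c in edges:
                if dist[v] + c + delta < dist[w] - 1e-12:
                    dist[w] = dist[v] + c + delta
                    updated = True
            if not updated:
                return False             # converged: no negative cycle
        return True                      # relaxation in round n => negative cycle

    lo = min(c for _, _, c in edges)     # µ is at least the cheapest edge cost
    hi = max(c for _, _, c in edges)     # ... and at most the dearest
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Negative cycle w.r.t. c - mid exists iff some cycle has mean < mid.
        if has_negative_cycle(-mid):
            hi = mid
        else:
            lo = mid
    return lo
```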

2.2

Bounding the number of iterations

We will give two bounds on the number of iterations for the algorithm. The first depends on the magnitude of the cost and is valid only for integer-valued costs; it is polynomial but not strongly polynomial. The second bound is strongly polynomial and works even for irrational costs. We first need a measure of 'closeness' to the optimal circulation. The following definition gives such a measure, and will be key in quantifying the progress of the algorithm.

Definition 1 (Relaxed optimality) A circulation f is said to be ε-optimal if there exists a potential p : V → R such that cp (v, w) ≥ −ε for all edges (v, w) ∈ Ef .

Note that a 0-optimal circulation is of minimum cost.

Definition 2 For a circulation f , let ε(f ) = min{ε : f is ε-optimal}.

One important thing about this that we will prove soon is that when we push some flow in a circulation f along some cycle Γ and obtain a new circulation f ′ , we get that ε(f ′ ) ≤ ε(f ). This means that ε is monotonically non-increasing in general. First, we need the following strong relationship between ε(f ) and µ(f ), and this really justifies the choice of cycle of Goldberg and Tarjan.

Theorem 1 For all circulations f , ε(f ) = −µ(f ).

Proof: We first show that µ(f ) ≥ −ε(f ). From the definition of ε(f ) there exists a potential p : V → R such that cp (v, w) ≥ −ε(f ) for all (v, w) ∈ Ef . For any cycle Γ ⊆ Ef the cost c(Γ) is equal to the reduced cost cp (Γ) since the potentials cancel. Therefore c(Γ) = cp (Γ) ≥ −|Γ|ε(f ) and so c(Γ)/|Γ| ≥ −ε(f ) for all cycles Γ. Hence µ(f ) ≥ −ε(f ).

Next, we show that µ(f ) ≤ −ε(f ). For this, we start with the definition of µ(f ). For every cycle Γ ⊆ Ef it holds that c(Γ)/|Γ| ≥ µ(f ). Let c′ (v, w) = c(v, w) − µ(f ) for all (v, w) ∈ Ef . Then c′ (Γ)/|Γ| = c(Γ)/|Γ| − µ(f ) ≥ 0 for any cycle Γ. Now define p(v) as the cost of the shortest path from an added source s to v with respect to c′ in Gf (see Fig. 1); the reason we add a vertex s is to make sure that every vertex can be reached (by the direct path). Note that the shortest paths are well-defined since there are no negative cost cycles with respect to c′ . By the optimality property of shortest

s

c’(v,w)

v

0

w

0 0

Figure 1: p(v) is the length of the shortest path from s to v. paths, p(w) � p(v) + c� (v, w) = p(v) + c(v, w) − µ(f ). Therefore cp (v, w) � µ(f ) for all (v, w) ≥ Ef which implies that f is −µ(f )-optimal and thus �(f ) � −µ(f ). By combining µ(f ) � −�(f ) and �(f ) � −µ(f ) we conclude �(f ) = −µ(f ) as required. � The nature of the algorithm is to push ﬂow along negative cost cycles. We would like to know if this actually gets us closer to optimality. This is shown in the following remark. Remark 1 (Progress) Let f be a circulation. If we push ﬂow along the minimum mean cost cycle � in Gf and obtain circulation f � then �(f ) � �(f � ). c (�)

Proof: By deﬁnition p|�| = c(�) |�| = µ(f ). Now, �(f ) = −µ(f ) implies that there exists a potential p such that cp (v, w) � µ(f ) for all (v, w) ≥ Ef . Furthermore for all (v, w) ≥ � the reduced cost cp (v, w) = µ(f ) = −�(f ). If ﬂow is pushed along � some arcs may be saturated and disappear from the residual graph. On the other hand, new edges may be created with a reduced cost of +�(f ). More formally, Ef � ≤ Ef → {(w, v) : (v, w) ≥ �}. So for all (v, w) ≥ Ef � it holds that cp (v, w) � −�(f ). Thus we have that �(f � ) � �(f ). �
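For a fixed potential p, the smallest ε witnessing the relaxed-optimality definition can be read directly off the residual edges. A small sketch (assumed representation: residual edges as `(v, w, cost)` triples, p as a list indexed by vertex); note that ε(f) itself is the minimum of this quantity over all potentials p, which this helper does not compute:

```python
def eps_for_potential(res_edges, p):
    # Smallest eps such that c_p(v, w) = c(v, w) + p(v) - p(w) >= -eps
    # holds on every residual edge; f is then eps-optimal w.r.t. p.
    worst = 0.0
    for v, w, c in res_edges:
        worst = max(worst, -(c + p[v] - p[w]))
    return worst
```

Raising p(w) by 2.5 in the example below makes both reduced costs non-negative, certifying 0-optimality for that tiny residual graph.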

2.3 Analysis for Integer-valued Costs

We now prove a polynomial bound on the number of iterations for an integer cost function c : E → Z. At the start, for any circulation f, taking p = 0 shows that

ε(f) ≤ C = max_{(v,w)∈E} |c(v, w)|.

Now we can continue with the rest of the analysis.

Lemma 2 If costs are integer valued and ε(f) < 1/n, then f is optimal.

Proof: There exists a potential p such that cp(v, w) ≥ −ε(f) > −1/n for all (v, w) ∈ Ef. For any cycle Γ ⊆ Gf we have c(Γ) = cp(Γ) > −|Γ|/n ≥ −1. Since the cost is an integer, c(Γ) ≥ 0. By the optimality condition, if there is no negative cycle in the residual graph, the circulation is optimal. □

Lemma 3 Let f be a circulation and let f′ be the circulation after m iterations of the algorithm. Then ε(f′) ≤ (1 − 1/n)ε(f).

Proof: Let p be the potential such that cp(v, w) ≥ −ε(f) for all (v, w) ∈ Ef, and let Γi and fi be the cycle that is cancelled and the circulation obtained at the ith iteration, respectively. Let A be the set of edges (v, w) in Efi such that cp(v, w) < 0 (we should emphasize that this is for the p corresponding to the circulation f we started from). We now show that as long as Γi ⊆ A, |A| strictly decreases. This is because cancelling a cycle removes at least one arc with a negative reduced cost from A, and any new arc added to Efi must have a positive reduced cost. Hence after


k ≤ m iterations we will find an edge (v, w) ∈ Γk+1 such that cp(v, w) ≥ 0. So by Theorem 1, −ε(fk) is equal to the mean cost of Γk+1, and thus

ε(fk) = −µ(fk) = −c(Γk+1)/|Γk+1| = −cp(Γk+1)/|Γk+1| ≤ −(0 + (−ε(f))(|Γk+1| − 1))/|Γk+1| ≤ (1 − 1/n) ε(f).

Since ε is non-increasing, ε(f′) ≤ ε(fk) ≤ (1 − 1/n)ε(f). □

Corollary 4 If the costs are integer, then the number of iterations is at most mn log(nC).

Proof: We have that

ε(f_end) ≤ (1 − 1/n)^(n log(nC)) ε(f = 0) < e^(−log(nC)) C = (1/(nC)) C = 1/n,

and thus the resulting circulation is optimal by Lemma 2. □

The time per iteration will be shown to be O(nm) (see problem set), hence the total running time of the algorithm is O(m²n² log(nC)).
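The shrinkage estimate in Corollary 4, (1 − 1/n)^(n ln(nC)) < e^(−ln(nC)) = 1/(nC), can be checked numerically (natural log throughout; the values of n and C below are purely illustrative):

```python
from math import log

def corollary4_gap(n, C):
    # After n*ln(nC) batches of m iterations each, eps has shrunk by a
    # factor of (1 - 1/n)^(n*ln(nC)); the corollary needs this factor to
    # be < 1/(nC), so that eps < C/(nC) = 1/n and Lemma 2 applies.
    t = n * log(n * C)
    return (1 - 1 / n) ** t, 1 / (n * C)
```

Since 1 − 1/n < e^(−1/n) for n > 1, the first returned value is always strictly below the second.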

2.4 Strongly Polynomial Analysis

In this section we will remove the dependence on the costs: we will obtain a strongly polynomial bound for the algorithm for solving the minimum cost circulation problem. In fact, we will show that this bound holds even for irrational costs. The first strongly polynomial-time analysis is due to Tardos; the one here is due to Goldberg and Tarjan. This result was very significant, since minimum cost circulation was the most general subclass of Linear Programming (LP) for which a strongly polynomial-time algorithm was known to exist. It remains a big open problem whether a strongly polynomial-time algorithm exists for general LP.

Definition 3 An edge e is ε-fixed if f(e) takes the same value in every ε-optimal circulation f.

Note that (v, w) is ε-fixed if and only if (w, v) is ε-fixed, by skew-symmetry.

Theorem 5 Let f be a circulation and p be a potential such that f is ε(f)-optimal with respect to p. Then if |cp(v, w)| ≥ 2nε(f) for some edge (v, w) ∈ E, the edge (v, w) is ε(f)-fixed.

Proof: Suppose (v, w) is not ε(f)-fixed. Then there exists an f′ that is ε(f)-optimal with f′(v, w) ≠ f(v, w); without loss of generality assume f′(v, w) < f(v, w). Let E< = {(x, y) : f′(x, y) < f(x, y)}. We can see that E< ⊆ Ef′ by definition of Ef′. Furthermore, from flow conservation, we know that there exists a cycle Γ ⊆ Ef′ containing the edge (v, w). Indeed, by flow decomposition, the circulation f − f′ can be decomposed into (positive net) flows along cycles of E< ⊆ Ef′, and one of these cycles must contain (v, w). Now we have

c(Γ) = cp(Γ) ≤ −2nε(f) + (n − 1)ε(f) < −nε(f).

Here cp(v, w) ≤ −2nε(f): since f′(v, w) < f(v, w), the reverse edge (w, v) lies in Ef, so cp(v, w) = −cp(w, v) ≤ ε(f), which together with |cp(v, w)| ≥ 2nε(f) forces cp(v, w) ≤ −2nε(f); every other edge of Γ also has its reverse in Ef and hence reduced cost at most ε(f). Consequently, c(Γ)/|Γ| < −ε(f) and so µ(f′) < −ε(f). As a result, f′ is not ε(f)-optimal, and thus we have a contradiction. □

Lemma 6 After O(mn log n) iterations, another edge becomes fixed.

Proof: Let f be a circulation and f′ the circulation after mn log(2n) further iterations of the Goldberg-Tarjan algorithm. Also suppose that Γ is the first cycle cancelled, and that p, p′ are the potentials for f, f′ respectively. From the previous lemma, we have

ε(f′) ≤ (1 − 1/n)^(n log(2n)) ε(f) < e^(−log(2n)) ε(f) = ε(f)/(2n).

Now from the definition of µ we get

cp′(Γ)/|Γ| = c(Γ)/|Γ| = µ(f) = −ε(f) < −2nε(f′).

This means that there exists an edge (v, w) ∈ Γ such that cp′(v, w) < −2nε(f′); by Theorem 5, (v, w) is ε(f′)-fixed. But flow was pushed along (v, w) when Γ was cancelled, so (v, w) was not ε(f)-fixed. Thus (v, w) becomes ε(f′)-fixed, and the claim is proven. □

Notice that once an edge is fixed, it remains fixed as we iterate the algorithm, since ε only decreases. An immediate consequence of the above lemma is then a bound on the number of iterations of the Goldberg-Tarjan algorithm.

Corollary 7 The number of iterations of the Goldberg-Tarjan algorithm, even with irrational costs, is O(m²n log n).


MIT OpenCourseWare http://ocw.mit.edu

6.854J / 18.415J Advanced Algorithms Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.415/6.854 Advanced Algorithms

September 17, 2008

Lecture 5
Lecturer: Michel X. Goemans

Today, we continue the discussion of the minimum cost circulation problem. We first review the Goldberg-Tarjan algorithm, and improve it by allowing more flexibility in the selection of cycles; this gives the Cancel-and-Tighten algorithm. We also introduce splay trees, a data structure which we will use to build another data structure, dynamic trees, that will further improve the running time of the algorithm.

1 Review of the Goldberg-Tarjan Algorithm

Recall the algorithm of Goldberg and Tarjan for solving the minimum cost circulation problem:

1. Initialize the flow with f = 0.

2. Repeatedly push as much flow as possible along the minimum mean cost cycle Γ in the residual graph Gf, until no negative cycles exist, i.e., until µ(f) ≥ 0.

We used the notation

µ(f) = min_{cycle Γ⊆Ef} c(Γ)/|Γ|

to denote the minimum mean cost of a cycle in the residual graph Gf. We used ε(f) to denote the minimum ε such that f is ε-optimal. In other words,

ε(f) = min{ε : ∃ potential p : V → R such that cp(v, w) ≥ −ε for all edges (v, w) ∈ Ef}.

We proved that for all circulations f, ε(f) = −µ(f). A consequence of this equality is that there exists a potential p such that any minimum mean cost cycle Γ satisfies cp(v, w) = −ε(f) = µ(f) for all (v, w) ∈ Γ, since the cost of each edge is bounded below by the mean cost of the cycle.
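The quantity µ(f) can be computed exactly with Karp's O(nm) dynamic program mentioned last lecture. A sketch (edge-list input `(u, v, cost)`; the multi-source initialization plays the role of an artificial source with zero-cost edges to every vertex):

```python
def karp_min_mean_cycle(n, edges):
    # D[k][v] = minimum cost of a walk with exactly k edges ending at v.
    INF = float('inf')
    D = [[0.0] * n] + [[INF] * n for _ in range(n)]
    for k in range(1, n + 1):
        for u, v, c in edges:
            if D[k - 1][u] + c < D[k][v]:
                D[k][v] = D[k - 1][u] + c
    # Karp's formula: mu = min over v of max over k of (D[n][v]-D[k][v])/(n-k).
    best = INF
    for v in range(n):
        if D[n][v] < INF:
            best = min(best, max((D[n][v] - D[k][v]) / (n - k)
                                 for k in range(n) if D[k][v] < INF))
    return best  # INF when the graph has no cycle at all
```

The nested loops do Θ(nm) work, matching the bound claimed for the variant of Bellman-Ford.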

1.1 Analysis of Goldberg-Tarjan

Let us recall the analysis of the above algorithm; this will help us improve the algorithm in order to achieve a better running time. (Please refer to the previous lecture for the details of the analysis.) We used ε(f) as an indication of how close we are to the optimal solution. We showed that ε(f) is a non-increasing quantity: if f′ is obtained from f after a single iteration, then ε(f′) ≤ ε(f). It remains to show that ε(f) decreases "significantly" after several iterations.

Lemma 1 Let f be any circulation, and f′ be the circulation obtained after m iterations of the Goldberg-Tarjan algorithm. Then

ε(f′) ≤ (1 − 1/n) ε(f).

We also showed that if the costs are all integer valued, then we are done as soon as we reach ε(f) < 1/n. Using these two facts, we showed that the number of iterations of the above algorithm is at most O(mn log(nC)). An alternative analysis using ε-fixed edges provides a strongly polynomial bound of O(m²n log n) iterations. Finally, the running time of a single iteration is O(mn) using a variant of Bellman-Ford (see problem set).

1.2 Towards a faster algorithm

In the above algorithm, a significant amount of time is spent computing the minimum mean cost cycle. This is unnecessary: our goal is simply to cancel enough cycles to achieve a "significant" improvement in ε once every several iterations. We can improve the algorithm by using a more flexible selection of cycles to cancel. The idea of the Cancel-and-Tighten algorithm is to push flow along cycles consisting entirely of edges with negative reduced cost. For a given potential p, we push as much flow as possible along cycles of this form until no more such cycles exist, at which point we update p and repeat.

2 Cancel-and-Tighten

2.1 Description of the Algorithm

Definition 1 An edge (v, w) is admissible with respect to a potential p if cp(v, w) < 0. A cycle Γ is admissible if all the edges of Γ are admissible.

Cancel-and-Tighten Algorithm (Goldberg and Tarjan):

1. Initialization: f ← 0, p ← 0, ε ← max_{(v,w)∈E} c(v, w), so that f is ε-optimal with respect to p.

2. While f is not optimum, i.e., Gf contains a negative cost cycle, do:

(a) Cancel: While Gf contains a cycle Γ which is admissible with respect to p, push as much flow as possible along Γ.

(b) Tighten: Update p to p′ and ε to ε′, where p′ and ε′ are chosen such that cp′(v, w) ≥ −ε′ for all edges (v, w) ∈ Ef and ε′ ≤ (1 − 1/n)ε.
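The admissibility test of Definition 1 is a one-liner over the residual edge list (a sketch; residual edges are assumed to be `(v, w, cost)` triples and the potential p a list indexed by vertex):

```python
def admissible(res_edges, p):
    # An edge (v, w) is admissible iff its reduced cost
    # c_p(v, w) = c(v, w) + p(v) - p(w) is strictly negative.
    return [(v, w) for v, w, c in res_edges if c + p[v] - p[w] < 0]
```

With p = 0 this is simply the set of negative cost residual edges, which is exactly the situation after the initialization step.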

Remark 1 We do not update the potential p every time we push flow; p gets updated in the Tighten step, after possibly several flows have been pushed in the Cancel step.

Remark 2 In the Tighten step, we do not need to find p′ and ε′ such that ε′ is as small as possible; it is only necessary to decrease ε by a factor of at least 1 − 1/n. In practice, however, one tries to decrease ε by a smaller factor (i.e., more aggressively) in order to obtain a better running time.

Why is it always possible to obtain an improvement factor of 1 − 1/n in each iteration? This is guaranteed by the following result, whose proof is similar to the analysis from the previous lecture.

Lemma 2 Let f be a circulation and f′ be the circulation obtained by performing the Cancel step. Then we cancel at most m cycles, and

ε(f′) ≤ (1 − 1/n) ε(f).

Proof:

Since we only cancel admissible edges, after any cycle is canceled in the Cancel step:

• All new edges in the residual graph are non-admissible, since the edge costs are skew-symmetric; • At least one admissible edge is removed from the residual graph, since we push the maximum possible amount of ﬂow through the cycle.


Since we begin with at most m admissible edges, we cannot cancel more than m cycles, as each cycle cancellation reduces the number of admissible edges by at least one. After the Cancel step, every cycle Γ contains at least one non-admissible edge, say (u1, v1) ∈ Γ with cp(u1, v1) ≥ 0. Then the mean cost of Γ is

c(Γ)/|Γ| = (1/|Γ|) Σ_{(u,v)∈Γ} cp(u, v) ≥ (1/|Γ|) Σ_{(u,v)∈Γ, (u,v)≠(u1,v1)} cp(u, v) ≥ −((|Γ| − 1)/|Γ|) ε(f) = −(1 − 1/|Γ|) ε(f) ≥ −(1 − 1/n) ε(f).

Therefore, ε(f′) = −µ(f′) ≤ (1 − 1/n) ε(f). □

2.2 Implementation and Analysis of Running Time

2.2.1 Tighten Step

We first discuss the Tighten step of the Cancel-and-Tighten algorithm. In this step, we wish to find a new potential function p′ and a constant ε′ such that cp′(v, w) ≥ −ε′ for all edges (v, w) ∈ Ef and ε′ ≤ (1 − 1/n)ε. We could find the smallest possible ε′ in O(mn) time by using a variant of the Bellman-Ford algorithm. However, since we do not actually need the best possible ε′, it is possible to vastly reduce the running time of the Tighten step, to O(m + n), as follows.

When the Cancel step terminates, there are no cycles in the admissible graph Ga = (V, A), the subgraph of the residual graph containing only the admissible edges. This implies that there exists a topological sort of the admissible graph. Recall that a topological sort of a directed acyclic graph is a linear ordering l : V → {1, . . . , n} of its vertices such that l(v) < l(w) whenever (v, w) is an edge of the graph; it can be computed in O(m + n) time using a standard topological sort algorithm (see, e.g., CLRS page 550). This linear ordering enables us to define a new potential function p′ by

p′(v) = p(v) − l(v)ε/n.

We claim that this potential function satisfies our desired properties.

Claim 3 The new potential function p′(v) = p(v) − l(v)ε/n satisfies the property that f is ε′-optimal with respect to p′ for some constant ε′ ≤ (1 − 1/n)ε.

Proof:

Let (v, w) ∈ Ef. Then

cp′(v, w) = c(v, w) + p′(v) − p′(w) = c(v, w) + p(v) − l(v)ε/n − p(w) + l(w)ε/n = cp(v, w) + (l(w) − l(v))ε/n.

We consider two cases, depending on whether or not l(v) < l(w).

Case 1: l(v) < l(w). Then cp′(v, w) = cp(v, w) + (l(w) − l(v))ε/n ≥ −ε + ε/n = −(1 − 1/n)ε.

Case 2: l(v) > l(w), so that (v, w) is not an admissible edge, i.e., cp(v, w) ≥ 0. Then cp′(v, w) = cp(v, w) + (l(w) − l(v))ε/n ≥ 0 − (n − 1)ε/n = −(1 − 1/n)ε.

In either case, we see that f is ε′-optimal with respect to p′, where ε′ ≤ (1 − 1/n)ε.

□
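Claim 3 translates directly into code: topologically sort the (acyclic) admissible graph and shift each potential by l(v)ε/n. A sketch with hypothetical names (admissible edges as `(v, w)` pairs; Kahn's algorithm produces the topological order):

```python
from collections import deque

def tighten(p, eps, adm_edges, n):
    # Kahn's algorithm: repeatedly remove in-degree-0 vertices; the
    # removal order l is a topological order of the admissible graph.
    indeg = [0] * n
    out = [[] for _ in range(n)]
    for v, w in adm_edges:
        out[v].append(w)
        indeg[w] += 1
    queue = deque(v for v in range(n) if indeg[v] == 0)
    l = [0] * n
    rank = 1
    while queue:
        v = queue.popleft()
        l[v] = rank
        rank += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # New potential from Claim 3: p'(v) = p(v) - l(v) * eps / n.
    return [p[v] - l[v] * eps / n for v in range(n)]
```

Every admissible edge (v, w) has l(v) < l(w), so its reduced cost rises by at least ε/n; this is exactly where the (1 − 1/n) factor comes from.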

2.2.2 Cancel Step

We now shift our attention to the implementation and analysis of the Cancel step. Naïvely, it takes O(m) time to find a cycle in the admissible graph Ga = (V, A) (e.g., using depth-first search) and push flow along it. Using a more careful implementation of the Cancel step, we shall show that each cycle in the admissible graph can be found in an "amortized" time of O(n). We use a depth-first search (DFS) approach, pushing as much flow as possible along an admissible cycle and removing saturated edges, as well as removing edges from the admissible graph whenever we determine that they are not part of any cycle. Our algorithm is as follows:

Cancel(Ga = (V, A)):

Choose an arbitrary vertex u ∈ V , and begin a DFS rooted at u.

1. If we reach a vertex v that has no outgoing edges, then we backtrack, deleting from A the edges that we backtrack along, until we find an ancestor r of v for which there is another child to explore. (Notice that every edge we backtrack along cannot be part of any cycle.) Continue the DFS by exploring paths outgoing from r.

2. If we find a cycle Γ, then we push the maximum possible flow through it. This causes at least one edge along Γ to be saturated. We remove the saturated edges from A, and restart the depth-first search from scratch using G′a = (V, A′), where A′ denotes A with the saturated edges removed.

Every edge that is not part of any cycle is visited at most twice (since it is removed from the admissible graph the second time), so the total time spent removing edges that are not part of any cycle is O(m). Since there are n vertices in the graph, it takes O(n) time to find a cycle (excluding the time taken to traverse edges that are not part of any cycle), determine the maximum flow that we can push through it, and update the flow on each of its edges. Since at least one edge of A is saturated and removed every time we find a cycle, it follows that we find at most m cycles. Hence, the total running time of the Cancel step is O(m + mn) = O(mn).

2.2.3 Overall Running Time

From the above analysis, we see that the Cancel step requires O(mn) time per iteration, whereas the Tighten step only requires O(m) time per iteration. In the previous lecture, we determined that the Cancel-and-Tighten algorithm requires O(min(n log(nC), mn log n)) iterations. Hence the overall running time is O(min(mn2 log(nC), m2 n2 log n)). Over the course of the next few lectures, we will develop data structures that will enable us to reduce the running time of a single Cancel step from O(mn) to O(m log n). Using dynamic trees, we can reduce the running time of the Cancel step to an amortized time of O(log n) per cycle canceled. This will reduce the overall running time to O(min(mn log(nC) log n, m2 n log2 n)).
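The edge-deleting DFS used in the Cancel step above can be sketched as follows (a hypothetical representation: `adj` maps each vertex to a mutable list of admissible out-neighbours, and dead-end edges are deleted in place as they are backtracked over, mirroring step 1):

```python
def find_admissible_cycle(adj):
    # Returns a list of vertices forming a cycle, or None; dead-end
    # edges are removed from adj so they are never traversed twice.
    on_path, path = set(), []

    def dfs(u):
        on_path.add(u)
        path.append(u)
        while adj.get(u):
            v = adj[u][-1]
            if v in on_path:              # closed a cycle back to v
                return path[path.index(v):]
            cycle = dfs(v)
            if cycle is not None:
                return cycle
            adj[u].pop()                  # (u, v) leads only to dead ends
        on_path.discard(u)
        path.pop()
        return None

    for u in list(adj):
        if adj.get(u):
            cycle = dfs(u)
            if cycle is not None:
                return cycle
    return None
```

After pushing flow along the returned cycle, the caller would remove the saturated edges from `adj` and run the search again, as in step 2.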

3 Binary Search Trees

In this section, we review some of the basic properties of binary search trees and the operations they support, before introducing splay trees. A Binary Search Tree (BST) is a data structure that maintains a dictionary. It stores a collection of objects with ordered keys. For an object (or node) x, we use key[x] to denote the key of x. Property of a BST.

The following invariant must always be satisﬁed in a BST:

• If y lies in the left subtree of x, then key[y] ≤ key[x]
• If z lies in the right subtree of x, then key[z] ≥ key[x]

Operations on a BST.

Here are some operations typically supported by a BST:

• Find(k): Determines whether the BST contains an object x with key[x] = k; if so, returns the object, and if not, returns false.
• Insert(x): Inserts a new node x into the tree.
• Delete(x): Deletes x from the tree.
• Min: Finds the node with the minimum key in the tree.
• Max: Finds the node with the maximum key in the tree.
• Successor(x): Finds the node with the smallest key greater than key[x].
• Predecessor(x): Finds the node with the greatest key less than key[x].
• Split(x): Returns two BSTs: one containing all the nodes y where key[y] < key[x], and the other containing all the nodes z where key[z] ≥ key[x].
• Join(T1, x, T2): Given two BSTs T1 and T2, where all the keys in T1 are at most key[x], and all the keys in T2 are at least key[x], returns a BST containing T1, x and T2.

For example, the procedure Find(k) can be implemented by traversing the tree, branching to the left (resp. right) if the current node has key greater than (resp. less than) k. The running time for many of these operations is linear in the height of the tree, which can be as high as O(n) in the worst case, where n is the number of nodes in the tree. A balanced BST is a BST whose height is maintained at O(log n), so that the above operations run in O(log n) time. Examples of balanced search trees include red-black trees, AVL trees, and (the non-binary) B-trees. In the next lecture, we will discuss a data structure called splay trees, a self-adjusting BST with amortized cost O(log n) per operation. The idea is that every time a node is accessed, it gets pushed up to the root of the tree. The basic operations of a splay tree are rotations: a zig is a right rotation and a zag a left rotation, as illustrated in the following diagram.

[Diagram: a zig (right rotation) at y brings its left child x to the root, moving subtree B from x's right side to y's left side; a zag (left rotation) is the inverse.]
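The two rotations can be sketched minimally (a hypothetical `Node` class; `rotate_right` is the zig and `rotate_left` the zag, each returning the new subtree root):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(y):               # zig: left child x becomes the root
    x = y.left
    y.left, x.right = x.right, y   # x's old right subtree moves under y
    return x

def rotate_left(x):                # zag: mirror image of rotate_right
    y = x.right
    x.right, y.left = y.left, x
    return y

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)
```

Both rotations preserve the in-order sequence of keys, which is exactly the binary search tree property.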


18.415/6.854 Advanced Algorithms

September 24, 2008

Lecture 6 - Splay Trees
Lecturer: Michel X. Goemans

1 Introduction

In this lecture, we investigate splay trees, a type of binary search tree (BST) first formulated by Sleator and Tarjan in 1985. Splay trees are self-adjusting BSTs with the additional helpful property that more frequently accessed nodes are retrieved more quickly. They behave well compared to many other types of self-balancing BSTs, even when the sequence of operations is unknown and non-uniform. While in the worst case a single operation can take O(n) time, splay trees maintain O(log n) amortized cost for the basic BST operations, and their total cost is within a constant factor of the cost of any static BST. We first give an overview of the operations used in splay trees, then give an amortized analysis of their behavior. We conclude by noting their behavior relative to other binary search trees.

2 Splay Tree Structure

A splay tree is a dynamic binary search tree, meaning that it performs additional operations to optimize its behavior. Because splay trees are BSTs, given a node x in a splay tree and a node y in the left subtree of x, we have key(y) < key(x); similarly, for a node z in the right subtree of x, we have key(x) < key(z). This is the binary search tree property. A well-balanced splay tree has height Θ(log n), where n is the number of nodes. Splay trees achieve their efficiency through use of the following operations:

2.1 Rotation

The basic operation used in splay trees (or any other dynamic BST) is the rotation. A rotation rearranges the nodes of a subtree rooted at y so that one of the children x of y becomes the new root of the subtree, while maintaining the binary search tree property. This is illustrated in Figure 1. When the left child becomes the new root, the rotation is a right rotation; when the right child becomes the new root, it is a left rotation. We call a right rotation a zig and a left rotation a zag. The key idea of the splay tree is to bring a node x to the root of the tree via rotations whenever x is accessed. This brings the most recently accessed nodes closer to the top of the tree. However, there are many ways of bringing a node to the root via rotations, and we must therefore specify in which order we perform them. Consider a linear tree (effectively a linked list) on the values 1, . . . , n, rooted at n. Suppose we access the value 1. If we use the naive (and most natural) method of repeatedly performing a zig to bring 1 to the top, we proceed as illustrated in Figure 2. The resulting tree has the same height as the original tree, and is clearly not better balanced. We must try a more clever approach than successive single rotations.

Figure 1: Rotation via zigs and zags.

Figure 2: When we access node 1 and try to bring it up via pure rotations, the result is a tree that is just as unbalanced as before.

2.2 Splay-Step

We now define an operation called the splay-step. In one splay-step on a node x, x is brought up two levels with rotations (or just one level if x's parent is the root). When a node x is accessed in the splay tree, we bring x up with a series of splay-steps until it is the root. We separate the actions performed in a splay-step into the following categories. Call the node that we are trying to access x, its parent y, and y's parent z.

• Case 0: x is the root. Do nothing in this case.

• Case 1: y is the root. If x is the left child of the root, perform a zig on x and y; if not, perform a zag.

• Case 2: x and y are both left children (or both right children). Consider the case when both are left children. We first do a zig on the y-z edge, then a zig on the x-y edge. If x and y are both right children, we do the same thing, but with zags instead. (See Figure 3.)

• Case 3: x is a left child and y is a right child, or vice versa. Consider the case where x is a right child and y is a left child. We first do a zag on the x-y edge, and then a zig on the x-z edge. In the case where x is a left child and y a right child, we do the same thing, but with a zig on the first move, followed by a zag. (See Figure 4.)

Figure 3: Case 2 of the splay-step, when x and y are the same type of children. In this figure, we first do a zig on the y-z edge, and then a zig on the x-y edge.

Figure 4: In Case 3, x and y are not the same type of children. In this case, we do a zag on the x-y edge, and then a zig on the x-z edge.

Note that in the earlier example with the chain of nodes, using splay-steps instead of direct rotations results in a much more balanced tree; see Figure 5.
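Cases 2 and 3 are just compositions of the two rotations. A self-contained sketch (hypothetical minimal `Node`; `splay_step` handles the grandparent cases for the node holding `key` two levels below z, returning the new subtree root):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rot_r(y):                     # zig
    x = y.left
    y.left, x.right = x.right, y
    return x

def rot_l(x):                     # zag
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay_step(z, key):
    if key < z.key:
        y = z.left
        if key < y.key:           # Case 2: zig on y-z, then zig on x-y
            return rot_r(rot_r(z))
        z.left = rot_l(y)         # Case 3: zag on x-y, then zig on x-z
        return rot_r(z)
    y = z.right
    if key > y.key:               # Case 2 mirrored: zag, zag
        return rot_l(rot_l(z))
    z.right = rot_r(y)            # Case 3 mirrored: zig on x-y, then zag
    return rot_l(z)
```

Note how the zig-zag case immediately produces a balanced three-node subtree, while the zig-zig case reverses the chain; over a long chain this halves the depth, as in Figure 5.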

2.3 Splay

With the splay-step operation, we can bring the node x to the root of the splay tree with the procedure:

splay(x): WHILE x ≠ root DO splay-step(x)

The described procedure performs the splay operation in a bottom-up order. It is possible to perform the splay operation in a top-down fashion, which results in the same running time.


Figure 5: When splaying node 1, the resulting tree has half its original height.

3 Running-Time Analysis

3.1 Potential Function

We define a class of potential functions for the amortized analysis of operations on a splay tree. The potential function depends on weights that we are free to choose. For each node x in the tree, make the following definitions:

• T(x) is the subtree rooted at x (and it includes the node x itself),
• weight function: w(x) > 0 is the weight of node x (we can choose what this is; we'll often take w(x) = 1 for all nodes x),
• weight-sum function: s(x) = Σ_{y∈T(x)} w(y),

• rank function: r(x) = log2 s(x).


Then we define the potential function as

φ = Σ_{x∈T(root)} r(x).
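With unit weights, s(x) is just the subtree size, so the potential is easy to compute directly. A sketch (hypothetical tuple representation `(left, key, right)` with `None` for an empty tree):

```python
from math import log2

def potential(tree):
    # Returns phi = sum over nodes x of r(x) = log2(s(x)), taking w = 1,
    # so that s(x) is the number of nodes in the subtree rooted at x.
    def walk(t):
        if t is None:
            return 0, 0.0          # (subtree size, subtree potential)
        ls, lphi = walk(t[0])
        rs, rphi = walk(t[2])
        s = ls + rs + 1
        return s, lphi + rphi + log2(s)
    return walk(tree)[1]
```

A chain has a larger potential than a balanced tree on the same keys; intuitively, splaying a long chain releases stored potential, which pays for the extra rotations.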

3.2 Amortized Cost of Splay(x)

Using the potential function described above, we can show that the amortized cost of the splay operation is O(log n). For the purposes of cost analysis, we assume a rotation takes 1 unit of time.

Lemma 1 For a splay-step operation on x that transforms the rank function r into r′, the amortized cost is ai ≤ 3(r′(x) − r(x)) + 1 if the parent of x is the root, and ai ≤ 3(r′(x) − r(x)) otherwise.

Proof of Lemma 1: Let the potential before the splay-step be φ and the potential after the splay-step be φ′. Let the worst-case cost of the operation be ci. The amortized cost ai is ai = ci + φ′ − φ. We consider the three cases of splay-step operations.

Case 1: In this case, the parent of x is the root of the tree; call it y. After the splay-step, x becomes the root and y becomes a child of x. The operation involves exactly one rotation, so ci = 1. The splay-step only affects the ranks of x and y. Since y was the root of the tree and x is now the root of the tree, r′(x) = r(y). Additionally, since y is now a child of x, (the new) T(x) contains (the new) T(y), so r′(y) ≤ r′(x). Thus the amortized cost is:

ai = ci + φ′ − φ
   = 1 + r′(x) + r′(y) − r(x) − r(y)
   = 1 + r′(y) − r(x)
   ≤ 1 + r′(x) − r(x)
   ≤ 1 + 3(r′(x) − r(x)),

since r′(x) ≥ r(x).

Case 2: In this case, we perform two zigs or two zags, so ci = 2. Let the parent of x be y and the parent of y be z. Node x takes the place of z after the splay-step, so r′(x) = r(z). Also, we see in Figure 3 that r(y) ≥ r(x) (since y was the parent of x) and r′(y) ≤ r′(x) (since y is now a child of x). Then the amortized cost is:

ai = ci + φ′ − φ
   = 2 + r′(x) + r′(y) + r′(z) − r(x) − r(y) − r(z)
   = 2 + r′(y) + r′(z) − r(x) − r(y)
   ≤ 2 + r′(x) + r′(z) − r(x) − r(x).

Next, we use the fact that the log function is concave, i.e., (log2 a + log2 b)/2 ≤ log2((a + b)/2). If the splay-step operation transforms the weight-sum function s into s′, we have

(log2(s(x)) + log2(s′(z)))/2 ≤ log2((s(x) + s′(z))/2).

The left side is equal to (r(x) + r′(z))/2. On the right side, note that s(x) + s′(z) ≤ s′(x); indeed, the old subtree T(x) and the new subtree T′(z) cover all nodes of T′(x) except y (thus s(x) + s′(z) = s′(x) − w(y)). Thus we have

(r(x) + r′(z))/2 ≤ log2(s′(x)/2) = r′(x) − 1,

or r′(z) ≤ 2r′(x) − r(x) − 2. Therefore, the amortized cost is:

ai ≤ 2 + r′(x) + 2r′(x) − r(x) − 2 − r(x) − r(x) = 3(r′(x) − r(x)).

Case 3: In this case, we perform a zig followed by a zag, or vice versa, so ci = 2. Let the parent of x be y and the parent of y be z. Again, r′(x) = r(z) and r(y) ≥ r(x). Then the amortized cost is:

ai = ci + φ′ − φ
   = 2 + r′(x) + r′(y) + r′(z) − r(x) − r(y) − r(z)
   ≤ 2 + r′(y) + r′(z) − r(x) − r(x).

Note in Figure 4 that s′(y) + s′(z) ≤ s′(x). Using the concavity of the log function as before, we find that r′(y) + r′(z) ≤ 2r′(x) − 2. Then we conclude

ai ≤ 2 + 2r′(x) − 2 − r(x) − r(x) ≤ 2(r′(x) − r(x)) ≤ 3(r′(x) − r(x)). □

Lemma 2 The amortized cost of the splay operation on a node x in a splay tree is O(1 + log(s(root)/s(x))).

Proof of Lemma 2: The amortized cost a(splay(x)) of the splay operation is the sum over all of the splay-step operations performed on x. Suppose that we perform k splay-step operations on x. Let r0(x) be the rank of x before the splay operation, and let ri(x) be the rank of x after the ith splay-step operation. Then rk(x) = r0(root), and:

a(splay(x)) ≤ 3(rk(x) − rk−1(x)) + 3(rk−1(x) − rk−2(x)) + ... + 3(r1(x) − r0(x)) + 1
           = 3(rk(x) − r0(x)) + 1
           = 3(r0(root) − r0(x)) + 1.

The added 1 comes from the possibility of a Case 1 splay-step at the end. The definition of r gives the result. □

The above lemma gives the amortized cost of a splay operation for any setting of the weights. To get good bounds on the total cost of any sequence of operations, we set w(x) = 1 for all nodes x. This implies that s(root) ≤ n, where n is the total number of nodes ever in the BST, and by Lemma 2, the amortized cost of any splay operation is a(splay(x)) = O(log n).

3.3 Amortized Cost of BST operations

We now need to show how to implement the various BST operations and analyze their (amortized) cost (still with the weights set to 1).

3.3.1 Find

Finding an element in the splay tree follows the same procedure as in an ordinary BST. After we find our node, we splay it, which has O(log n) amortized cost. The cost of walking down the tree to find the node can be charged to the cost of splaying it. Thus, the total amortized cost of Find is O(log n). (Note: if the node is not found, we splay the last node reached.)

3.3.2 Find-Min

This operation only goes down left children until none are left, and this cost is charged to the subsequent splay operation. After we find the min node, we splay it, which takes O(log n) amortized cost. The total amortized cost is then O(log n).

3.3.3 Find-Max

The process is the same as for Find-Min, except that we go down right children. The total amortized cost is O(log n) as well.

3.3.4 Join

Given two trees T1 and T2 with key(x) < key(y) for all x ∈ T1, y ∈ T2, we can join T1 and T2 into one tree with the following steps:

1. Find-Max(T1). This makes the max element of T1 the new root of T1.

2. Make T2 the right child of this new root.

The amortized cost of the first step is O(log n). For the second step, the actual cost is 1, but the amortized cost must also account for the increase in the potential function value. Before step 2, T1 and T2 had potential function values φ(T1) and φ(T2). Afterwards, the resulting tree has potential function value at most φ(T1) + φ(T2) + log n, since the rank of the new root is at most log n. So the amortized cost of Join is O(log n).

3.3.5 Split

Given a tree T and a pivot i, the split operation partitions T into two BSTs: T1 : {x | key(x) ≤ i}, T2 : {x | key(x) > i}. We split the tree T by performing Find(i). This Find will then splay on a node, call it x, which brings it to the root of the tree. We can then cut the tree; everything on the right of x belongs to


T2, and everything on the left belongs to T1. Depending on its key, we add x itself to either T1 or T2: we make either the right child or the left child of x a new root by simply removing its pointer to its parent. The amortized cost of the Find operation is O(log n). The actual cost of creating the second BST (by cutting off one of the children) is just O(1), and the potential function does not increase (as the rank of the root does not increase). Thus the total amortized cost of a Split is also O(log n). Join and Split make insertion and deletion very simple.

3.3.6 Insert

Let i be the value we want to insert. We first split the tree around i. Then we make a new node with key i the root, with the two resulting subtrees as its left and right subtrees. The amortized cost again is O(log n).

3.3.7 Delete

To delete a node i from a tree T, we first perform Find(i) in the tree, which brings node i to the root. We then delete node i, and are left with its left and right subtrees. Because everything in the left subtree has key less than everything in the right subtree, we can then Join them. It is easy to see that this has amortized cost O(log n) as well.

3.3.8 Total cost of m operations

The next theorem shows that the cost of any sequence of operations on a splay tree has worst-case time similar to any balanced BST (unless the number of operations m is o(n), where n is the number of keys).

Theorem 3 For any sequence of m operations on a splay tree containing at most n keys, the total cost is O((m + n) log n).

Proof of Theorem 3: Let a_i be the amortized cost of the ith operation, and c_i the real cost of the ith operation. Let φ_0 be the potential before and φ_m the potential after the m operations. The total amortized cost of the m operations is

∑_{i=1}^{m} a_i = ∑_{i=1}^{m} c_i + φ_m − φ_0.

Then we have:

∑_{i=1}^{m} c_i = ∑_{i=1}^{m} a_i + φ_0 − φ_m.

Since we chose w(x) = 1 for all x, we have that, for any node x, r(x) ≤ log n. Thus φ_0 − φ_m ≤ n log n, so we conclude:

∑_{i=1}^{m} c_i = ∑_{i=1}^{m} a_i + O(n log n) = O(m log n) + O(n log n) = O((m + n) log n). □
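The splay-tree operations of this section can be sketched in code. The following is a self-contained, simplified Python version (the recursive splay routine and all names are our own illustration, not the lecture's pseudocode); Split relies on the fact that splaying for i brings either i or one of its neighbors to the root, so cutting at the root separates keys ≤ i from keys > i.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(root, key):
    """Bring the node with `key` (or the last node on its search path,
    i.e. a neighbor of `key`) to the root.  Standard recursive splay."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                       # zig-zig
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:                     # zig-zag
            root.left.right = splay(root.left.right, key)
            if root.left.right:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    else:
        if root.right is None:
            return root
        if key < root.right.key:                      # zig-zag
            root.right.left = splay(root.right.left, key)
            if root.right.left:
                root.right = rotate_right(root.right)
        elif key > root.right.key:                    # zig-zig
            root.right.right = splay(root.right.right, key)
            root = rotate_left(root)
        return root if root.right is None else rotate_left(root)

def split(root, i):
    """Return (T1, T2) with keys(T1) <= i < keys(T2)."""
    if root is None:
        return None, None
    root = splay(root, i)
    if root.key <= i:
        t2, root.right = root.right, None
        return root, t2
    t1, root.left = root.left, None
    return t1, root

def join(t1, t2):
    """Assumes every key in t1 is smaller than every key in t2."""
    if t1 is None:
        return t2
    t1 = splay(t1, float("inf"))      # Find-Max: brings max of t1 to the root
    t1.right = t2
    return t1

def insert(root, i):
    t1, t2 = split(root, i)           # assumes i is not already present
    node = Node(i)
    node.left, node.right = t1, t2
    return node

def delete(root, i):
    root = splay(root, i)
    if root is None or root.key != i:
        return root
    return join(root.left, root.right)

def inorder(root):
    return [] if root is None else inorder(root.left) + [root.key] + inorder(root.right)
```

Each call costs O(log n) amortized, matching the analysis above, although this sketch makes no attempt to be efficient in practice.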

4 Comparison to other BSTs

4.1 Static Optimality Property

We will show that splay trees are competitive against any binary search tree that does not involve any rotations. We consider BSTs containing n keys, and sequences of operations that contain only Find operations (thus, no Insert or Delete, for example).

Theorem 4 Define a static binary search tree to be one that uses no rotation operations. Let m_i be the number of times element i is accessed, for i = 1, . . . , n. We assume m_i ≥ 1 for all i. Then the total cost for accessing every element i m_i times is at most a constant times the total cost of any static binary search tree.

Proof of Theorem 4: Consider any static binary search tree T rooted at t. Let l(i) be the depth of i in T, that is, the number of nodes on the path from i to the root of T, so l(t) = 1. In T, the cost for accessing an element i is l(i), so the total cost for accessing every element i m_i times is ∑_i l(i)m_i. We want to show that the total cost of operations on a splay tree, irrespective of the starting configuration, is O(∑_i l(i)m_i).

We choose a different weight function than earlier. Here, we define the weights to be w(i) = 3^{−l(i)} for all i. Note that

s(t) ≤ 1/3 + 2(1/3²) + 2²(1/3³) + · · · = 1.

Then, by Lemma 2, the amortized cost of finding i is:

a(i) = O(1 + log₂ (s(t)/s(i))) = O(1 + log₂ (1/3^{−l(i)})) = O(1 + l(i)).

The total amortized cost of accessing every element i m_i times on a splay tree is thus

O(m + ∑_i l(i)m_i) = O(∑_i l(i)m_i).

This is the amortized cost; we now need to argue about the actual cost. Let φ be the potential before the beginning of the sequence, and φ′ the potential after the sequence of operations. For a node i, let r(i) be the rank of i before and r′(i) the rank after the operations. Note that (since r(i) ≤ log₂ 1 = 0 and r′(i) ≥ log₂ w(i)):

φ − φ′ = ∑_i ( r(i) − r′(i) ) ≤ ∑_i log₂ (1/3^{−l(i)}) = O( ∑_i l(i) ).

Then we have:

∑_i c_i = ∑_i a_i + φ − φ′ = O( ∑_i l(i)m_i ) + O( ∑_i l(i) ) = O( ∑_i l(i)m_i ),

since our assumption m_i ≥ 1 implies that ∑_i l(i) ≤ ∑_i l(i)m_i. □

4.2 Dynamic Optimality Conjecture

The Dynamic Optimality Conjecture claims that splay trees are competitive, up to a constant factor, with any self-adjusting binary search tree (allowing an arbitrary number of arbitrary rotations between accesses). This conjecture was first put forth in Sleator and Tarjan's original splay tree paper in 1985, and has withstood attempts to prove or disprove it since.

4.3 Scanning Theorem

The Scanning Theorem states that, for a splay tree that contains the values [1, 2, . . . , n], accessing all of those elements in sequential order takes O(n) time, regardless of the initial arrangement of the tree. Interestingly, although the Scanning Theorem has been proved directly, it would also follow from the Dynamic Optimality Conjecture, since one can construct dynamic BSTs that perform sequential access in linear time.


MIT OpenCourseWare http://ocw.mit.edu

6.854J / 18.415J Advanced Algorithms Fall 2008

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.415/6.854 Advanced Algorithms

September 29, 2008

Lecture 7 - Dynamic Trees
Lecturer: Michel X. Goemans

1 Overview

In this lecture, we discuss dynamic trees, a sophisticated data structure introduced by Sleator and Tarjan. Dynamic trees provide the fastest worst-case running times for many network flow algorithms. In particular, they will allow us to efficiently perform the Cancel operation in the Cancel and Tighten algorithm. Dynamic trees build upon splay trees, which we introduced in the previous lecture. Dynamic trees manage a set of node-disjoint (not necessarily binary) rooted trees. With each node v is associated a cost. In our use of dynamic trees, the cost will come from the edge (p(v), v), where p(v) denotes the parent of v; the cost of the root in that case will be set arbitrarily large (larger than the cost of any other node), say +∞.

Figure 1: Example of Dynamic Tree.

Dynamic trees will support the following operations:

• make-tree(v): Creates a tree with a single node v, whose cost is +∞.

• find-root(v): Finds and returns the root of the tree containing the node v.

• find-cost(v): Returns the cost of node v. (This may sound like a trivial operation, but in fact there is real work to be done, because we will not explicitly maintain the costs of all nodes.)

• find-min(v): Finds and returns the ancestor w of v with minimum cost. Ties go to the node closest to the root.

• add-cost(v, x): Adds x to the cost of all nodes w on the path from find-root(v) to v.

• cut(v): Breaks the rooted tree in two by removing the link from v to its parent. The node v is now the root of a new tree, and its cost is set to +∞.

• link(v, w, x): Assumes that (1) w is a root, and (2) v and w are not in the same tree, i.e. find-root(v) ≠ w. Combines two trees by adding an edge (v, w), i.e. p(w) = v. Sets the cost of w equal to x.

We will later show that all of these operations run in O(log n) amortized time.
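The intended semantics of these operations can be pinned down with a naive reference implementation that stores explicit parent pointers and costs; every operation walks the path to the root, so it costs O(depth) per call rather than the O(log n) amortized bound we are after. All names in this sketch are our own.

```python
import math

class NaiveDynamicTrees:
    """Reference semantics only: each operation walks the explicit path to
    the root, so it costs O(depth) instead of O(log n) amortized."""
    def __init__(self):
        self.parent = {}
        self.cost = {}

    def make_tree(self, v):
        self.parent[v] = None
        self.cost[v] = math.inf          # roots get an arbitrarily large cost

    def _path_to_root(self, v):          # v, p(v), ..., root
        path = [v]
        while self.parent[path[-1]] is not None:
            path.append(self.parent[path[-1]])
        return path

    def find_root(self, v):
        return self._path_to_root(v)[-1]

    def find_cost(self, v):
        return self.cost[v]

    def find_min(self, v):
        # ancestor w of v (including v) with minimum cost;
        # the <= makes ties go to the node closest to the root
        best = v
        for w in self._path_to_root(v):
            if self.cost[w] <= self.cost[best]:
                best = w
        return best

    def add_cost(self, v, x):
        for w in self._path_to_root(v):
            self.cost[w] += x            # inf + x stays inf at the root

    def cut(self, v):
        self.parent[v] = None
        self.cost[v] = math.inf

    def link(self, v, w, x):             # assumes w is a root, find_root(v) != w
        self.parent[w] = v
        self.cost[w] = x
```

The splay-tree-based implementation in Section 3 realizes exactly this interface while never storing the costs of all nodes explicitly.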


Figure 2: cut(v) operation.

Figure 3: link(v, w, x) operation.

Theorem 1 The total running time of any sequence of m dynamic tree operations is O((m + n) log n), where n is the number of nodes.

We defer the proof of this theorem until the next lecture.

2 Implementation of Cancel with dynamic trees

Recall the setting for the Cancel step in the algorithm Cancel and Tighten for the minimum cost flow problem. We have a circulation f and node potentials p in an instance defined on graph G. Recall that an edge (v, w) is admissible if cp(v, w) < 0, and the admissible graph (V, Ea) is the subgraph of Ef (the residual graph corresponding to our circulation) containing only the admissible edges. Our aim is to repeatedly find a cycle in the admissible graph and saturate it. Each time we do this, all of the saturated edges disappear from the graph. Also recall that no edges are added to the admissible graph during this process, because any new edge in the residual graph must have positive reduced cost and is therefore not admissible. We represent the problem with dynamic trees, where the nodes in the dynamic trees correspond to nodes in G and the edges of the dynamic trees are a subset of the admissible edges. We maintain two (disjoint) sets of admissible edges: those which are currently in the dynamic tree, and those which still need to be considered. The cost of a node v will correspond to the residual capacity uf(p(v), v) of the edge (p(v), v), unless v is a root node, in which case it will have cost +∞. We will also mark some of the roots (denoted graphically with a (∗)) to indicate that we dealt with them and concluded they can't be part of any cycle. For the edges not in the dynamic tree, we also maintain the flow value. (We don't need to maintain the flow explicitly for the edges in the trees, since we can recover the flow from the edge capacities in G and the residual capacity.) To summarize, we begin with a set of n singleton trees. All of the edges start out in the remaining pool. In each iteration, we try to find an admissible edge leading to the root r of one of the dynamic trees. If we fail to find such an edge, this implies there are no admissible cycles which include r,


and so we mark it and remove it from consideration. Suppose, on the other hand, that we do find an edge (w, r) leading into the root. If w is in a different tree, we join the two trees by adding an edge connecting w and r. On the other hand, if w and r are part of the same tree, it means we have found a cycle. In this case, we push flow along the cycle and remove the saturated edges from the data structure. In more detail, we keep repeating the following procedure as long as there still exist unmarked roots:

⊲ Choose an unmarked root r.
⊲ Among admissible edges, try to find one which leads to r.
⊲ CASE 1: there is no such (v, r) ∈ Ea.
  ⊲ Mark r, since we know it cannot possibly be part of a cycle.
  ⊲ Cut all the children v of r.
  ⊲ Set f(r, v) ← u(r, v) − uf(r, v) = u(r, v) − find-cost(v).
⊲ CASE 2: there is an admissible edge (w, r) from a different tree, i.e. find-root(w) ≠ r.
  ⊲ Link the two trees: link(w, r, u(w, r) − f(w, r)).
⊲ CASE 3: there is an admissible edge (w, r) from the same tree, i.e. find-root(w) = r.
  ⊲ We have found a cycle, so push flow along the cycle. The amount we can push is
    δ = min(u(w, r) − f(w, r), find-cost(find-min(w))).
  ⊲ add-cost(w, −δ).
  ⊲ Increase f(w, r) by δ.
  ⊲ If f(w, r) = u(w, r), then (w, r) is inadmissible, so we get rid of it.
  ⊲ While find-cost(find-min(w)) = 0:
    ⊲ z ← find-min(w)
    ⊲ f(p(z), z) ← u(p(z), z)
    ⊲ cut(z)

The last while loop is to delete all the edges that became inadmissible along the path from r to w.
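Stripped of the dynamic-tree machinery, the core of Case 3, computing δ and removing the saturated edges, can be sketched as follows (plain dictionaries instead of trees; names are ours):

```python
def cancel_cycle(cycle, u, f):
    """Push the bottleneck amount of flow along `cycle` (a list of edges
    (v, w)).  `u` and `f` map edges to capacity and current flow.
    Returns the edges that become saturated, hence inadmissible."""
    delta = min(u[e] - f[e] for e in cycle)     # bottleneck residual capacity
    for e in cycle:
        f[e] += delta
    return [e for e in cycle if f[e] == u[e]]   # saturated edges to discard
```

The point of the dynamic trees is that the `min` and the uniform update above are performed in O(log n) amortized time via find-min and add-cost, instead of by walking the whole cycle.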

2.1 Running time

In a Cancel step, we end up cancelling at most O(m) cycles, where m is the number of edges. In addition, each edge gets saturated at most once (if it does, it becomes inadmissible); therefore the number of cut(z) and find-min(w) operations over all executions of Case 3 is O(m). Thus the total number of dynamic tree (and also other arithmetic or control) operations is at most O(m). Hence, by Theorem 1, the running time of each Cancel operation is O((m + n) log n) = O(m log n). The overall running time of Cancel-and-Tighten is therefore O(m²n log²n) (a strongly polynomial running time bound) or O(mn log n log(nC)).

3 Dynamic trees implementation

We now turn to the implementation of dynamic trees. Here we present the definitions; we will cover the running time analysis in the next lecture. The dynamic trees data structure is a collection of rooted trees. We decompose each rooted tree into a set of node-disjoint (directed) paths, as shown in Figure 4. Each node is in precisely one path (possibly containing that node only).

Figure 4: Decomposition of rooted tree.

We will refer to the edges on these paths as solid edges, and to the remaining edges as dashed edges, or middle edges. Each path is directed from its tail (highest in the tree) to its head (lowest in the tree). There are many possible ways to partition a tree into solid paths. For instance, if we are given a solid edge and a dashed edge which are both children of a single parent, we can swap the solid and dashed edges. This follows from the basic observation that, for any middle edge (v, w), w is the tail of a solid path. This operation, known as splicing, is shown in Figure 5.

Figure 5: Splicing in the rooted tree.

In a dynamic tree, each solid path is represented by a splay tree, where the nodes are sorted in increasing order from the head to the tail, as shown in Figure 6. In other words, the node with smallest key is the head (the lowest in the tree), and the node with largest key is the tail (the highest in the tree).

Figure 6: Representation of solid path from head to tail in a BST (splay tree).

In addition, we will maintain links between the different splay trees. The root of each splay tree is attached to the parent of the tail of the path in the rooted tree, as shown in Figure 7. For example, the edge (e, f) in the original rooted tree becomes the edge (e, i) linking e to the root i of the splay tree corresponding to the solid path f → i. The entire data structure, with the splay trees corresponding to the same rooted tree being connected to each other, forms what is called a virtual tree. Any given node of the virtual tree may have at most one left child and at most one right child (of a splay tree), as well as any number of children attached by dashed edges. Children attached by dashed edges are known as middle children, and we draw them in between the left and right children.

Figure 7: Rooted tree on the left and corresponding virtual tree on the right.

Notice that we can reconstruct the rooted tree from the virtual tree. Each splay tree corresponds to a solid path from the node of lowest key to the node of highest key. In addition, for any middle edge, we get an edge of the original rooted tree; for example, to (e, i) in the virtual tree corresponds the edge (e, f) in the original tree, where f is the node with highest key in the splay tree in which i resides. Note that there are many different ways to represent rooted trees as virtual trees, and we can modify virtual trees in various ways which don't affect the rooted trees. In particular, we define the Expose(v) operation, which brings a given node v to the root of the virtual tree. This operation involves three main steps:

1. Make sure that the path from v to the root only uses roots of splay trees. This can be done by performing splay operations whenever we enter a new splay tree.

2. Make sure that the path from v to the root consists entirely of solid edges. We can ensure this through repeated splicing.

3. Do the splay operation to bring v to the top of the resulting splay tree. This is justified since v is now in the same splay tree as the root of the original rooted tree.



18.415/6.854 Advanced Algorithms

October 1, 2008

Lecture 8
Lecturer: Michel X. Goemans

Previously, we introduced the dynamic tree data structure and the operations that dynamic trees must support. Today, we take a more detailed look at dynamic trees and describe the efficient implementation of the operations. In doing so, much of our focus will be on the Expose method, an extended splay operation that is essential in all these operations. We show that any sequence of m operations on a dynamic tree with n nodes takes O((m + n) log n) time.

1 Dynamic Trees

Dynamic trees (also known as link-cut trees), introduced by Sleator and Tarjan, are a data structure intended to maintain a representation of a set of rooted trees. We will be able to perform various operations on these trees, to be discussed later. Figure 1 shows an example tree as a virtual tree (left) and a rooted tree (right).

1.1 Rooted Trees

We view rooted trees as unions of node-disjoint (directed) paths. This divides the edges of the tree into two sets. Solid edges are those that are on the node-disjoint paths that the tree is composed of, and dashed edges are those that are not on these paths. Note that each path consisting of solid edges is a directed path (we omit the arrows here) from top to bottom.
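Any rule that marks one child edge per internal node as solid yields such a decomposition into node-disjoint downward paths. A minimal sketch (our own names, arbitrary child choice):

```python
def solid_path_decomposition(children, root):
    """Partition a rooted tree into node-disjoint downward (solid) paths by
    marking, at every internal node, the edge to one arbitrary child as
    solid.  `children` maps each node to a list of its children."""
    paths, stack = [], [(root, None)]     # (node, index of path it extends)
    while stack:
        v, pid = stack.pop()
        if pid is None:                   # v starts a new solid path
            pid = len(paths)
            paths.append([])
        paths[pid].append(v)              # paths are built top to bottom
        kids = children.get(v, [])
        for w in kids[1:]:                # dashed edges: each child starts a path
            stack.append((w, None))
        if kids:                          # solid edge: first child extends the path
            stack.append((kids[0], pid))
    return paths
```

Every node ends up in exactly one path, which is the invariant the virtual-tree representation below relies on.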

1.2 Virtual Trees

The union of disjoint paths described above can be used to represent virtual trees. In a virtual tree, each solid path is represented by a splay tree such that the following conditions hold:

• A successor node in a splay tree is an ancestor in the rooted tree.

• For each splay tree, its largest node is linked to the parent of the root in the rooted tree.

• In the virtual tree, each node has at most one left child, at most one right child, and any number of middle (virtual) children.

There are three kinds of edges in a virtual tree, corresponding to the three types of children a node can have. Left and right children of a node are connected to the node by solid edges, and middle children of a node are connected to it by dashed edges. Note that there can be many virtual trees corresponding to a rooted tree, because there are two different degrees of freedom involved in constructing a virtual tree — the union of disjoint paths could be different, as could the structure of the splay trees corresponding to the paths. An important consequence of this setup is that rotations in a splay tree do not affect the structure of the rooted tree.

2 The Expose Operation

The Expose(v) operation is an extended splay operation that brings v to the root of the virtual tree without changing the structure of the rooted tree. The important parts of this operation are to make sure that the path from v to the root is solid and that the splay tree representing the path to which v belongs is rooted at v. We can describe this operation in three steps. In our example, we run Expose on node 15.

Figure 1: Virtual tree (left) and corresponding rooted tree (right).

2.1 Step 1

Step 1 consists of walking from v to the root of the virtual tree. Whenever the walk enters a splay tree (solid edges) at some node w, a Splay(w) operation is performed, bringing w to the root of that tree. Middle children are not affected in this step. For instance, we splay nodes 11 and 5 in our example tree as in Figure 2. Note that at the end of step 1 of an Expose(v) operation, v will be connected to the root of the virtual tree only by dashed edges.

2.2 Step 2: Splicing

Step 2 consists of walking from v to the root of the virtual tree, exchanging along the way each middle edge with the left subtree of the parent. This is illustrated in Figure 3 and called splicing. A middle child of a node w and its left child can be exchanged (without changing the rooted tree) only if w is the root of its splay tree. This justifies performing step 1 first, since at the end of step 1 all edges from v to the root are middle edges. Splicing is a valid operation on virtual trees. Indeed, referring to Figure 3, the left subtree of w in the splay tree corresponds to the part of the solid path that is below w in the rooted tree; this is because w is the root of its splay tree. Exchanging that solid subpath with the solid path corresponding to the splay tree rooted at v still leaves the rooted tree decomposed into a node-disjoint union of paths. Note that after performing this operation on every edge to the root of the virtual tree, there will be a solid path from the root of the rooted tree to the node being exposed.


Figure 2: Walking Up and Splaying. The virtual tree after splaying 15 and 11 is shown on the left. The virtual tree on the right is at the end of step 1, after splaying also node 5.

Figure 3: Splicing. w needs to be the root of its splay tree.


Figure 4: The left virtual tree is after the first splicing; the right virtual tree is the one at the end of step 2.

The result of splicing every node on the path to the root for our example is illustrated in Figure 4.

2.3 Step 3

Step 3 consists of walking from v to the root in the virtual tree, splaying v to the root. Note that in the analysis, we can charge the entire cost of step 2 to the ﬁnal splaying operation in step 3. Figure 5 shows the relevant splay tree before and after this step.

3 Operations on Dynamic Trees

We will now describe the desired operations on a dynamic tree and how to implement them eﬃciently using the Expose method just deﬁned. Some of these operations require keeping track of diﬀerent costs in the tree, so we ﬁrst consider an eﬃcient way of doing this.

3.1 Maintaining Cost Information

When performing operations on the dynamic tree, we need to keep track of cost(x) for each node x, and we need to be able to find the minimum cost along paths to the root of the rooted tree. Since such a path is a prefix of a path corresponding to a splay tree, knowing the minimum cost in any subtree of any of our splay trees might be helpful. So, in addition to cost(x), we would like to keep track of the value mincost(x), given by

mincost(x) = min{cost(y) | y in the subtree rooted at x of x's splay tree}.

We'll see that, instead of maintaining cost(x) and mincost(x), it will be easier to maintain the following two quantities for every node x:

Δ min(x) = cost(x) − mincost(x)

and

Δ cost(x) = cost(x) if x is the root of a splay tree, and Δ cost(x) = cost(x) − cost(p(x)) otherwise.

Figure 5: Splaying on Virtual Tree.

Figure 6: Rotation.

Observe that, if x is the root of a splay tree, then cost(x) = Δ cost(x) and mincost(x) = Δ cost(x) − Δ min(x). This fact, combined with the Expose operation, shows that we can find cost(x) and mincost(x) given Δ min(x) and Δ cost(x), so it is sufficient to maintain the latter. We now claim that we can update Δ min(x) and Δ cost(x) in O(1) time after a rotation or a splice, which will allow us to maintain cost(x) and mincost(x) in O(1) time. We first consider a rotation; see Figure 6 for the labelling of the nodes. Let Δ cost(x) and Δ cost′(x) correspond to before and after the rotation, respectively. Similarly define Δ min(x) and Δ min′(x). Observe that during a rotation, only the nodes b, w and v have their Δ cost(x) change. One can check that the updates are as follows:

Δ cost′(v) = Δ cost(w) + (cost(v) − cost(w)) = Δ cost(w) + Δ cost(v),
Δ cost′(w) = −Δ cost(v),
Δ cost′(b) = Δ cost(b) + (cost(v) − cost(w)) = Δ cost(b) + Δ cost(v).
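These update rules can be sanity-checked numerically: start from explicit costs for the configuration of Figure 6 (w the splay-tree root with left child v and subtree c; v with subtrees a and b), form the Δ values, apply the three updates, and confirm that the costs recovered from the new Δ values are unchanged. The concrete values and names below are our own.

```python
# Costs for the nodes of Figure 6 (arbitrary example values).
cost = {"w": 2.0, "v": 5.0, "a": 7.0, "b": 1.0, "c": 4.0}

# Delta-costs before the rotation: w is the splay-tree root, so
# dcost(w) = cost(w); every other node stores cost relative to its parent.
d = {
    "w": cost["w"],
    "v": cost["v"] - cost["w"],
    "a": cost["a"] - cost["v"],
    "b": cost["b"] - cost["v"],
    "c": cost["c"] - cost["w"],
}

# The three O(1) update rules from the notes (only v, w, b change).
d2 = dict(d)
d2["v"] = d["w"] + d["v"]        # dcost'(v) = dcost(w) + dcost(v)
d2["w"] = -d["v"]                # dcost'(w) = -dcost(v)
d2["b"] = d["b"] + d["v"]        # dcost'(b) = dcost(b) + dcost(v)

# After rotating at v: v is the root with children a and w,
# and w has children b and c.  Recover absolute costs by summing
# dcost along each root-to-node path.
recovered = {
    "v": d2["v"],
    "a": d2["a"] + d2["v"],
    "w": d2["w"] + d2["v"],
    "b": d2["b"] + d2["w"] + d2["v"],
    "c": d2["c"] + d2["w"] + d2["v"],
}
assert recovered == cost         # the rotation leaves all costs unchanged
```

The same kind of check applies to the splice updates below.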

Before showing the corresponding updates for Δ min(x), observe that Δ min(x) and Δ cost(x) satisfy the following equation; here x is any node, l is its left child, and r is its right child:

Δ min(x) = cost(x) − mincost(x)
         = cost(x) − min(cost(x), mincost(l), mincost(r))
         = max(0, cost(x) − mincost(l), cost(x) − mincost(r))
         = max(0, Δ min(l) − Δ cost(l), Δ min(r) − Δ cost(r)).     (1)

Furthermore, the minimum of the subtree can be located by knowing which term attains the maximum in the last expression. Back to the updates for Δ min(x). The only subtrees that change are those of w and v, and so only those Δ min values change. Using (1), one can see that

Δ min′(w) = max(0, Δ min(b) − Δ cost′(b), Δ min(c) − Δ cost(c)),
Δ min′(v) = max(0, Δ min(a) − Δ cost(a), Δ min′(w) − Δ cost′(w)).

Notice that Δ min′(v) depends on Δ min′(w), which was just computed. Similarly, when we perform the splicing step given in Figure 3, Δ cost changes only for v and u, and only Δ min(w) changes. The updates are:

Δ cost′(v) = Δ cost(v) − Δ cost(w),
Δ cost′(u) = Δ cost(u) + Δ cost(w),
Δ min′(w) = max(0, Δ min(v) − Δ cost′(v), Δ min(z) − Δ cost(z)).

3.2 Implementation of Operations

We now describe the implementation of each of the desired operations on a dynamic tree, making extensive use of the Expose operation.

• make-tree(v): Simply create a tree with the single node v.

• find-root(v): First, run Expose(v). Then follow right children until a leaf w of the splay tree containing v is reached. Now, splay(w), and then return w.

• find-cost(v): First, run Expose(v). Now v is the root, so return Δ cost(v) = cost(v). Note that the actual computations here were done by the updates of Δ cost(x) and Δ min(x) within the splay and splice operations.

• find-min(v): First, run Expose(v). Now, let's rewrite (1):

Δ min(v) = max{0, −Δ cost(left(v)) + Δ min(left(v)), −Δ cost(right(v)) + Δ min(right(v))}.

If Δ min(v) = 0, then splay(v) and then return v, as the minimum is achieved at v. Else, if −Δ cost(left(v)) + Δ min(left(v)) > −Δ cost(right(v)) + Δ min(right(v)), then the minimum is contained in the left subtree and we walk down it recursively. Otherwise, the minimum is contained in the right subtree, so we recurse down the right. Once we have found the minimum, we splay it.

• add-cost(v, x): First, run Expose(v). Add x to Δ cost(v) and subtract x from Δ cost(left(v)). Also update Δ min(v) (using (1)). (The Δ min value of other nodes is unchanged.)

• cut(v): First, run Expose(v). Add Δ cost(v) to Δ cost(right(v)). Remove the edge (v, right(v)).

• link(v, w, x): First, run Expose(v) and Expose(w). Then, add the root w as a middle child of v. Add Δ cost(w) − x to Δ cost(right(v)) and to Δ cost(left(v)). Also update Δ min(w).
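The walk-down inside find-min can be illustrated on a static tree where Δ cost and Δ min are precomputed from explicit costs; in the real structure these quantities are of course maintained incrementally under splays and splices. This sketch and its names are ours.

```python
class N:
    """BST node for one solid path, annotated with dcost/dmin."""
    def __init__(self, key, cost, left=None, right=None):
        self.key, self.cost = key, cost
        self.left, self.right = left, right

def annotate(x):
    """Compute, bottom-up: mincost of x's subtree,
    dmin(x) = cost(x) - mincost(x), and
    dcost(child) = cost(child) - cost(parent)."""
    if x is None:
        return
    annotate(x.left)
    annotate(x.right)
    x.mincost = min([x.cost] + [c.mincost for c in (x.left, x.right) if c])
    x.dmin = x.cost - x.mincost
    for c in (x.left, x.right):
        if c:
            c.dcost = c.cost - x.cost

def walk_to_min(x):
    """find-min walk-down: while dmin(x) > 0 the minimum lies strictly in a
    subtree; follow the child whose term attains the maximum in (1)."""
    while x.dmin != 0:
        l, r = x.left, x.right
        lv = (l.dmin - l.dcost) if l else float("-inf")
        rv = (r.dmin - r.dcost) if r else float("-inf")
        x = l if lv > rv else r
    return x

# A small splay tree for one solid path: keys ordered, costs arbitrary.
root = N(4, 5.0, N(2, 3.0, N(1, 2.0)), N(6, 4.0))
annotate(root)
```

Here `walk_to_min(root)` lands on the key-1 node, whose cost 2.0 is the minimum, exactly as the case analysis in the find-min bullet prescribes.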

4 Analysis of Dynamic Trees

We now give an amortized analysis of the cost of operations on these dynamic trees. We will see that any sequence of m dynamic tree operations on n nodes takes O((m + n) log n) time.

4.1 Potential Function

We will use the following potential function in our analysis, motivated by our analysis of splay trees. For each node x, let w(x) = 1 be the weight assigned to x, and define

s(x) = ∑_{y∈Tx} w(y),

where Tx is the subtree of the entire virtual tree attached at x. Then consider r(x) = log₂ s(x), and take our final potential function to be

φ(T) = 3 ∑_{x∈T} r(x).

This differs from the potential function for splay trees in two ways: first, Tx is defined over the entire virtual tree, and second, we have this additional factor 3. We will see later why the constant factor of 3 was chosen here.

4.2 Runtime of the Expose Operation

We first analyze the runtime of Expose(v), since it is used in all other operations. We look at each step of Expose(v) separately. Let k be the number of middle edges separating v from the root of the entire virtual tree. Equivalently, k is the number of splay operations performed during Step 1.

• Step 1: Let t(v) be the root of the splay tree containing v. Recall that the amortized cost of splay(v) was 3(r(t(v)) − r(v)) + 1 when we used the potential function

φ_splay(T) = ∑_{x∈T} r(x).

We now have the potential function φ(T) = 3 φ_splay(T), so the 3(r(t(v)) − r(v)) term here should be multiplied by 3 to obtain an amortized runtime of 9(r(t(v)) − r(v)) + 1 for each call of splay(v) (the +1 corresponds to the cost of the last zig, if any, and so we do not need to multiply it by 3).


We are using the splay operation on the k nodes v, p(t(v)), . . . , (p ∘ t)^{k−1}(v) in this step, meaning that we get a total amortized runtime of

∑_{i=0}^{k−1} [ 9( r(t((p ∘ t)^i(v))) − r((p ∘ t)^i(v)) ) + 1 ] ≤ 9[r(root) − r(v)] + k,

since we have that r(t((p ∘ t)^{i−1}(v))) ≤ r((p ∘ t)^i(v)), so the sum telescopes. The amortized cost of step 1 is therefore O(log n) + k (since r(root) − r(v) ≤ log n).

• Step 2: Splicing does not change the value of φ(T), so the amortized cost for this step is the same as its actual cost of k.

• Step 3: We are using the splay operation once on node v at distance k from the root, so this has an actual cost of k. Using the fact that our potential φ has an additional factor 3 in its definition compared to the splay tree version, we get from the amortized analysis of splaying that:

k + (1/3) Δφ(T) ≤ 3[r(root) − r(v)] + 1 = O(log n).

Multiplying by 3, we see that we can also account for the additional cost of 2k from steps 1 and 2, and have an amortized time of O(log n).

• Total: We get O(log n) + k in step 1, k in step 2, and these 2k plus step 3 give O(log n), for a total of O(log n).

4.3 Runtimes of all Operations

We can now briefly summarize the runtimes of all other operations in terms of Expose.

• find-cost, find-root, find-min, add-cost: Each of these operations requires at most one use of Expose, at most one run of splay, and at most one search of the tree, which can be charged to the last splay. Therefore, they each run in O(log n) amortized time.

• cut: We again use Expose once. We now consider the effect of the other actions on the potential function. Removing the edge (v, right(v)) decreases s(v) by s(right(v)) and leaves s(x) unchanged for all other x, so it decreases φ(T), which we can safely ignore. This gives an amortized runtime of O(log n).

• link: We use Expose twice. Now, when we link w to v, we see that r(v) increases by O(log n), and all other r(x) remain unchanged. Hence, this operation increases φ(T) by O(log n), giving a total amortized runtime of O(log n).

With this analysis, we see that every operation has amortized time O(log n). A sequence of m operations therefore has amortized time O(m log n). Furthermore, the potential function satisfies

φ(T) = 3 ∑_{x∈T} r(x) ≤ 3 ∑_{x∈T} log n ≤ 3n log n,

meaning that any increase in potential is at most O(n log n), implying that the total cost is at most O((m + n) log n). We now have the following theorem.

Theorem 1 Any m operations on a dynamic tree with n nodes run in O((m + n) log n) time.



18.415/6.854 Advanced Algorithms

October 6, 2008

Lecture 9
Lecturer: Michel X. Goemans

9 Linear Programming

Linear programming is the class of optimization problems consisting of optimizing the value of a linear objective function, subject to linear equality or inequality constraints. These constraints are of the form

a_1 x_1 + · · · + a_n x_n {≤, =, ≥} b,

where a_i, b ∈ R, and the goal is to maximize or minimize an objective function of the form c_1 x_1 + · · · + c_n x_n. In addition, we constrain the variables x_i to be nonnegative. The problem can be expressed in matrix form: given the constraints

Ax {≤, =, ≥} b,
x ≥ 0,

maximize or minimize the value of cᵀx, where x ∈ Rⁿ, A ∈ R^{m×n}, b ∈ Rᵐ, c ∈ Rⁿ. Linear programming has many applications and can also be used as a proof technique. In addition, it is important from a complexity point of view, since it is among the hardest of the class of polynomial-time solvable problems.
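As a concrete toy instance of the matrix form, a two-variable program max cᵀx subject to Ax ≤ b, x ≥ 0 can be solved by enumerating candidate vertices (intersections of constraint boundaries), which foreshadows the simplex method's focus on extreme points. This brute-force sketch is our own illustration, not one of the algorithms discussed below, and it assumes the LP is feasible and bounded.

```python
from itertools import combinations

def solve_2d_lp(c, A, b):
    """Brute-force a 2-variable LP max c.x s.t. Ax <= b, x >= 0 by trying
    all intersections of pairs of constraint lines (including the
    boundaries x1 = 0 and x2 = 0).  Didactic only: exponential in general,
    and it assumes the LP is feasible and bounded."""
    lines = [(a[0], a[1], bi) for a, bi in zip(A, b)]   # a1*x + a2*y = bi
    lines += [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]        # x >= 0, y >= 0
    best = None
    for (a1, a2, b1), (a3, a4, b2) in combinations(lines, 2):
        det = a1 * a4 - a2 * a3
        if abs(det) < 1e-12:
            continue                                   # parallel lines
        x = (b1 * a4 - a2 * b2) / det                  # Cramer's rule
        y = (a1 * b2 - b1 * a3) / det
        feasible = x >= -1e-9 and y >= -1e-9 and all(
            a[0] * x + a[1] * y <= bi + 1e-9 for a, bi in zip(A, b))
        if feasible:
            val = c[0] * x + c[1] * y
            if best is None or val > best[0]:
                best = (val, (x, y))
    return best

# max x1 + x2  s.t.  x1 + 2*x2 <= 8,  3*x1 + x2 <= 9,  x >= 0
val, (x1, x2) = solve_2d_lp([1.0, 1.0], [[1.0, 2.0], [3.0, 1.0]], [8.0, 9.0])
```

The optimum here is attained at the vertex (2, 3) with value 5, the intersection of the two inequality boundaries.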

9.1 Algorithms

Research in linear programming algorithms has been an active area for over 60 years. In this class, we will discuss three major (classes of) algorithms:

• Simplex method (Dantzig 1947).
  – Fast in practice.
  – Still the most-used LP algorithm today.
  – Can be nonpolynomial (exponential) in the worst case.

• Ellipsoid algorithm (Shor, Khachiyan 1979).
  – Polynomial time; this was the first polynomial-time algorithm for linear programming.
  – Can solve LP (and other more general) problems where the feasible region P = {x : Ax = b, x ≥ 0} is not explicitly given, but instead, given a vector x, one can efficiently decide whether x ∈ P or, if not, find an inequality satisfied by P but not by x.
  – Very useful for designing polynomial-time algorithms for other problems.
  – Not fast in practice.

• Interior-point algorithms (Karmarkar 1984).
  – This is a class of algorithms which maintain a feasible point in the interior of P; many variants (by many researchers) have been developed.
  – Polynomial time.
  – Fast in practice.
  – Can beat the simplex method for larger problems.

9.2 Equivalent forms

A linear programming problem can be modified to fit a preferred alternate form by changing the objective function and/or the linear constraints. For example, one can easily transform any linear program into the standard form min{cᵀx : Ax = b, x ≥ 0} using the following simple transformations:

• Maximize to minimize: max{cᵀx} → min{−cᵀx}.

• Equality to inequality: a_iᵀx = b_i → a_iᵀx ≤ b_i and a_iᵀx ≥ b_i.

• Inequality to nonnegativity constraint: a_iᵀx ≤ b_i → a_iᵀx + s = b_i with a slack variable s ≥ 0.

• Variables unrestricted in sign: replace x_j everywhere by x_j⁺ − x_j⁻, with x_j⁺ ≥ 0 and x_j⁻ ≥ 0.

9.3 Definitions

Here is some basic terminology for a linear program.

Definition 1 A vector x is feasible for an LP if it satisfies all the constraints.

Definition 2 An LP is feasible if there exists a feasible solution x for it.

Definition 3 An LP is infeasible if there is no feasible solution x for it.

Definition 4 An LP min{cᵀx : Ax = b, x ≥ 0} is unbounded if, for all λ ∈ R, there exists x ∈ Rⁿ such that Ax = b, x ≥ 0, and cᵀx ≤ λ.
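The three possible situations (feasible with an attained optimum, infeasible, unbounded) can be illustrated on single-variable programs min{c·x : bounds on x, x ≥ 0}; this tiny classifier for that special case is our own sketch, not a general LP algorithm.

```python
def classify_1d(constraints, c):
    """Classify min c*x subject to x >= 0 plus simple bounds on a single
    variable x.  `constraints` is a list of ("<=", b) or (">=", b)."""
    lo, hi = 0.0, float("inf")                 # start from x >= 0
    for op, b in constraints:
        if op == "<=":
            hi = min(hi, b)
        else:
            lo = max(lo, b)
    if lo > hi:
        return "infeasible"                    # contradictory bounds
    if c >= 0:
        return "optimal"                       # minimum attained at x = lo
    # c < 0: the objective decreases as x grows
    return "optimal" if hi < float("inf") else "unbounded"
```

For example, the bounds x ≤ 3 and x ≥ 5 are infeasible, while minimizing −x over x ≥ 1 with no upper bound is unbounded, matching Definitions 3 and 4.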

9.4 Farkas' lemma

If we have a system of equations Ax = b, from linear algebra we know that either Ax = b is solvable, or the system A^T y = 0, b^T y ≠ 0 is solvable. Indeed, since Im(A) = ker(A^T)^⊥, either b is orthogonal to ker(A^T) (in which case it is in the image of A, i.e. Ax = b is solvable), or it is not, in which case one can find a vector y ∈ ker(A^T) with a nonzero inner product with b (i.e. A^T y = 0, b^T y ≠ 0 is solvable). Farkas' lemma generalizes this to the case where we also have linear inequalities:

Lemma 1 (Farkas' lemma) Exactly one of the following holds:

1. ∃x ∈ R^n : Ax = b, x ≥ 0,

2. ∃y ∈ R^m : A^T y ≥ 0, b^T y < 0.

Clearly, both cannot happen simultaneously, since the existence of such an x and such a y would mean y^T(Ax) = y^T b < 0, while y^T Ax = (A^T y)^T x ≥ 0, as the inner product of two nonnegative vectors is nonnegative. Together this gives a contradiction.
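As a small numeric illustration (our own example, not from the notes), the dichotomy can be checked by hand: a vector y with A^T y ≥ 0 and b^T y < 0 certifies that Ax = b, x ≥ 0 is infeasible, and no such certificate can exist when a nonnegative solution does.

```python
# Small numeric check of the Farkas dichotomy on two hand-picked systems.

def certifies_infeasible(A, b, y):
    """True if y satisfies A^T y >= 0 and b^T y < 0."""
    At_y = [sum(A[i][j] * y[i] for i in range(len(A))) for j in range(len(A[0]))]
    b_y = sum(bi * yi for bi, yi in zip(b, y))
    return all(v >= 0 for v in At_y) and b_y < 0

# System 1: x1 + x2 = -1 has no solution with x >= 0; y = (1,) is a certificate.
assert certifies_infeasible([[1, 1]], [-1], [1])

# System 2: x = (2, 3) solves Ix = (2, 3), x >= 0, so no certificate can exist;
# in particular y = (1, 1) fails the test.
assert not certifies_infeasible([[1, 0], [0, 1]], [2, 3], [1, 1])
print("Farkas certificates behave as expected")
```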

9.4.1 Generalizing Farkas' Lemma

Before we provide a proof of the (other part of) Farkas' lemma, we briefly mention other possible generalizations of the solvability of a system of equations. First, consider the case in which we would like the variables x to take integer values, but do not care whether they are nonnegative. In this case, the natural necessary condition is indeed also sufficient. Formally, suppose we take this set of constraints:

Ax = b, x ∈ Z^n.

If we can find some y with y^T A ∈ Z^n and y^T b not integral, then y^T Ax = y^T b is impossible for integral x, and the system of constraints is infeasible. The converse is also true.

Theorem 2 Exactly one of the following holds:

1. ∃x ∈ Z^n : Ax = b,

2. ∃y ∈ R^m : A^T y ∈ Z^n and b^T y ∉ Z.

One could try to combine both nonnegativity constraints and integrality restrictions, but in that case the necessary condition for feasibility is not sufficient. In fact, for the following set of constraints:

Ax = b, x ≥ 0, x ∈ Z^n,

determining feasibility is an NP-hard problem, and therefore we cannot expect a good characterization (a necessary and sufficient condition that can be checked efficiently).

9.4.2 Proof of Farkas' lemma

We first examine the projection theorem, which will be used in proving Farkas' lemma (see Figure 1).

Theorem 3 (The projection theorem) If K is a nonempty, closed, convex set in R^m and b ∉ K, define

p = proj_K(b) = argmin_{z ∈ K} ||z − b||_2.   (1)

Then, for all z ∈ K: (z − p)^T (b − p) ≤ 0.


Figure 1: The projection theorem.

Proof of Lemma 1: We have seen that both systems cannot be simultaneously solvable. So, assume now that there is no x with Ax = b, x ≥ 0; we would like to show the existence of a y satisfying the required conditions. Define K = {Ax : x ∈ R^n, x ≥ 0} ⊆ R^m. By assumption, b ∉ K, and K is nonempty, closed and convex, so we can apply the projection theorem. Define p = proj_K(b). Since p ∈ K, we have p = Ax for some vector x ≥ 0. Let y = p − b ∈ R^m. We claim that y satisfies the right conditions. Indeed, consider any point z ∈ K. We know that ∃w ≥ 0 : z = Aw. By the projection theorem, we have (Aw − Ax)^T y ≥ 0, i.e.

(w − x)^T A^T y ≥ 0,   (2)

for all w ≥ 0. Choosing w = x + e_i (where e_i is the ith unit vector), we see that A^T y ≥ 0. We still need to show that b^T y < 0. Observe that b^T y = (p − y)^T y = p^T y − y^T y < 0, because p^T y ≤ 0 and y^T y > 0. The latter follows from y ≠ 0 and the former from (2) with w = 0: −x^T A^T y ≥ 0, i.e. −p^T y ≥ 0. □

9.4.3 Corollary to Farkas' lemma

Farkas' lemma can also be written in other equivalent forms.

Corollary 4 Exactly one of the following holds:

1. ∃x ∈ R^n : Ax ≤ b,

2. ∃y ∈ R^m : y ≥ 0, A^T y = 0, b^T y < 0.

Again, such x and y cannot exist simultaneously. This corollary can be obtained either by massaging Farkas' lemma (to put the system of inequalities in the right form), or directly from the projection theorem.

9.5 Duality

Duality is one of the key concepts in linear programming. Given a solution x to an LP of value z, how do we decide whether x is in fact an optimum solution? In other words, how can we compute a lower bound on min{c^T x : Ax = b, x ≥ 0}?

Suppose we have y such that A^T y ≤ c. Then observe that y^T b = y^T Ax ≤ c^T x for any feasible solution x. Thus y^T b provides a lower bound on the value of our linear program. This holds for every y satisfying A^T y ≤ c, so to find the best such lower bound we maximize y^T b subject to A^T y ≤ c. This is itself another LP, called the dual linear program of the original problem, which is called the primal LP.

• Primal LP: min c^T x subject to Ax = b, x ≥ 0,

• Dual LP: max b^T y subject to A^T y ≤ c.
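The lower-bound argument can be replayed on a toy instance (the numbers below are ours, not from the notes): any dual-feasible y certifies b^T y ≤ c^T x for every primal-feasible x.

```python
# Any dual-feasible y gives a lower bound b^T y <= c^T x on any
# primal-feasible x.  Toy instance: min x1 + 3*x2 s.t. x1 + x2 = 4, x >= 0.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A, b, c = [[1, 1]], [4], [1, 3]

x = [2, 2]                         # primal feasible: Ax = b, x >= 0
y = [1]                            # dual feasible: A^T y = (1, 1) <= (1, 3)

assert dot(A[0], x) == b[0] and min(x) >= 0
assert all(dot([A[0][j]], y) <= c[j] for j in range(2))
assert dot(b, y) <= dot(c, x)      # weak duality: 4 <= 8
print("lower bound:", dot(b, y), "<= primal value:", dot(c, x))
```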

9.5.1 Weak Duality

The argument we have just given shows what is known as weak duality.

Theorem 5 If the primal P is a minimization linear program with optimum value z, then it has a dual D, which is a maximization problem with optimum value w, and z ≥ w.

Notice that this is true even if either the primal or the dual is infeasible or unbounded, provided we use the following convention:

infeasible min. problem −→ value = +∞
unbounded min. problem −→ value = −∞
infeasible max. problem −→ value = −∞
unbounded max. problem −→ value = +∞

9.5.2 Strong Duality

What is remarkable is that one even has strong duality, namely both linear programs have the same value provided at least one of them is feasible (it can happen that both the primal and the dual are infeasible).

Theorem 6 If P or D is feasible, then z = w.

Proof: We assume that P is feasible (the argument if D is feasible is analogous; or one could also argue that the dual of the dual is the primal, and therefore one can exchange the roles of primal and dual). If P is unbounded, z = −∞, and by weak duality w ≤ z. So it must be that w = −∞, and thus z = w. Otherwise (if P is not unbounded), let x* be the optimum solution to P, i.e.:

z = c^T x*, Ax* = b, x* ≥ 0.

We would like to find a dual feasible solution with the same value as (or no worse than) x*. That is, we are looking for a y satisfying:

A^T y ≤ c, b^T y ≥ z.

If no such y exists, we can use Farkas' lemma to derive: ∃x ∈ R^n, x ≥ 0, and ∃λ ∈ R, λ ≥ 0 : Ax − λb = 0 and c^T x − λz < 0. We now consider two cases.

• If λ ≠ 0, we can scale by λ, and therefore assume that λ = 1. Then we get that

∃x ∈ R^n : Ax = b, x ≥ 0, c^T x < z.

This is a contradiction, because x* was an optimum solution of value z.

• If λ = 0, then

∃x ∈ R^n : x ≥ 0, Ax = 0, c^T x < 0.

Consider now x* + µx for any µ > 0. We have that

x* + µx ≥ 0, A(x* + µx) = Ax* + µAx = b + 0 = b.

Thus, x* + µx is feasible for any µ ≥ 0. But c^T(x* + µx) = c^T x* + µ c^T x < z, a contradiction. □


MIT OpenCourseWare http://ocw.mit.edu

6.854J / 18.415J Advanced Algorithms Fall 2008 ��

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

18.415/6.854 Advanced Algorithms

October 8, 2008

Lecture 10

Lecturer: Michel X. Goemans

Last lecture we introduced the basic formulation of a linear programming problem, namely the problem of minimizing c^T x (where c ∈ R^n, x ∈ R^n) subject to the constraints Ax = b (where A ∈ R^{m×n}, b ∈ R^m) and x ≥ 0. We then introduced the dual linear program, with the objective of maximizing b^T y subject to the constraints A^T y ≤ c. Eventually, we were able to relate the two forms via the Theorem of Strong Duality, which states that if either the primal or the dual has a feasible solution then their values are equal:

w := min{c^T x : Ax = b, x ≥ 0} = max{b^T y : A^T y ≤ c} =: z.

Today, we further explore duality by justifying the Theorem of Strong Duality via a physical argument, introducing rules for constructing dual problems for non-standard linear programming formulations, and further discussing the notion of complementary slackness mentioned in the last lecture. We then shift gears and discuss the geometry of linear programming, which leads us to the Simplex Method for solving linear programs.

1 The Dual

1.1 Physical Justification of the Dual

Consider the standard dual form of a linear program. The set of feasible solutions y satisfying A^T y ≤ c forms a polyhedron in R^m: the intersection of n halfspaces, one per column of A. Consider a tiny ball within this polyhedron at position y. To maximize b^T y, we move the ball as far as possible in the direction of b within the confines of our polyhedron. This is analogous to having a force, say gravity, acting on the ball in the b direction. We now switch over entirely to the physical analogy. At equilibrium, the ball ends up at a point y maximizing b^T y over A^T y ≤ c, and the gravity force b is in equilibrium with the forces exerted against the ball by the 'walls' of our polyhedron. These wall forces are normal to the hyperplanes defining them, so for the hyperplane defined by a_j^T y ≤ c_j (where a_j is the jth column of A), the force exerted on the ball can be expressed as −x_j a_j for some magnitude multiplier x_j ≥ 0. As stated previously, our ball is at equilibrium (there is no net force on it), and so we find

b − Σ_j x_j a_j = 0.

We also note that any wall our ball is not touching exerts no force on the ball; equivalently, x_j = 0 if a_j^T y < c_j. We now argue that these multipliers x_j form an optimum solution to the primal linear program. We first note that

b − Σ_j x_j a_j = 0

is equivalent to Ax = b, and that the multipliers x_j are either zero or positive, so x ≥ 0. This shows that the x_j's yield a feasible solution to the primal; now we need to prove that the x_j's minimize the primal.

Figure 1: Physical visualization of the dual in two dimensions, with six constraint hyperplanes and b as gravity. The dual is maximized when the ball is at the lowest point of the polyhedron.

For this, we will show that the value c^T x equals b^T y, and therefore by weak duality x is a minimizer for the primal. The value c^T x is

c^T x = Σ_j c_j x_j = Σ_j (a_j^T y) x_j,

since x_j is nonzero only where a_j^T y = c_j (a nonzero force is only exerted by a wall on our ball if the ball is touching that wall), and thus

c^T x = Σ_j (a_j^T y) x_j = y^T (Σ_j a_j x_j) = y^T b = b^T y.
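A hypothetical two-wall instance (our own numbers) makes the equilibrium computation concrete: with gravity b = (0, −1) and wall normals a_1 = (1, −1), a_2 = (−1, −1), both walls are tight at the bottom of the resulting "V", and solving b = x_1 a_1 + x_2 a_2 recovers nonnegative multipliers.

```python
# Sketch of the equilibrium argument: gravity b pulls the ball into the
# "V" formed by two walls a1^T y <= 0 and a2^T y <= 0; at the bottom both
# walls are tight and the wall multipliers x1, x2 give Ax = b, x >= 0.

b = (0.0, -1.0)                       # force pulling the ball down
a1, a2 = (1.0, -1.0), (-1.0, -1.0)    # outward normals of the two walls

# Solve x1*a1 + x2*a2 = b (a 2x2 system) by Cramer's rule.
det = a1[0] * a2[1] - a2[0] * a1[1]
x1 = (b[0] * a2[1] - a2[0] * b[1]) / det
x2 = (a1[0] * b[1] - b[0] * a1[1]) / det

assert x1 >= 0 and x2 >= 0                         # wall forces push, never pull
assert abs(x1 * a1[0] + x2 * a2[0] - b[0]) < 1e-9  # equilibrium, coordinate 1
assert abs(x1 * a1[1] + x2 * a2[1] - b[1]) < 1e-9  # equilibrium, coordinate 2
print("multipliers:", x1, x2)
```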

1.2 Rules for Writing a Dual

So far, we have dealt only with the dual of the standard primal linear programming problem: minimizing c^T x such that Ax = b and x ≥ 0. What if we are confronted with a non-standard linear program, such as one involving inequality constraints or non-positivity constraints on the x_j? We have two options. The first is to massage the linear program into the standard primal form, immediately convert to the standard dual, and then potentially massage the dual problem into a form more suitable to our original problem. This can be a long, frustrating process, however, and so instead we present a set of standard rules for converting any linear program into its dual form. Consider a linear program with the objective of minimizing Σ_j c_j x_j subject to the following constraints:

Σ_j a_ij x_j { = b_i for i ∈ I_=,  ≥ b_i for i ∈ I_≥,  ≤ b_i for i ∈ I_≤ }   (1)

x_j { ≥ 0 for j ∈ J_+,  ≤ 0 for j ∈ J_−,  ∈ R for j ∈ J_0 }   (2)

Earlier, the way we obtained the dual was to derive a lower bound (or an upper bound for a maximization problem) on the objective function of the primal, and then to optimize this bound. We claim that the same process leads to the dual of maximizing Σ_i b_i y_i subject to the constraints:

Σ_i a_ij y_i { ≤ c_j for j ∈ J_+,  ≥ c_j for j ∈ J_−,  = c_j for j ∈ J_0 }   (3)

y_i { ≥ 0 for i ∈ I_≥,  ≤ 0 for i ∈ I_≤,  ∈ R for i ∈ I_= }   (4)

Weak duality is straightforward. Constraints (4) on the y_i guarantee that, when multiplying constraint (1) by y_i and summing over i, we get

Σ_i y_i Σ_j a_ij x_j ≥ Σ_i y_i b_i.   (5)

Similarly, constraints (3) together with constraints (2) imply that

Σ_j c_j x_j ≥ Σ_j x_j Σ_i a_ij y_i.   (6)

The left-hand side of (5) equals the right-hand side of (6) (after rearranging the summation), so we get weak duality: c^T x ≥ b^T y. Strong duality also holds, provided that either the primal or the dual has a feasible solution.
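The sense-flipping rules in (1)-(4) are purely mechanical, so they are easy to encode. The sketch below (our own tagging scheme, not notation from the notes) maps each primal constraint or variable tag to the corresponding dual tag for a minimization primal.

```python
# Mechanical application of the dual-construction rules (1)-(4):
# constraint sense -> sign of dual variable, variable sign -> dual sense.

CONSTRAINT_TO_DUAL_VAR = {"=": "free", ">=": ">= 0", "<=": "<= 0"}
VAR_TO_DUAL_CONSTRAINT = {">= 0": "<=", "<= 0": ">=", "free": "="}

def dual_tags(constraint_senses, variable_signs):
    y_signs = [CONSTRAINT_TO_DUAL_VAR[s] for s in constraint_senses]
    dual_senses = [VAR_TO_DUAL_CONSTRAINT[v] for v in variable_signs]
    return y_signs, dual_senses

# Primal: min c^T x with one equality and one ">=" constraint,
# x1 >= 0 and x2 free.
y_signs, dual_senses = dual_tags(["=", ">="], [">= 0", "free"])
assert y_signs == ["free", ">= 0"]   # y1 free, y2 >= 0
assert dual_senses == ["<=", "="]    # sum_i a_i1 y_i <= c1, sum_i a_i2 y_i = c2
print(y_signs, dual_senses)
```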

1.3 Complementary Slackness

Complementary slackness lets us easily check whether a feasible primal solution and a feasible dual solution are simultaneously optimal. Consider the primal min{c^T x : Ax = b, x ≥ 0}, and an alternative definition of the dual LP obtained by adding slack variables: max{b^T y : A^T y + Is = c, s ≥ 0}, where s ∈ R^n. Given a feasible primal solution x and a feasible dual solution (y, s), the difference in their values is

c^T x − b^T y = s^T x + y^T Ax − y^T b = s^T x,

and this quantity must be 0 if x is optimal for the primal and (y, s) is optimal for the dual. Notice that x ≥ 0 and s ≥ 0, and therefore x^T s = 0 if and only if x_j s_j = 0 for all j. Thus, for the two solutions to be simultaneously optimal in the primal and in the dual, we need that, for all j, x_j = 0 whenever s_j > 0 (or equivalently, s_j = 0 whenever x_j > 0). Summarizing, we have:

Theorem 1 Let x* be feasible in the primal, and (y*, s*) be feasible in the dual. Then the following are equivalent:

1. x* is optimal in the primal, and (y*, s*) is optimal in the dual,

2. For all j: x*_j > 0 ⟹ s*_j = 0,

3. For all j: x*_j s*_j = 0,

4. Σ_j x*_j s*_j = 0.

For a general pair of primal-dual linear programs as given in (1)-(2) and (3)-(4), complementary slackness says that, for x to be optimal in the primal and y to be optimal in the dual, we must have

1. y_i = 0 whenever Σ_j a_ij x_j ≠ b_i, and

2. x_j = 0 whenever Σ_i a_ij y_i ≠ c_j.
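Theorem 1 can be verified on the toy primal min{x_1 + 3x_2 : x_1 + x_2 = 4, x ≥ 0} with dual max{4y : y ≤ 1, y ≤ 3} (an example of ours): the duality gap always equals s^T x, and it vanishes exactly at the optimal pair.

```python
# Checking complementary slackness on a small instance:
#   primal: min{x1 + 3*x2 : x1 + x2 = 4, x >= 0}
#   dual:   max{4*y : y <= 1, y <= 3}

c, b = [1, 3], [4]
y = 1                      # dual feasible (and in fact optimal)
s = [c[0] - y, c[1] - y]   # slacks: s = c - A^T y = [0, 2], s >= 0

x_opt = [4, 0]             # optimal primal vertex
x_sub = [2, 2]             # feasible but suboptimal

# Duality gap equals s^T x in both cases.
for x in (x_opt, x_sub):
    gap = (c[0] * x[0] + c[1] * x[1]) - b[0] * y
    assert gap == s[0] * x[0] + s[1] * x[1]

# Complementary slackness holds exactly for the optimal pair.
assert all(xj * sj == 0 for xj, sj in zip(x_opt, s))
assert not all(xj * sj == 0 for xj, sj in zip(x_sub, s))
print("complementary slackness verified")
```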

2 The Geometry of Linear Programming

We now switch gears and discuss the geometry of linear programming. First, we define a polyhedral set P = {x ∈ R^n : Ax ≤ b} as a finite intersection of halfspaces. We then define a vertex of a polyhedral set P to be any x ∈ P such that

x + y ∈ P and x − y ∈ P ⟹ y = 0.

Intuitively, a vertex is a "corner" of a polyhedral set. We can also state this geometric definition algebraically. Given an index set J ⊆ {1, 2, ..., n}, A_J denotes the m × |J| submatrix of A consisting of the columns of A indexed by J.

Lemma 2 For P = {x : Ax = b, x ≥ 0} and x ∈ P, x is a vertex of P if and only if A_J has linearly independent columns for J = {j : x_j > 0}.

Proof: For both directions, we prove the contrapositive.

⇐: If x is not a vertex, then ∃y ≠ 0 : x + y, x − y ∈ P. Therefore A(x + y) = b and A(x − y) = b, which implies Ay = 0. However, because membership in P requires points to be nonnegative, x_j = 0 forces y_j = 0. Thus, if we let w = y_J (i.e. w corresponds to the components of y indexed by J), we see that w ≠ 0 and A_J w = 0, so A_J has linearly dependent columns.

⇒: If A_J has linearly dependent columns, then ∃w ≠ 0 : A_J w = 0. By zero padding, we can construct y such that y ≠ 0, Ay = 0, and y_j = 0 for j ∉ J. Thus, A(x + εy) = A(x − εy) = b for any ε ∈ R. We also note that x_j ± εy_j ≥ 0 as long as ε ≤ x_j/|y_j| for every j with y_j ≠ 0, and each such ratio is strictly positive. Therefore, if we choose ε = min_{j: y_j ≠ 0} x_j/|y_j|, we have x ± εy ∈ P, and thus x is not a vertex of P. □
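Lemma 2 gives a finite test for vertexhood. The sketch below (our own toy example, with a simple Gaussian-elimination rank check) tests linear independence of the columns A_J for J = {j : x_j > 0}.

```python
# Test whether x in P = {x : Ax = b, x >= 0} is a vertex by checking
# linear independence of the columns A_J, J = {j : x_j > 0} (Lemma 2).

def columns_independent(cols):
    """Rank test by Gaussian elimination on a list of column vectors."""
    rows = [list(r) for r in zip(*cols)]   # rows of the matrix [cols]
    rank, m = 0, len(rows)
    for j in range(len(cols)):
        piv = next((i for i in range(rank, m) if abs(rows[i][j]) > 1e-9), None)
        if piv is None:
            return False                   # this column added no new direction
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(m):
            if i != rank:
                f = rows[i][j] / rows[rank][j]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[rank])]
        rank += 1
    return True

def is_vertex(A, x):
    J = [j for j, xj in enumerate(x) if xj > 0]
    return (not J) or columns_independent(
        [[A[i][j] for i in range(len(A))] for j in J])

A = [[1, 0, 1], [0, 1, 1]]                # P = {x : Ax = (1,1), x >= 0}
assert is_vertex(A, [1, 1, 0])            # two independent support columns
assert not is_vertex(A, [0.5, 0.5, 0.5])  # three columns in R^2 are dependent
print("vertex tests pass")
```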

We can take the notions in this lemma a step further by introducing the notions of a basis, a basic solution, and a basic feasible solution. For what follows, we assume that rank(A) = m (if that is not the case, then either there is no solution to Ax = b and our problem is infeasible, or there is a redundant constraint (possibly more than one) in Ax = b which can be removed).

Definition 1 For a polyhedral set P = {x : Ax = b, x ≥ 0}, a basis B is a subset of {1, ..., n} such that |B| = m and A_B is invertible (i.e. rank(A_B) = m).

Definition 2 x is a basic solution of P if there exists a basis B such that x_B = A_B^{-1} b and x_N = 0 for N = {1, ..., n} \ B. Note that by this definition A_B x_B + A_N x_N = b must hold, but x could have negative entries and therefore be infeasible.

Definition 3 x is a basic feasible solution (bfs) if it is a basic solution such that x ≥ 0.

We are now ready to prove the following theorem relating vertices to basic feasible solutions.


Theorem 3 Given a polyhedral set P = {x : Ax = b, x ≥ 0} such that rank(A) = m, and a point x ∈ P, x is a vertex of P if and only if it is a basic feasible solution of P.

Proof: Will be provided in Lecture 11. □

There are several notable remarks to make pertaining to this theorem:

• The vertex to basic feasible solution relationship is one-to-many; in other words, there may be multiple basic feasible solutions that correspond to a single vertex.

• The number of vertices of P is at most the number of bases of P. This follows from the first remark, and the fact that some bases may be infeasible. Therefore, the number of vertices of P is upper bounded by (n choose m). A stricter bound has been shown using a more detailed analysis: the number of vertices of P is upper bounded by approximately 2 · (n − m/2 choose m/2).

We now know that finding basic feasible solutions of P is equivalent to finding vertices of P. Why is this important? Because there must be an optimum solution to our linear programming problem that is a vertex of the polyhedral set defined by the linear constraints. More formally:

Theorem 4 Given a polyhedral set P = {x : Ax = b, x ≥ 0}, if min{c^T x : x ∈ P} is finite (the program is feasible and bounded), and x ∈ P, then there exists a vertex x' of P with c^T x' ≤ c^T x.

Proof:

Will be provided in Lecture 11. □

This theorem directly leads us to the insight behind the Simplex Method for solving linear programs by finding the best vertex.
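Theorem 4 suggests a brute-force baseline that the simplex method refines: enumerate all bases, keep the basic feasible solutions, and return the cheapest. The following sketch (our own toy instance with m = 2; exponential in general) does exactly that.

```python
# Brute force over bases: since an optimum occurs at a vertex, enumerating
# basic feasible solutions and keeping the cheapest solves tiny LPs.

from itertools import combinations

def solve_2x2(M, rhs):
    """Solve a 2x2 system by Cramer's rule; None if singular."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-9:
        return None
    return [(rhs[0] * M[1][1] - M[0][1] * rhs[1]) / det,
            (M[0][0] * rhs[1] - rhs[0] * M[1][0]) / det]

def best_vertex(A, b, c):
    n, best = len(c), None
    for B in combinations(range(n), 2):          # all candidate bases, m = 2
        M = [[A[i][j] for j in B] for i in range(2)]
        xB = solve_2x2(M, b)
        if xB is None or min(xB) < -1e-9:
            continue                             # singular or infeasible basis
        x = [0.0] * n
        for j, v in zip(B, xB):
            x[j] = v
        val = sum(cj * xj for cj, xj in zip(c, x))
        if best is None or val < best[0]:
            best = (val, x)
    return best

A, b, c = [[1, 1, 1], [0, 1, 2]], [4, 2], [1, 3, 2]
val, x = best_vertex(A, b, c)
assert x == [3.0, 0.0, 1.0] and val == 5.0
print("optimum value", val, "at vertex", x)
```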

3 Sketch of the Simplex Method

Here is a very basic sketch of how the simplex method works.

1. Choose a basic feasible solution x corresponding to a basis B.

2. While x is not an optimal solution, choose j and k such that the new basis B' = (B \ {j}) ∪ {k} yields a bfs x' with c^T x' ≤ c^T x.

There are several important remarks to make about this method:

• It is not clear that j and k always exist. But they do, and this can be shown.

• As defined, x and x' will either be equal or be 'adjacent' vertices of P.

• The reason it is called a 'method' and not an algorithm is that we have not yet specified how to choose j and k when several choices exist. The choice of j and k is referred to as a pivoting rule; many pivoting rules have been proposed.

• As such, there is no guarantee that c^T x' < c^T x; we could have c^T x' = c^T x, and in fact even x' = x, since we could switch from one basis to another representing the same vertex. There is therefore the risk that we repeat the same basis and the algorithm never terminates, and this can happen for some pivoting rules. There exist, however, anticycling pivoting rules which guarantee that the same basis is never repeated. With such a rule, the simplex method terminates, since there are finitely many bases.

• The running time of the simplex method depends on the number of bases considered before finding an optimal one.

• For all currently known pivoting rules, there is at least one instance that causes the simplex method to run in exponential time. (This is in contrast with the simplex method in practice, for which the number of iterations is usually small. A partial explanation of this sharp contrast between worst-case and typical behavior is highlighted in the work of Spielman and Teng on smoothed analysis.) We will cover other algorithms that guarantee a polynomial running time in the worst case; they will, however, not proceed from vertex to vertex of the polyhedral set.

There is a lower bound on the number of iterations of the Simplex Method: the number of edges in a shortest path from the starting vertex of P to the optimum vertex of P. In the worst case over starting vertices and objectives, this lower bound is the diameter of P, the maximum over all pairs of vertices of the length of the shortest path between them. In 1957, Hirsch conjectured that the diameter of a polyhedral set is upper bounded by n − d, where d is the dimension of the space and n is the number of hyperplanes defining P. While this has not been proven in the general case, the following results have been found:

• The conjecture is not true in the unbounded case: there exist unbounded polyhedra with diameter n − d + ⌊d/5⌋.

• No polynomial bound on the diameter is known for the general case (even for just bounded polyhedra).

• Kalai and Kleitman derived a subexponential bound n^{O(log d)} on the diameter.

• If the Hirsch Conjecture can be proven for n = 2d, then the conjecture holds for all n.

• The Hirsch Conjecture is true for polytopes with all their vertices in {0, 1}^d.



18.415/6.854 Advanced Algorithms

October 15, 2008

Lecture 11

Lecturer: Michel X. Goemans

In this lecture, we continue from where we left off in the last lecture on linear programming. We then argue that LP ∈ NP ∩ co-NP. At the end of this lecture, we introduce the first polynomial-time algorithm for LP, known as the Ellipsoid Algorithm.

1 LP continuation

Last time, we proved that, given a polyhedral set P = {x : Ax = b, x ≥ 0}, a point x is a vertex of P if and only if A_J, with J = {j : x_j > 0}, has linearly independent columns. Now assume that rank(A) = m, where m is the number of rows. We had then defined the notion of a basic feasible solution (bfs) corresponding to a basis B; see the last lecture for details.

Theorem 1 Consider the polyhedral set P = {x : Ax = b, x ≥ 0} where rank(A) = m. A point x is a vertex of P if and only if it is a basic feasible solution.

Proof: If x is a vertex of P, then we know that A_J has linearly independent columns, where J = {j : x_j > 0}. Thus rank(A_J) = |J|. Since rank(A) = m, we can add columns to J to get a set B with |B| = m and rank(A_B) = m, i.e. A_B is invertible. We must have

x_B = A_B^{-1} b, x_N = 0.

Therefore, x is a basic feasible solution. Conversely, assume x is a basic feasible solution, that is, x_B = A_B^{-1} b and x_N = 0. By definition, J = {j : x_j > 0} ⊆ B, and the fact that rank(A_B) = |B| implies that A_J has linearly independent columns. Thus, x is a vertex of P. □

Theorem 2 Let P = {x : Ax = b, x ≥ 0}. Assume min{c^T x : x ∈ P} is finite. Then, for any x ∈ P, there exists a vertex x' ∈ P such that c^T x' ≤ c^T x.

Proof: If x is a vertex, we are done. Otherwise, there exists y ≠ 0 such that x ± y ∈ P. Note that, as Ay = 0 (because A(x + y) = b = Ax), for any α ∈ R we have A(x + αy) = b. Observe that (x + αy)_j ≥ 0 holds for α ≤ −x_j/y_j if y_j < 0, and always if y_j ≥ 0.

We may assume that c^T y ≤ 0 (otherwise choose −y). Moreover, if c^T y = 0, we can assume that there exists j such that y_j < 0 (if all y_j ≥ 0, replace y by −y). Now assume, by contradiction, that y_j ≥ 0 for all j; by the above, this forces c^T y < 0. But then x + αy ∈ P for all α ≥ 0 and c^T(x + αy) → −∞ as α → ∞, so min{c^T x : x ∈ P} is not finite. Contradiction! Therefore, there exists j such that y_j < 0. Choose

α = min_{j: y_j < 0} (−x_j/y_j).

To certify that the optimum value is greater than some λ, one can exhibit a dual solution y with A^T y ≤ c and b^T y > λ (or show that {x ≥ 0 : Ax = b} is empty using Farkas' lemma). In the case when {x : Ax = b, x ≥ 0} is feasible, correctness follows from strong duality: min{c^T x : Ax = b, x ≥ 0} = max{b^T y : A^T y ≤ c}. Thus, LP ∈ NP ∩ co-NP, which makes it likely to be in P. And indeed, LP was shown to be polynomially solvable through the ellipsoid algorithm.


Figure 1: One iteration of the ellipsoid algorithm.

4 The Ellipsoid Algorithm

The Ellipsoid algorithm was proposed by the Russian mathematician Shor in 1977 for general convex optimization problems, and applied to linear programming by Khachian in 1979. The problem considered by the ellipsoid algorithm is: given a bounded, convex, non-empty and full-dimensional set P ⊆ R^n, find x ∈ P. We will see that linear programming reduces to an instance of this problem.

The ellipsoid algorithm works as follows. We start with a big ellipsoid E that is guaranteed to contain P. We then check if the center of the ellipsoid is in P. If it is, we are done: we found a point in P. Otherwise, we find a hyperplane passing through the center of the ellipsoid such that P is contained in one of the halfspaces it defines. One iteration of the ellipsoid algorithm is illustrated in Figure 1. The ellipsoid algorithm is the following.

• Let E_0 be an ellipsoid containing P.

• While the center a_k of E_k is not in P, do:
  – Let c_k^T x ≤ c_k^T a_k be such that {x : c_k^T x ≤ c_k^T a_k} ⊇ P.
  – Let E_{k+1} be the minimum-volume ellipsoid containing E_k ∩ {x : c_k^T x ≤ c_k^T a_k}.
  – k ← k + 1.

The ellipsoid algorithm has the important property that the ellipsoids constructed shrink in volume by at least a constant factor (depending on the dimension) as the algorithm proceeds; this is stated precisely in the next lemma. As P is full-dimensional, we will eventually find a point in P.

Lemma 8

Vol(E_{k+1})/Vol(E_k) < e^{−1/(2n+2)}.

Note that the ratio is independent of k. Before we can state the algorithm more precisely, we need to define ellipsoids.

Definition 1 Given a center a and a positive definite matrix A, the ellipsoid E(a, A) is defined as {x ∈ R^n : (x − a)^T A^{-1} (x − a) ≤ 1}.

One important fact about a positive definite matrix A is that there exists B such that A = B^T B, and hence A^{-1} = B^{-1}(B^{-1})^T. Ellipsoids are in fact just affine transformations of unit balls. To see this, consider the (bijective) affine transformation T : x ↦ y = (B^{-1})^T (x − a). It maps E(a, A) to {y : y^T y ≤ 1} = E(0, I), the unit ball. This motivates the fact that the ratio Vol(E_{k+1})/Vol(E_k) is independent of k: since affine transformations preserve ratios of volumes, we can reduce to the case when E_k is the unit ball, and in this case, by symmetry of the ball, the volume ratio is independent of k.



18.415/6.854 Advanced Algorithms

October 20, 2008

Lecture 12 - Ellipsoid algorithm

Lecturer: Michel X. Goemans

In this lecture we describe the ellipsoid algorithm and show how it can be applied to the linear programming problem.

1 Ellipsoid algorithm

1.1 Definitions

An ellipsoid is denoted by E(a, A) = {x ∈ R^n : (x − a)^T A^{-1} (x − a) ≤ 1}, with center a ∈ R^n and positive definite A ∈ R^{n×n}. Recall that A is symmetric if A = A^T. A matrix is positive definite if it is symmetric and x^T Ax > 0 for all x ≠ 0. The inverse of a positive definite matrix is also positive definite. Symmetric matrices have only real eigenvalues, and positive definite matrices have only real positive eigenvalues.

1.2 Problem statement

Given P ⊆ R^n bounded, closed, and convex, find x ∈ P or show that P = ∅.

1.2.1 Assumption: Separation oracle

The first issue is how the convex set P is given. We assume that we have a "separation oracle" for P which does the following: given a, the oracle either

1. affirms that a ∈ P, or

2. outputs c ∈ R^n such that P ⊆ {x ∈ R^n : c^T x < c^T a}.

Think of c as the normal vector of a hyperplane separating a and P, pointing away from P. Such a hyperplane exists because P is convex and closed. An algorithm for our problem is judged by how many times it queries the oracle; we would like the number of queries to be polynomial in terms of the input data.

1.2.2 Assumption: Outer ball and minimum volume

As such, the problem is hopeless, since we do not know where to search for a point x ∈ P, and P may even contain just a single point x. So we make two further assumptions:

• P ⊆ "big ball", i.e. P ⊆ B(0, R), a ball with center 0 and radius R > 0. This tells us where our search can be confined.

• If P ≠ ∅, P has "sufficient" volume. Say we are given r > 0 such that P is guaranteed to contain some ball of radius r whenever P is non-empty.

We consider the size of our input to be n + log R − log r.


1.3 Sketch of the algorithm

Here is an outline of the ellipsoid algorithm:

• Start with an ellipsoid E_0 = E(a_0, A_0) ⊇ P.

• Maintain an ellipsoid E_k = E(a_k, A_k) ⊇ P. At iteration k, ask the oracle whether a_k belongs to P.
  – If the answer is yes, then we are done.
  – If a_k does not belong to P, then the oracle provides c_k such that P ⊆ {x ∈ R^n : c_k^T x < c_k^T a_k}. Thus, the separating hyperplane slices E_k, and P is on one side of this hyperplane. We then determine a smaller ellipsoid E_{k+1} such that

    E_{k+1} ⊇ E_k ∩ {x : c_k^T x < c_k^T a_k}.   (1)

  – (Refer to Fig. 1.) Notice that E_{k+1} ⊇ P, and we iterate. If we can show that the volume of E_k decays exponentially, then in "few" iterations we either find a point in P, or reach Vol(E_{k+1}) < Vol(B(0, r)) and conclude that P = ∅.

Figure 1: Diagram illustrating a single iteration of the ellipsoid algorithm.

1.4 Bounding volume of ellipsoids

Proposition 1 Given E_k = E(a_k, A_k) and c_k, we can find E_{k+1} such that Eq. (1) is satisfied and

Vol(E_{k+1})/Vol(E_k) < exp(−1/(2(n + 1))).

Let us first focus on the simple case in which our ellipsoid is the unit ball centered at the origin.

Claim 2 Proposition 1 holds in the special case where E_k = E(0, I) and c_k = −e_1.

Proof: By symmetry, E_{k+1} is an axis-aligned ellipsoid with center along the x_1 axis, and it has to contain all points of the unit sphere with x_1 = 0. See Fig. 2. Formally, we want E_{k+1} ⊇ E_k ∩ {x : x_1 ≥ 0}, and one can show that it is enough to guarantee that (i) e_1 ∈ E_{k+1} and (ii) x ∈ E_{k+1} for all x with ||x|| = 1 and x_1 = 0.

Figure 2: Diagram illustrating the case where E_k = E(0, I).

We propose the following:

E_{k+1} = { x : ((n+1)/n)^2 (x_1 − 1/(n+1))^2 + ((n^2 − 1)/n^2) Σ_{i=2}^n x_i^2 ≤ 1 }
        = E( (1/(n+1)) e_1, (n^2/(n^2 − 1)) (I − (2/(n+1)) e_1 e_1^T ) ).

It is easy to verify that this ellipsoid satisfies the constraints above. Since the volume of an ellipsoid is proportional to the product of its axis lengths, we obtain:

Vol(E_{k+1})/Vol(E_k) = (n/(n+1)) · (n^2/(n^2 − 1))^{(n−1)/2}
                      < exp(−1/(n+1)) · exp( ((n−1)/(n^2 − 1)) · (1/2) )
                      = exp(−1/(2(n+1))),

where we have used the fact that 1 + x < e^x whenever x ≠ 0 (for x = 0 we have equality). □

Next, we do a slightly more general case.

Claim 3 Proposition 1 holds when E_k = E(0, I), c_k = d and ||d|| = 1.

Proof:

From the previous simple case, it is clear that the following E_{k+1} works:

E_{k+1} = E( −(1/(n+1)) d, (n^2/(n^2 − 1)) (I − (2/(n+1)) d d^T ) ). □
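The volume computation in Claim 2 is easy to sanity-check numerically; the following check (our own) confirms the strict inequality for a range of dimensions.

```python
# Numeric sanity check of the volume bound in Claim 2:
# (n/(n+1)) * (n^2/(n^2-1))^((n-1)/2) < exp(-1/(2(n+1))) for n >= 2.

import math

for n in range(2, 50):
    ratio = (n / (n + 1)) * (n * n / (n * n - 1)) ** ((n - 1) / 2)
    assert ratio < math.exp(-1 / (2 * (n + 1)))
print("volume ratio bound holds for n = 2..49")
```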

Proof of Proposition 1: In general, we can transform E(a_k, A_k) to E(0, I), map c_k into some d, find an ellipsoid E' as in the proofs of Claims 2 and 3, and map it back to obtain E_{k+1}. Denote by T the linear transformation that maps E(a_k, A_k) into E(0, I); then E' is obtained from E(0, I) as in Claim 3, and E_{k+1} = T^{-1}(E').

Recall that E(a, A) = {x : (x − a)^T A^{-1} (x − a) ≤ 1}. By Cholesky decomposition (since A is positive definite), we can write A = B^T B for some matrix B. If we let y = (B^{-1})^T (x − a), then

(x − a)^T B^{-1} (B^{-1})^T (x − a) ≤ 1  ⟺  y^T y ≤ 1,

so we have a unit ball in the y space. Thus, our linear transformation T and its inverse are:

T(x) = y = (B^{-1})^T (x − a_k),    T^{-1}(y) = a_k + B^T y.

We need an equivalent halfspace constraint after applying T. From Eq. (1),

c_k^T x < c_k^T a_k
c_k^T (B^T y + a_k) < c_k^T a_k
c_k^T B^T y < 0.

Hence, in the new space, the unit normal vector of the separating hyperplane is

d = B c_k / sqrt(c_k^T B^T B c_k).

From Claim 3, we can find an ellipsoid E' in the y space. For convenience (and aesthetic pleasure), let b = B^T d. Applying T^{-1} to E', we obtain E_{k+1} = E(a_{k+1}, A_{k+1}) with

a_{k+1} = a_k − (1/(n+1)) B^T d = a_k − (1/(n+1)) b,

A_{k+1} = B^T [ (n^2/(n^2 − 1)) (I − (2/(n+1)) d d^T) ] B = (n^2/(n^2 − 1)) ( A_k − (2/(n+1)) b b^T ).

Since affine transformations preserve the ratios between volumes, we immediately have the desired bound. Here are the details:

Vol(E(0, I)) = det((B^{-1})^T) Vol(E_k),    Vol(E_{k+1}) = det(B^T) Vol(E').

Rearranging, we have

Vol(E_{k+1})/Vol(E_k) = Vol(E')/Vol(E(0, I)) < exp(−1/(2(n+1))). □
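The update formulas for a_{k+1} and A_{k+1} can be exercised directly. The sketch below (our own, in the plane, so n = 2) starts from the unit ball with cut c = −e_1, uses b = B^T d = A_k c_k / sqrt(c_k^T A_k c_k), and checks that the formulas reproduce the explicit ellipsoid from the unit-ball case.

```python
# One ellipsoid update in the plane (n = 2), starting from the unit ball
# E(0, I) with cut c = -e1, so the half kept is {x : x1 >= 0}.

import math

n = 2
a = [0.0, 0.0]
A = [[1.0, 0.0], [0.0, 1.0]]
c = [-1.0, 0.0]

Ac = [A[0][0] * c[0] + A[0][1] * c[1], A[1][0] * c[0] + A[1][1] * c[1]]
norm = math.sqrt(c[0] * Ac[0] + c[1] * Ac[1])        # sqrt(c^T A c)
bvec = [v / norm for v in Ac]                        # b = A c / sqrt(c^T A c)

a_next = [a[i] - bvec[i] / (n + 1) for i in range(n)]
f = n * n / (n * n - 1.0)
A_next = [[f * (A[i][j] - 2.0 / (n + 1) * bvec[i] * bvec[j]) for j in range(n)]
          for i in range(n)]

# Matches the explicit E(e1/(n+1), ...) from the unit-ball case:
assert a_next == [1.0 / 3.0, 0.0]
assert abs(A_next[0][0] - 4.0 / 9.0) < 1e-12 and abs(A_next[1][1] - 4.0 / 3.0) < 1e-12

# e1 and (0, 1) lie on the boundary of the new ellipsoid, as required.
for p in ([1.0, 0.0], [0.0, 1.0]):
    d = [p[i] - a_next[i] for i in range(n)]
    q = d[0] ** 2 / A_next[0][0] + d[1] ** 2 / A_next[1][1]  # A_next is diagonal
    assert abs(q - 1.0) < 1e-9
print("update matches the closed form")
```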

1.5

Running time

From Proposition 1, we know that Vol(Ek) < Vol(E0) exp( −k/(2(n+1)) ). If P is nonempty, then the ellipsoid algorithm terminates in

# iterations = O( n log( Vol(E0)/Vol(P) ) ).

By our assumption that P, if nonempty, contains a ball of radius r, we have Vol(E0)/Vol(P) ≤ (R/r)^n, and thus the number of iterations is

# iterations = O( n² (log R − log r) ).

If P is empty, after the same number of iterations we are guaranteed of its emptiness. We conclude this section by noting a small subtlety. To compute d, we have to be able to find B such that A = B^T B. Cholesky decomposition takes O(n³) time and guarantees that the numbers in B have size polynomially bounded by the size of the numbers in A. But we have to take square roots (in the calculation of d), so we might have to deal with irrational numbers. As a result, we may have to do some rounding, making Ek+1 slightly bigger. One then has to argue that the volume decrease factor is still reasonable, say exp( −1/(3(n+1)) ), but this detail shall be omitted.
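As a numerical sanity check on the volume bound, here is a small sketch (our own code, not part of the notes) of the centered update from Claim 3 together with the resulting volume ratio:

```python
import math

def ellipsoid_update_centered(d):
    """One step for Ek = E(0, I) with cut ck = d, ||d|| = 1 (Claim 3):
    Ek+1 = E(-d/(n+1), n^2/(n^2-1) * (I - 2/(n+1) d d^T))."""
    n = len(d)
    a_next = [-dj / (n + 1) for dj in d]
    scale = n * n / (n * n - 1.0)
    A_next = [[scale * ((1.0 if i == j else 0.0) - 2.0 * d[i] * d[j] / (n + 1))
               for j in range(n)] for i in range(n)]
    return a_next, A_next

def volume_ratio(n):
    """Vol(Ek+1)/Vol(Ek) = sqrt(det A_next); for a unit vector d,
    det(I - 2/(n+1) d d^T) = 1 - 2/(n+1) = (n-1)/(n+1)."""
    scale = n * n / (n * n - 1.0)
    return math.sqrt(scale ** n * (n - 1) / (n + 1))
```

For every n ≥ 2 this ratio stays below exp(−1/(2(n+1))), matching the bound derived above.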

2

Applying ellipsoid algorithm to linear programming

2.1

Linear programming problem

In the linear programming problem, we are asked to find min{c^T x : Ax = b, x ≥ 0} with inputs A, b, c. The size of the input, from last lecture, is L = m + n + log detmax + log bmax + log cmax. To apply the ellipsoid algorithm, we will need to

1. Go from an optimization problem to a feasibility problem.

2. Show that the initial convex set is bounded and argue about how big the bounding ellipsoid has to be.

3. Argue about termination and provide an inner ball if P is nonempty, i.e. we want P to be full-dimensional.

2.2

Optimization to feasibility

We will convert the optimization problem to a feasibility problem as follows:

1. Check feasibility of Ax = b, x ≥ 0.

2. If the answer is infeasible, we are done because the LP is infeasible.

3. Otherwise, check feasibility of the dual. The dual is max{b^T y : A^T y ≤ c}, so check feasibility of A^T y ≤ c.

• If the dual is not feasible, we are done because the LP is unbounded.

• Otherwise, both primal and dual are feasible, and their optimal values have to match by strong duality. Hence, we check feasibility of Ax = b, x ≥ 0, A^T y ≤ c, c^T x = b^T y to find a solution for both the primal and the dual.

2.3

Outer and inner cubes

Here we describe how to go from a system of linear inequalities to an equivalent one (in terms of feasibility) which, if nonempty, is full-dimensional and has enough volume.

Proposition 4 Let P := {x : Ax ≤ b} and e be the vector of all ones. Assume that A has full column rank n¹. Then P is nonempty iff P′ = {x : Ax ≤ b + (1/2^L) e, −2^L ≤ xj ≤ 2^L for all j} is nonempty.

This proposition allows us to choose E0 to be a ball centered at the origin containing the cube [−2^L, 2^L]^n. Also, if there exists an x̂ such that A x̂ ≤ b, then

A ( x̂ ± (1/2^{2L}) ) ≤ b + (1/2^{2L}) n amax e ≤ b + (1/2^L) e,   where amax is the maximum entry of A.

That gives us a little cube around x̂. The number of iterations for finding an x in P′ is thus O(n · Ln) = O(n²L), because the ratio of the volumes of [−2^L, 2^L]^n to [−1/2^{2L}, 1/2^{2L}]^n is 8^{Ln}. Recall that finding x in P takes O(n log(Vol(E0)/Vol(P))) iterations. That means LP takes time polynomial in L.

Proof of Proposition 4: We first prove the forward direction. Suppose P ≠ ∅. Our only worry is whether there is any element of P inside the big box. This has been done in a previous lecture. We consider a vertex x of P (this exists because A has full column rank). This implies that x is defined by A_S x = b_S, where A_S is a submatrix of A. Using Cramer's rule, we can write x as

x = ( p1/q, p2/q, · · · , pn/q )

with |pi| < 2^L and 1 ≤ q < 2^L. We now work on the converse. {x : Ax ≤ b} = ∅ implies, by Farkas' Lemma, that there exists a y such that y ≥ 0, A^T y = 0, and b^T y = −1. We can choose a vertex of {A^T y = 0, b^T y = −1, y ≥ 0}. Rewrite this system as

[ A^T ; b^T ] y = [ 0 ; −1 ],   y ≥ 0.

By Cramer's rule, we can bound the components of a basic feasible solution y as:

y^T = ( r1/s, · · · , rm/s ),

with 0 ≤ s, ri ≤ detmax( [A^T ; b^T] ). Expanding the determinant along the last row, we see that detmax( [A^T ; b^T] ) ≤ m bmax detmax(A). Using the fact that 2^L > 2^m 2^n detmax(A) bmax, we obtain

0 ≤ s, ri < ( m / (2^m 2^n) ) 2^L ≤ ( m / 2^{m+1} ) 2^L.

Therefore, since s ≥ 1 (it is a nonzero integer determinant),

( b + (1/2^L) e )^T y = b^T y + (1/2^L) e^T y = −1 + (1/2^L) Σi ri/s ≤ −1 + m²/2^{m+1} < 0.

(The last inequality holds for m ≥ 1.) By Farkas' Lemma again, this y shows that there is no x satisfying Ax ≤ b + (1/2^L) e, i.e. P′ is empty. □

¹ Small detour: We have previously dealt with the constraint system Ax = b, x ≥ 0. If this is nonempty, then the feasible region has a vertex. However, this is not guaranteed when the constraints are of the form Ax ≤ b. But if rank(A) = n for A ∈ R^{m×n}, then a nonempty P always contains a vertex. In our case, since we convert from the problem with constraints x ≥ 0, we have the inequalities −Ix ≤ 0 and hence full column rank.


2.4

Obtaining a solution

There is one last problem. If the ellipsoid method returns an x in P′, x might not be in P. One solution is to round the coefficients of the inequalities to rational numbers and “repair” these inequalities to make x fit in P. This uses simultaneous Diophantine approximations, and will not be discussed. We can solve this problem by another method. We give a general method for finding a feasible solution of a linear program, assuming that we have a procedure that checks whether or not a linear program is feasible, e.g. the ellipsoid algorithm. Assume we want to find a solution of Ax ≤ b. The inequalities in this linear program can be written as ai^T x ≤ bi for i = 1, · · · , m. We use the following algorithm:

1. I ← ∅.

2. For i ← 1 to m do: if the set of solutions of

   aj^T x ≤ bj   ∀j = i + 1, · · · , m
   aj^T x = bj   ∀j ∈ I ∪ {i}

is nonempty, then I ← I ∪ {i}.

3. Finally, solve for x in ai^T x = bi for i ∈ I with Gaussian elimination.

The solution obtained is a vertex satisfying a maximal set of equalities: if at step 2 making inequality i an equality makes the system infeasible, then a vertex need not depend on this inequality and we can discard it.
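To make the loop concrete, here is a toy sketch of our own for a single variable, with a brute-force feasibility oracle standing in for the ellipsoid algorithm (all names and data here are illustrative, not from the notes):

```python
def feasible_1d(ineqs, eqs):
    """Feasibility oracle for one variable x: ineqs and eqs are pairs (a, b)
    meaning a*x <= b and a*x == b respectively. Returns a feasible x or None."""
    lo, hi = float('-inf'), float('inf')
    for a, b in ineqs:
        if a > 0:
            hi = min(hi, b / a)
        elif a < 0:
            lo = max(lo, b / a)
        elif b < 0:            # constraint 0 <= b violated
            return None
    if lo > hi:
        return None
    if any(a == 0 and b != 0 for a, b in eqs):
        return None
    pts = {b / a for a, b in eqs if a != 0}
    if len(pts) > 1:
        return None
    if pts:
        x = pts.pop()
        return x if lo <= x <= hi else None
    return min(max(0.0, lo), hi)   # any point of the feasible interval

def force_vertex(cons):
    """The loop of this section: scan the inequalities a_i x <= b_i, turning
    inequality i into an equality iff the system stays feasible."""
    I = []
    for i in range(len(cons)):
        eqs = [cons[j] for j in I + [i]]
        ineqs = [cons[j] for j in range(i + 1, len(cons))]
        if feasible_1d(ineqs, eqs) is not None:
            I.append(i)
    return I, feasible_1d([], [cons[j] for j in I])
```

On {x ≤ 3, −x ≤ −1}, the loop fixes x ≤ 3 to an equality and then discards the second inequality, returning the vertex x = 3.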



18.415/6.854 Advanced Algorithms

October 27, 2008

Lecture 14 Lecturer: Michel X. Goemans

1

Introduction

For this lecture we'll look at using interior point algorithms for solving linear programs, and more generally convex programs. Developed originally in 1984 by Narendra Karmarkar, interior point algorithms have seen many variants (with keywords such as 'path following', 'primal-dual', 'potential reduction', etc.), especially through the late 80s and early 90s. In the late 90s, people began to realize that interior point algorithms could also be used to solve semidefinite programs (or, even more generally, convex programs). As much as possible, we will discuss linear programming, semidefinite programming, and even a larger class called conic programming in a unified way.

2

Linear Programming

We will start with linear programming. Remember that in linear programming, we have:

Primal: Given A ∈ R^{m×n}, c ∈ R^n and b ∈ R^m, find x ∈ R^n:

Min   c^T x
s.t.  Ax = b,
      x ≥ 0.

Its dual linear program is:

Dual: Find y ∈ R^m:

Max   b^T y
s.t.  A^T y ≤ c.

We can introduce non-negative slack variables and rewrite this as:

Dual: Find y ∈ R^m, s ∈ R^n:

Max   b^T y
s.t.  A^T y + s = c,
      s ≥ 0.

For a feasible solution x in the primal and a feasible solution (y, s) in the dual, we know by complementary slackness that both are optimal (for the primal and the dual resp.) iff x^T s = 0. Since this is the inner product of two non-negative vectors, we can equivalently say:

xj sj = 0   ∀j.

2.1

Using the Interior Point Algorithm

The interior point algorithm will iteratively maintain a strictly feasible solution in the primal, such that for all values of j, xj > 0. Similarly, in the dual it will maintain a y and an s such that for all values of j, sj > 0. Because of this strict inequality, we can never reach our optimality condition stated above; however, we'll get very close, and once we do, we can show that a jump from this non-optimal solution (for either the primal or the dual) to a vertex of improved cost (of the corresponding program) will provide an optimal solution to the (primal or dual) program. In some linear programs, it may not be possible to start with a strictly positive solution. For example, it may be that every feasible solution has xj = 0 for some j, so we may be unable to find a strictly feasible solution with which to start the algorithm. This can be dealt with easily, but we will not discuss it. We'll assume that the primal and dual both have strictly feasible solutions.
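The quantity x^T s really is the duality gap: with Ax = b and A^T y + s = c one has c^T x − b^T y = x^T s. A quick numerical sketch (the toy LP below is our own example, not from the lecture):

```python
def gap(A, b, c, x, y):
    """Duality gap c^T x - b^T y for primal-feasible x and dual point y.
    With s = c - A^T y it equals x^T s, so optimality <=> x_j s_j = 0 for all j."""
    m, n = len(A), len(A[0])
    s = [c[j] - sum(A[i][j] * y[i] for i in range(m)) for j in range(n)]
    ctx = sum(c[j] * x[j] for j in range(n))
    bty = sum(b[i] * y[i] for i in range(m))
    return ctx - bty, s

# Toy LP: min x1 + 2 x2  s.t.  x1 + x2 = 1, x >= 0.
A = [[1.0, 1.0]]; b = [1.0]; c = [1.0, 2.0]
x = [0.5, 0.5]     # strictly feasible primal point
y = [1.0]          # dual point with slack s = (0, 1) >= 0
```

Here the gap is 0.5 = x^T s; moving x toward (1, 0) drives x2·s2, and hence the gap, to 0.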

3

Semideﬁnite Programming

As introduced in the previous lecture, in semidefinite programming our variables are the entries of a symmetric positive semidefinite matrix X. Let S^n denote the set of all real symmetric n × n matrices. For two such matrices A and B, we define an inner product

A • B = Σi Σj Aij Bij = Trace(A^T B) = Trace(AB).

Semidefinite programming (as a minimization problem) is

Min   C • X
s.t.  Ai • X = bi    i = 1 . . . m
      X ⪰ 0.

Remember that for a symmetric matrix M, M ⪰ 0 means that M is positive semidefinite, meaning that all of its (real) eigenvalues λ ≥ 0, or equivalently, ∀x, x^T M x ≥ 0.

3.1

Dual for SDP

When working with linear programs, we know of a dual linear program with a strong property: any feasible dual solution provides a lower bound on the optimum primal value and, if either program is feasible, the optimum primal and optimum dual values are equal. Does a similar dual exist for a semidefinite program? The answer is yes, although we will need some additional condition. We claim that the dual takes the following form.

Dual: Find y ∈ R^m and S ∈ S^n:

Max   b^T y
s.t.  Σi yi Ai + S = C
      S ⪰ 0.


3.1.1

Weak Duality

For weak duality, consider any feasible solution X in the primal, and any feasible solution (y, S) in the dual. We have:

C • X = ( Σi yi Ai + S ) • X
      = Σi yi (Ai • X) + S • X
      = Σi yi bi + S • X
      = b^T y + S • X
      ≥ b^T y,

the last inequality following from Lemma 1 below. This is true for any primal and dual feasible solutions, and therefore we have z ≥ w, where:

z = min{ C • X : X feasible for primal },
w = max{ b^T y : (y, S) feasible for dual }.

Lemma 1 For any A, B ⪰ 0, we have A • B ≥ 0.

Proof of Lemma 1: Any positive semidefinite matrix A admits a Cholesky decomposition: A = V^T V for some n × n matrix V. Thus, A • B = Trace(AB) = Trace(V^T V B) = Trace(V B V^T), the last equality following from the fact that, for (not necessarily symmetric) square matrices C and D, we have Trace(CD) = Trace(DC). But V B V^T is positive semidefinite (since x^T V B V^T x ≥ 0 for all x), and thus its trace is nonnegative, proving the result. □

A similar lemma was used when we were talking about linear programming, namely that if a, b ∈ R^n with a, b ≥ 0 then a^T b ≥ 0.

3.1.2
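Lemma 1 is easy to check numerically. The matrices below are our own examples, built as V^T V so they are positive semidefinite by construction:

```python
def frob(A, B):
    """A . B = sum_ij A_ij B_ij (= Trace(AB) for symmetric A, B)."""
    n = len(A)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n))

def gram(V):
    """Return V^T V, which is positive semidefinite for any real matrix V."""
    rows, n = len(V), len(V[0])
    return [[sum(V[k][i] * V[k][j] for k in range(rows)) for j in range(n)]
            for i in range(n)]
```

For instance, with A = gram([[1, 2], [0, 3]]) and B = gram([[2, −1]]), the inner product A • B is 9 ≥ 0.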

Strong Duality

In general, it’s not true that z = w. Several things can go wrong. In deﬁning z, we wrote: z = min C • X. However, that min is not really a min, but rather an inﬁmum. It might happen that the inﬁmum value can be approached arbitrarily closely but no solution may attain that value precisely. Similarly in the dual, the supremum may not be attained. In addition, in semideﬁnite programming, it is possible that the primal may have a ﬁnite value, but that the dual may be infeasible. In linear programming, this was not the case. If the primal had a ﬁnite feasible value and was bounded, the dual was also ﬁnite and with the same value. In semideﬁnite programming, the primal can be ﬁnite, while the dual may be infeasible or vice versa. In addition, both the primal and dual could be ﬁnite, but they could be of diﬀering values. That all said, in the typical case, you do have strong duality (z = w), but only necessarily under certain conditions. 3.1.3

Introducing a Regularity Condition

Assume that the primal and dual have strictly feasible solutions. For the primal, this means:

∃X   s.t.  Ai • X = bi    i = 1 . . . m,
           X ≻ 0.

'A ≻ 0' denotes that A is a positive definite matrix, meaning that ∀a ≠ 0, a^T A a > 0, or equivalently that all its eigenvalues λi satisfy λi > 0. Likewise, in the dual, there exist y and S such that:

Σi yi Ai + S = C,
S ≻ 0.

If we assume this 'regularity condition' defined above, then the primal value z is finite and attained (i.e. it is not an infimum but actually a minimum), the dual value w is attained, and furthermore z = w. This is given without proof.

4

Conic Programming

Conic Programming is a generalization of both Linear Programming and Semidefinite Programming. First, we need the definition of a cone:

Definition 1 A cone is a subset C of R^n with the property that for any v ∈ C and λ ∈ R+, λv is also in C.

Conic Programming is constrained optimization over K, a closed convex cone, with a given inner product ⟨x, y⟩. We can, for example, take K = R^n_+ and ⟨x, y⟩ = x^T y for x, y ∈ R^n; this will lead to linear programming. Conic programming, like LP and SDP, has both a primal and a dual form; the primal is:

Primal: Given A ∈ R^{m×n}, b ∈ R^m, and c ∈ R^n:

min   ⟨c, x⟩
s.t.  Ax = b
      x ∈ K.

More generally, we could view K as a cone in any space, and then A is a linear operator from K to R^m. To form the dual of a conic program, we first need the polar cone K* of K. The polar cone is defined to be the set of all s such that ⟨s, x⟩ ≥ 0 for all x ∈ K. For instance, the polar cone of R^n_+ is R^n_+ itself (indeed, if sj < 0 then s ∉ K* since ⟨ej, s⟩ < 0; conversely, if s ≥ 0 then ⟨x, s⟩ ≥ 0 for all x ≥ 0). In the case that K = K*, we say that K is self-polar. Similarly, the polar cone of PSD, the set of positive semidefinite matrices, is also itself. We also define the adjoint (operator) A* of A to be such that, for all x and y, ⟨A*y, x⟩ = ⟨y, Ax⟩. For example, if the inner product is the standard dot product and A is the matrix corresponding to a linear transformation from R^n to R^m, then A* = A^T. To write the conic dual, we introduce variables y ∈ R^m and s ∈ R^n and optimize:

Dual:

max   ⟨b, y⟩
s.t.  A*y + s = c
      s ∈ K*.

4.0.4

Weak Duality

We can prove weak duality – that the value of the primal is at least the value of the dual – as follows. Let x be any primal feasible solution and (y, s) be any dual feasible solution. Then

⟨c, x⟩ = ⟨A*y + s, x⟩ = ⟨A*y, x⟩ + ⟨s, x⟩ = ⟨y, Ax⟩ + ⟨s, x⟩ = ⟨b, y⟩ + ⟨s, x⟩ ≥ ⟨b, y⟩,

where we have used the definition of K* to show that ⟨s, x⟩ ≥ 0. This means that z, the infimum value of the primal, is at least the supremum value w of the dual.

4.0.5

Strong Duality

In the general case, we don't know that the two values will be equal. But we have the following statement (analogous to the regularity condition for SDP): if there exists an x in the interior of K such that Ax = b, and an s in the interior of K* with A*y + s = c, then the primal and the dual both attain their optimal values, and those values are equal.

4.1

Semideﬁnite Programming as a Special Case of Conic Programming

LP is a special case of conic programming: let K = R^n_+ and take the inner product to be the standard dot product ⟨a, b⟩ = a^T b. We can also make any SDP into a conic program; first, we need a way of transforming semidefinite matrices into vectors. Since we are optimizing over symmetric matrices, we introduce a map svec(M) that only takes the lower triangle of the matrix (including the diagonal). To be able to use the standard dot product with these vectors, svec multiplies all of the off-diagonal entries by √2. So svec maps X to

( x11, x22, . . . , xnn, √2 x12, √2 x13, . . . , √2 x(n−1)n ).

As a result:

⟨svec(X), svec(Y)⟩ = Σ_{i=1}^n xii yii + Σ_{1≤i<j≤n} (√2 xij)(√2 yij) = Σ_{1≤i,j≤n} xij yij = Tr(XY) = X • Y.

Barrier functions

To each cone K we will associate a barrier function F, defined on the interior of K. An important parameter ν of a barrier is its degree of logarithmic homogeneity: for all x ∈ int(K) and all τ > 0, F(τ x) = F(x) − ν ln(τ).

Lemma The logarithmic barrier functions F(x) = −Σ_{j=1}^n ln(xj) for LP and F(X) = −ln(det(X)) for SDP are ν-logarithmically homogeneous barriers.

Proof: For LP, with K = R^n_+, we have

F(τ x) = − Σ_{j=1}^n ln(τ xj)
       = −n ln(τ) − Σ_{j=1}^n ln(xj)
       = −n ln(τ) + F(x),

which proves the lemma for ν = n. Similarly,

for SDP, with K = PSD_p, we have

F(τ X) = − ln(det(τ X))
       = − ln(τ^p det(X))
       = −p ln(τ) − ln(det(X))
       = −p ln(τ) + F(X),

which proves the lemma for ν = p. □
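The two homogeneity computations above are easy to verify numerically; here is a quick sketch with our own sample data:

```python
import math

def lp_barrier(x):
    """Logarithmic barrier for K = R^n_+ : F(x) = -sum_j ln(x_j)."""
    return -sum(math.log(xj) for xj in x)

def sdp_barrier_2x2(X):
    """Barrier F(X) = -ln(det X) for a 2x2 positive definite X."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return -math.log(det)
```

Scaling the argument by τ subtracts ν ln(τ): ν = n for the LP barrier and ν = p (the matrix dimension) for the SDP barrier.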

Interior-point algorithms

We would like to proceed as follows. We start from a point x0 for the primal and points y0, s0 for the dual, together with a value μ0, such that these points are close to the points x(μ0), s(μ0), y(μ0) on the central path for μ = μ0. Then, at each iteration k, we decrease μ and compute new iterates xk, yk, sk close to the corresponding points x(μk), s(μk), y(μk) on the central path. As μ → 0, the points on the central path converge to an optimal solution of the primal and of the dual. We first need to define a measure of closeness to the central path.

Distance to central path

To verify that we stay close to the central path, we need a distance function, depending on μ, that measures how close x and s are. This distance dμ(x, s) should be easy to compute for both LP and SDP, and it should be 0 exactly when we are on the central path:

dμ(x, s) = 0   iff   s + μ∇F(x) = 0.

By symmetry between the primal and the dual, one could equivalently require

dμ(x, s) = 0   iff   x + μ∇F*(s) = 0,

where F* denotes the barrier function for the dual cone K*. Dividing by μ (to control the dependence of the distance on μ), we have that dμ(x, s) = 0 iff

s/μ + ∇F(x) = 0   ⟺   x/μ + ∇F*(s) = 0.

To define the distance, we measure the extent to which these conditions are violated, using a norm depending on x and a norm depending on s. This gives

dμ(x, s) = ‖ s/μ + ∇F(x) ‖x = ‖ x/μ + ∇F*(s) ‖s,

where ‖a‖b = sqrt( ⟨ (∇²F(b))^{−1} a, a ⟩ ), and ∇²F(b) denotes the Hessian of F at b.

It is not obvious that these two expressions are equal; we state this without proof. Let us now compute this distance function, first for LP.

For LP, with the barrier F(x) = −Σj ln(xj), the gradient is

∇F(x) = −x^{−1} = ( −1/x1, · · · , −1/xn )^T.

(Here x^{−1} denotes the vector of inverted coordinates.) The Hessian is the diagonal matrix

∇²F(x) = diag( 1/x1², · · · , 1/xn² ).

Since the Hessian is a diagonal matrix, its inverse is simply the diagonal matrix of inverted diagonal entries:

(∇²F(x))^{−1} = diag( x1², · · · , xn² ).

Therefore, for this barrier,

‖a‖b = sqrt( a^T diag( b1², · · · , bn² ) a ).

So for LP, since ∇F(x) = −x^{−1}, the distance becomes

dμ(x, s) = ‖ s/μ − x^{−1} ‖x = sqrt( Σj ( xj ( sj/μ − 1/xj ) )² ) = sqrt( Σj ( xj sj / μ − 1 )² ).

This expression is easy to evaluate; note that it vanishes exactly when xj sj = μ for all j, i.e. on the central path.
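The LP distance to the central path is a one-liner; a quick numerical sketch (names and data our own):

```python
import math

def d_mu(x, s, mu):
    """LP distance to the central path:
    d_mu(x, s) = sqrt(sum_j (x_j s_j / mu - 1)^2); zero iff x_j s_j = mu for all j."""
    return math.sqrt(sum((xj * sj / mu - 1.0) ** 2 for xj, sj in zip(x, s)))
```

The first example below sits exactly on the central path (every product xj sj equals μ), so its distance is 0.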

For SDP, where ∇F(X) = −X^{−1}, a similar computation gives

dμ(X, S) = ‖ X/μ − S^{−1} ‖S = sqrt( Tr( (1/μ) X^{1/2} S X^{1/2} − I )² ) = sqrt( Tr( (1/μ) S^{1/2} X S^{1/2} − I )² ) = ‖ S/μ − X^{−1} ‖X,

using the fact that Tr(AB) = Tr(BA) even if A and B are not symmetric. By the same trace identity, these expressions also equal sqrt( Tr( (1/μ) SX − I )² ), although SX itself need not be symmetric.

Having defined this distance, we can show that being close to the central path implies a small duality gap:

Lemma If dμ(x, s) ≤ 1 then ⟨x, s⟩ ≤ 2νμ. (We omit the proof.)

Thus, if dμ(x, s) ≤ 1 is maintained, the duality gap goes to 0 as μ → 0, and we can obtain solutions of any desired accuracy. We now describe one iteration of the algorithm.

Suppose at iteration k we have some μk and xk, which is close to x(μk). We move to xk+1, which should be close to x(μk+1). More precisely, at iteration k we have xk, sk, yk, μk close to the central path, and we want to obtain new points xk+1, sk+1, yk+1, μk+1 again close to the central path,

where we decrease μ, i.e. μk+1 < μk.

Ideally, the new point would satisfy the central path conditions exactly:

A xk+1 = b
A* yk+1 + sk+1 = c
sk+1 + μk+1 ∇F(xk+1) = 0.

The last condition, however, is nonlinear in xk+1, so we use the linear approximation of ∇F(xk+1):

∇F(xk+1) ≈ ∇F(xk) + ∇²F(xk)(xk+1 − xk),

and we take a Newton step to obtain xk+1. Substituting this approximation into the third condition above, and writing

Δx = xk+1 − xk,   Δy = yk+1 − yk,   Δs = sk+1 − sk,

the three conditions become a system of linear equations in (Δx, Δy, Δs), which can be solved by Gaussian elimination; the new point is obtained from its solution. One can show that, starting close to the central path, we remain close to it (we state this without proof):

Theorem If dμk(xk, sk) ≤ 0.1 and

μk+1 = μk / ( 1 + 0.1/√ν ),

then dμk+1(xk+1, sk+1) ≤ 0.1.

In other words, if the current point is close to the central path, then after one Newton step with the slightly smaller value μk+1, the new point is still close to the central path. This leads to a bound on the number of iterations.

Number of iterations

It takes O(√ν) iterations to decrease μ by a constant factor. Hence, for μ to go from μ0 down to a target accuracy ε, starting from x0, y0, s0 close to the central path, requires

O( √ν log( μ0/ε ) )

iterations, where each iteration amounts to solving a system of linear equations. For LP we have ν = n; for SDP over n × n matrices we also have ν = n, even though the number of variables there is of the order of n², so the number of iterations grows like √n rather than n.
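The schedule μ ← μ/(1 + 0.1/√ν) can be simulated directly; the sketch below (our own) shows the √ν · log(μ0/ε) growth — quadrupling ν roughly doubles the count:

```python
import math

def iterations(mu0, eps, nu):
    """Count steps of mu <- mu / (1 + 0.1 / sqrt(nu)) until mu <= eps."""
    mu, k = mu0, 0
    while mu > eps:
        mu /= 1.0 + 0.1 / math.sqrt(nu)
        k += 1
    return k
```

Since ln(1 + t) ≥ t/2 for 0 ≤ t ≤ 1, the count is at most about 20·√ν·ln(μ0/ε).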

It remains to explain how to find a starting point x0 (and y0, s0) close to the central path. Note that as μ → ∞, the point x(μ) on the central path tends to a center of the primal feasible region (the analytic center, the minimizer of the barrier F over the feasible set). This suggests running the algorithm in reverse: starting from any strictly feasible point, we follow the same path-following scheme with μ increasing instead of decreasing; after polynomially many iterations we are close to the central path for some large value μ0, and from there we run the algorithm with μ decreasing as described above. We omit the details.



18.415/6.854 Advanced Algorithms

November 3, 2008

Lecture 16: Approximation Algorithms Michel X. Goemans

1

Approximation Algorithms

Many optimization problems arising in practice are NP-hard. Under the widely accepted conjecture that P ≠ NP, we cannot compute efficiently and exactly an optimal solution for all possible instances of these problems. Several approaches have been used to deal with this intractability. On one hand, dynamic programming, branch and bound, and implicit enumeration algorithms always find an optimal solution by navigating the space of feasible solutions in a more efficient way than an exhaustive search, but their running time is not guaranteed to be polynomial in the input's size. On the other hand, heuristic algorithms provide a sub-optimal solution to the problem, but their running time is polynomial in the size of the input problem. In this lecture we will focus on approximation algorithms, which are heuristics that always find a solution whose objective value is guaranteed to be within a certain factor of the optimum solution.

Definition 1 (Approximation Algorithm) Let P be a minimization (resp. maximization) problem with instances I ∈ I. An α-approximation algorithm for P, with factor α ≥ 1 (resp. α ≤ 1), is an algorithm A whose running time is polynomial in the size of the given instance I, and which outputs a feasible solution of cost cA such that cA ≤ α · OPT_I (resp. cA ≥ α · OPT_I), where OPT_I is the cost of the optimal solution for instance I.

In this lecture, we will discuss three general techniques for designing approximation algorithms for NP-hard problems:

1. Using the optimal value in the analysis without explicitly knowing it.

2. Linear programming relaxation and rounding.

3. Primal-dual technique.

2

A 3/2-Approximation Algorithm for the Metric TSP

The Traveling Salesman Problem is one of the most extensively studied problems in combinatorial optimization. In the metric version of the problem, an instance is a complete undirected graph G = (V, E) and c : E → R+, where c satisfies the metric properties: c(u, v) = c(v, u) for all u, v ∈ V, and the triangle inequality c(u, v) ≤ c(u, w) + c(w, v) for all u, v, w ∈ V. The objective is to find a minimum-cost tour, that is, a cycle visiting every vertex exactly once. A 3/2-approximation algorithm for this problem, due to Christofides [1], is as follows.

1. Find a minimum spanning tree T of G.

2. Compute a minimum cost perfect matching M on the set Vodd of odd-degree vertices of T.

3. Find an Eulerian tour C′ (a closed walk visiting all the edges exactly once) in M ∪ T.

4. Output the tour C that visits the vertices of G in the order of their first appearance in C′.
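The four steps can be sketched end to end on a tiny instance. This is our own illustration (Prim for the MST, brute-force matching, Hierholzer for the Euler tour), practical only when there are a handful of odd-degree vertices:

```python
import itertools

def christofides(n, c):
    """Christofides' 3/2-approximation on a small metric instance;
    c[i][j] = c[j][i] >= 0 must satisfy the triangle inequality."""
    # 1. Minimum spanning tree (Prim's algorithm).
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: c[e[0]][e[1]])
        in_tree.add(j)
        edges.append((i, j))
    # 2. Minimum-cost perfect matching on odd-degree vertices (brute force).
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    odd = [v for v in range(n) if deg[v] % 2 == 1]
    best_cost, best_pairs = float('inf'), []
    for perm in itertools.permutations(odd):
        pairs = [(perm[k], perm[k + 1]) for k in range(0, len(perm), 2)]
        cost = sum(c[a][b] for a, b in pairs)
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs
    edges += best_pairs
    # 3. Eulerian tour of the multigraph T u M (Hierholzer's algorithm).
    adj = {v: [] for v in range(n)}
    for k, (i, j) in enumerate(edges):
        adj[i].append((j, k))
        adj[j].append((i, k))
    used, stack, euler = set(), [0], []
    while stack:
        v = stack[-1]
        while adj[v] and adj[v][-1][1] in used:
            adj[v].pop()
        if adj[v]:
            w, k = adj[v].pop()
            used.add(k)
            stack.append(w)
        else:
            euler.append(stack.pop())
    # 4. Shortcut: keep only the first appearance of each vertex.
    seen, tour = set(), []
    for v in euler:
        if v not in seen:
            seen.add(v)
            tour.append(v)
    return tour, sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))
```

On four points at positions 0, 1, 2, 4 on a line (with c the pairwise distances), this returns a tour of cost 8, which happens to equal the optimum for that instance.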


Figure 1: Execution of Christofides' algorithm on an instance. The first figure shows a minimum cost spanning tree. The second figure shows the addition of a minimum cost matching on odd degree vertices in the tree, and the third figure shows a cycle obtained after “shortcutting” an Eulerian tour in the previous graph, starting from vertex 1.

Theorem 1 The above algorithm is a 3/2-approximation algorithm for the metric TSP.

Proof: It is clear that all steps in the algorithm can be implemented in polynomial time. The minimum spanning tree can be found using a greedy algorithm, and the minimum cost matching for Vodd can be found in polynomial time using the ellipsoid algorithm, as discussed in one of the previous lectures (or by a purely combinatorial algorithm also based on the linear program we discussed). Note that c(T) ≤ OPT, because removing an edge from the optimal tour yields a spanning tree. Also, c(M) ≤ OPT/2. To see this, consider any optimal tour, and short-cut it to get a cycle visiting only the vertices in Vodd with cost at most OPT. Since this cycle has an even number of vertices, it induces two matchings consisting of alternating edges, and at least one of them has cost at most OPT/2. From this, the total cost of the Eulerian cycle, which is an upper bound on the cost of the algorithm's output, is at most OPT + OPT/2 = 3/2 · OPT. □

Note that in the analysis of the algorithm, we used the value of OPT without explicitly computing it, or even computing a lower bound on it. Figure 1 shows an instance of the metric TSP, and the execution of the algorithm on this instance. A few remarks:

• The above analysis of the algorithm is tight, i.e. ∀ε > 0 there is an instance I such that the algorithm returns a solution of cost at least 3/2 − ε times the optimal solution.

• Currently, no algorithm with an approximation factor better than 3/2 is known for metric TSP.

• TSP is known to be MAX-SNP hard [5] even for the case when distances are either 1 or 2. Also, Papadimitriou and Vempala [4] have proved that a 1.01-approximation algorithm for the metric TSP would imply P = NP.

3

Designing Approximation Algorithms via Relaxations

One of the most important paradigms in the design of approximation algorithms is relaxation. Consider the following (hard) minimization problem:

min   f(x)
s.t.  x ∈ S.


One approach to solve this problem is to extend S to a bigger set P ⊇ S over which the problem is easier to solve. Namely, we extend the function f to a function g : P → R satisfying g(x) = f(x) for all x ∈ S (or g(x) ≤ f(x)). If this condition holds, then

min_{x∈S} f(x) ≥ min_{x∈P} g(x),

which gives a lower bound on the value of the optimal solution. Therefore, if an algorithm gives a solution x* ∈ S which satisfies f(x*) ≤ α min_{x∈P} g(x), then this is an α-approximation algorithm for the problem. For example, many combinatorial optimization problems can be expressed as

min   c^T x
s.t.  Ax = b,
      x ∈ {0, 1}^n.

Replacing the integrality constraint xi ∈ {0, 1} by the linear constraint 0 ≤ xi ≤ 1, we obtain the LP relaxation of the integer program above:

min   c^T x
s.t.  Ax = b,
      0 ≤ xi ≤ 1,   ∀i = 1, . . . , n.

In some cases, the polytope corresponding to the LP relaxation has all integral extreme points. In such cases, it is suﬃcient to solve the LP relaxation to solve the original problem exactly. But this is not true in general.

3.1

LP Relaxation for the Vertex Cover Problem

Given an undirected graph G = (V, E), a vertex cover of G is a collection of vertices C ⊆ V such that every edge e = (u, v) ∈ E satisfies C ∩ {u, v} ≠ ∅. The Vertex Cover problem on an instance G = (V, E), c : V → R+ is to find a cover C of G of minimum cost c(C) = Σ_{v∈C} c(v). This is known to be an NP-hard problem. A natural formulation using integer variables and linear constraints is the following. We define a variable xu ∈ {0, 1} which takes value 1 if vertex u is in the vertex cover, and 0 otherwise. Then the following is an integer programming formulation of the vertex cover problem:

min   Σ_{v∈V} cv xv                          (1a)
s.t.  xu + xv ≥ 1,      ∀(u, v) ∈ E,         (1b)
      xu ∈ {0, 1},      ∀u ∈ V.              (1c)

The LP relaxation for the vertex cover problem is

min   Σ_{v∈V} cv xv                          (2a)
s.t.  xu + xv ≥ 1,      ∀(u, v) ∈ E,         (2b)
      xu ≥ 0,           ∀u ∈ V.              (2c)

Note that we removed the xu ≤ 1 constraints: if xu > 1 in a feasible solution, we can change it to xu = 1 without increasing the cost and still have a feasible solution.

x1 = 1/2    x2 = 1/2    x3 = 1/2

Figure 2: An example where the LP relaxation for Vertex Cover does not have an integer optimal solution.

The LP relaxation does not necessarily have an optimal integral solution in general. For example, consider the triangle graph given in Figure 2 with all costs equal to 1. The optimal solution for this instance has cost OPT = 2, but the optimal solution of the LP relaxation has cost LP = 3/2, as shown in the figure. What this example shows is not only that LP < OPT in general, but also an interesting fact about the strength of this relaxation. Suppose that we are going to use LP as a lower bound on OPT in order to prove an approximation guarantee. As we will see in the next subsection, we will be able to find a cover C with cost at most 2LP. Therefore, we can say c(C) ≤ 2LP ≤ 2OPT to prove an approximation guarantee of 2. However, the example shows that we will not be able to decrease this factor below 4/3 using this relaxation: any guarantee proved this way satisfies OPT ≤ c(C) ≤ αLP ≤ αOPT, hence α ≥ OPT/LP, and this example has OPT/LP = 4/3. This important property of the “bad examples” is captured in the concept of integrality gap.

Definition 2 (Integrality gap) Given a relaxation LP(Π) of an integer program IP(Π) that formulates a combinatorial (minimization) optimization problem on a collection of instances {Π}, the integrality gap of the linear programming relaxation is the largest ratio between the optimal values of the two formulations, namely:

Integrality gap = sup_Π IP(Π) / LP(Π).

For the Vertex Cover LP relaxation, the integrality gap is exactly 2. To see that it is at least 2, consider the complete graph G = Kn , with unitary costs. The minimum vertex cover has cost n − 1, while the linear program relaxation can assign 1/2 to all variables, which gives a total cost of n/2. Therefore, the integrality gap is at least 2(nn−1) → 2. The upper bound follows from the 2-approximation algorithm we will see in the next subsection.
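The K_n calculation can be checked by brute force. The following small illustrative script (not part of the original notes) computes the minimum vertex cover of K_n exhaustively and compares it with the fractional value n/2 of the all-1/2 solution described above:

```python
from itertools import combinations

def min_vertex_cover_size(n):
    """Brute-force minimum vertex cover of the complete graph K_n (unit costs)."""
    edges = list(combinations(range(n), 2))
    for k in range(n + 1):
        for cand in combinations(range(n), k):
            chosen = set(cand)
            if all(u in chosen or v in chosen for (u, v) in edges):
                return k

for n in range(2, 8):
    ip = min_vertex_cover_size(n)   # integral optimum, equal to n - 1
    lp = n / 2                      # value of the all-1/2 fractional solution
    print(n, ip, ip / lp)           # the ratio 2(n-1)/n approaches 2
```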

3.2 A 2-approximation Algorithm for Vertex Cover

A natural approach to get an integral solution from a fractional solution is to round the fractional values. A simple rounding scheme for the vertex cover is as follows.

1. Solve the linear programming relaxation given by (2a)-(2c), to get the fractional solution x∗.
2. Compute the vertex cover as C = {v ∈ V : x∗_v ≥ 1/2} (i.e., round each fractional variable to the nearest integer).

Theorem 2 The above rounding scheme is a 2-approximation algorithm for the Vertex Cover problem.

Proof: First, we need to check that C is indeed a vertex cover. For each e = (u, v) ∈ E, x∗_u + x∗_v ≥ 1, so at least one of x∗_u, x∗_v has value at least 1/2, and is in C. Next, the cost of this vertex cover satisfies

  c(C) = Σ_{v: x∗_v ≥ 1/2} c_v ≤ 2 Σ_{v∈V} c_v x∗_v = 2LP ≤ 2OPT,

hence the LP rounding is a 2-approximation algorithm for the vertex cover problem. □

This is a very basic (the simplest) example of rounding; more sophisticated rounding procedures have been used to design approximation algorithms; we'll see some in coming lectures.
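The rounding step itself is a one-liner once a fractional solution is available. The sketch below assumes the LP has already been solved externally and simply hard-codes the optimal fractional solution for the triangle example; `round_vc` is a hypothetical helper name:

```python
def round_vc(edges, costs, x_frac):
    """Round a feasible fractional solution: keep v iff x*_v >= 1/2."""
    C = {v for v, xv in x_frac.items() if xv >= 0.5}
    assert all(u in C or v in C for (u, v) in edges)   # C is a vertex cover
    return C, sum(costs[v] for v in C)

# Triangle example: the optimal fractional solution puts 1/2 on every vertex.
edges = [(0, 1), (1, 2), (0, 2)]
costs = {0: 1, 1: 1, 2: 1}
x_star = {0: 0.5, 1: 0.5, 2: 0.5}
lp_value = sum(costs[v] * x_star[v] for v in costs)    # LP = 3/2
C, cost = round_vc(edges, costs, x_star)
print(C, cost)   # cover of cost 3 <= 2 * LP
```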

4 The Primal Dual Technique

Yet another way of designing approximation algorithms for intractable problems is the primal dual method. The basic idea of the primal dual scheme is this: at every point of the algorithm, we keep a feasible dual solution, and a corresponding infeasible integer primal solution. The dual variables are then modified at every step, and so is the infeasible primal solution, so as to achieve primal feasibility. At this point, the dual gives a lower bound (for minimization problems) on the optimal primal objective function value, which is used to derive the approximation factor for the algorithm. The interesting thing about this technique is that we do not need to explicitly solve the linear program (as is the case in rounding); the linear program is used only in the analysis of the algorithm. We illustrate this method for the vertex cover problem. The linear program for the vertex cover problem is given by (2a)-(2c). The dual of this linear program is given by

  max  Σ_{e∈E} y_e
  s.t. Σ_{e∈δ(v)} y_e ≤ c_v,   ∀v ∈ V,               (3)
       y_e ≥ 0,                ∀e ∈ E.

The primal dual algorithm for the vertex cover problem is as follows. In the algorithm, C corresponds to the set of vertices in the (supposed to be) vertex cover, and F is the set of edges in the graph not yet covered by C.

1. y_e ← 0 ∀e ∈ E, C ← ∅, F ← E.
2. While F ≠ ∅:
3.   Let e = (u, v) be any edge in F.
4.   Increase y_e until the constraint (3) becomes tight for u or v.
5.   Add that corresponding vertex (say it is v) to C.
6.   F ← F \ δ(v).

Theorem 3 The above algorithm achieves an approximation ratio of 2 for the vertex cover problem.


Figure 3: Illustration of the primal-dual algorithm for the vertex cover problem. The costs of the vertices are indicated next to each vertex. The dotted edge denotes the edge currently under consideration; thick edges denote those already covered by the current vertex cover. The vertices in the cover are shown as solid circles.

Proof: First of all, it is clear that the set C returned by the algorithm is a vertex cover. Let y be the dual solution returned. Observe that by construction, this solution is dual feasible (we maintain dual feasibility throughout the execution). Furthermore, for any v ∈ C, we have that c_v = Σ_{e∈δ(v)} y_e. Let us now compute the cost of the vertex cover returned by the algorithm:

  Σ_{v∈C} c_v = Σ_{v∈C} Σ_{e∈δ(v)} y_e = Σ_{e∈E} α_e y_e ≤ 2 Σ_{e∈E} y_e
              ≤ 2LP                                        (4a)
              ≤ 2OPT,                                      (4b)

where α_e = 2 for an edge e = (u, v) if both u, v ∈ C, and 1 otherwise. The inequality (4a) follows from weak duality, and inequality (4b) follows from the fact that the primal LP is a relaxation of the vertex cover problem. □

Figure 3 illustrates the execution of the primal-dual algorithm on a graph. For this instance, the algorithm returns a vertex cover of cost 9, whereas the optimal solution in this instance has cost 7 (corresponding to the two vertices on the diagonal edge). The lower bound given by the dual solution has value 3 + 1 + 1 = 5.

A few final remarks:

• Dinur and Safra [2] have proved that it is NP-hard to approximate the vertex cover within a factor better than 1.36.
• Currently, there is no algorithm for the vertex cover problem which achieves an approximation ratio better than 2. So the two (simple!) algorithms presented here are, in fact, the present best known approximation algorithms for this problem.
• Khot and Regev [3] have proved that it is UGC-hard to approximate vertex cover within a factor 2 − ε, for any ε > 0.
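The primal-dual scheme for vertex cover can be sketched in a few lines. The instance below is made up (the exact graph of Figure 3 is not recoverable from the notes), but it returns a cover of cost 9 with dual value 5, matching the numbers of the discussed example:

```python
def primal_dual_vertex_cover(costs, edges):
    """Raise y_e on an uncovered edge until the dual constraint
    sum_{e in delta(v)} y_e <= c_v becomes tight at one endpoint."""
    slack = dict(costs)            # remaining slack of each vertex constraint
    y = {e: 0 for e in edges}
    C = set()
    F = list(edges)                # edges not yet covered
    while F:
        u, v = F[0]
        delta = min(slack[u], slack[v])    # largest feasible raise of y_e
        y[(u, v)] += delta
        slack[u] -= delta
        slack[v] -= delta
        tight = u if slack[u] == 0 else v  # endpoint whose constraint is tight
        C.add(tight)
        F = [e for e in F if tight not in e]
    return C, y

costs = {0: 3, 1: 4, 2: 2, 3: 3}           # made-up vertex costs
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
C, y = primal_dual_vertex_cover(costs, edges)
assert all(u in C or v in C for (u, v) in edges)
assert sum(costs[v] for v in C) <= 2 * sum(y.values())   # the factor-2 bound
print(C, sum(costs[v] for v in C), sum(y.values()))      # cost 9, dual value 5
```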

References

[1] Christofides, N. (1976). Worst-case analysis of a new heuristic for the travelling salesman problem. Report 388, Graduate School of Industrial Administration, CMU.
[2] Dinur, I. and S. Safra (2002). The importance of being biased. In Proceedings of the 34th ACM Symposium on Theory of Computing, pp. 33-42.
[3] Khot, S. and O. Regev (2008). Vertex cover might be hard to approximate to within 2 − ε. Journal of Computer and System Sciences, 74:335-349.
[4] Papadimitriou, C.H. and S. Vempala (2000). On the approximability of the travelling salesman problem. In Proceedings of the 32nd ACM Symposium on Theory of Computing, pp. 126-133.
[5] Papadimitriou, C.H. and M. Yannakakis (1993). The travelling salesman problem with distances one and two. Mathematics of Operations Research, 18:1-11.



18.415/6.854 Advanced Algorithms
November 17, 2008
Lecture 17
Lecturer: Michel X. Goemans

1 Introduction

We continue talking about approximation algorithms. Last time, we discussed the design and analysis of approximation algorithms, and saw that there were two approaches to the analysis of such algorithms: we can try comparing the solution obtained by our algorithm to the (unknown) optimal solution directly (as we did for Christofides's algorithm for TSP), or, when that is not possible, we can compare our solution to a relaxation of the original problem. We can also use a relaxation to design algorithms, even without solving the relaxed problem: we saw a simple primal-dual algorithm that used the LP relaxation of the Vertex Cover problem. In this lecture, we shall examine further the primal-dual approach and also the design of approximation algorithms through local search, and illustrate these on the facility location problem.

2 The facility location problem

2.1 Problem statement

We are given a set F of facilities, and a set D of clients. Our goal is to open some facilities and assign clients to them so that each client is served by exactly one facility. We are given, for each i ∈ F, the cost f_i of opening facility i, and the cost c_ij of assigning client j to facility i for each j ∈ D. If we open a certain subset F′ ⊆ F of facilities, the opening cost incurred is Σ_{i∈F′} f_i. Subsequently, we will assign each client to the nearest facility, incurring a cost min_{i∈F′} c_ij for client j. Thus our problem can be stated as the following optimization problem:

  min_{F′⊆F} [ Σ_{i∈F′} f_i + Σ_{j∈D} min_{i∈F′} c_ij ].

This problem arises naturally in many settings, where the facilities might be schools, warehouses, servers, and so on. It is possible to imagine additional constraints such as capacities on the facilities; we shall deal with the simplest case and assume no other constraints. We shall also assume that the costs are all nonnegative, and that the cij s are in fact metric costs — that they come from a metric on F ∪ D where the distance between i ∈ F and j ∈ D is cij .
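For tiny instances, the objective above can be evaluated by exhaustive search over facility subsets, which is useful for sanity-checking the approximation algorithms that follow. All facility and client names and costs below are made up:

```python
from itertools import chain, combinations

def facility_location_brute_force(f, c, clients):
    """Evaluate opening cost plus nearest-facility assignment cost over every
    nonempty subset of facilities and return the cheapest."""
    facilities = list(f)
    subsets = chain.from_iterable(
        combinations(facilities, r) for r in range(1, len(facilities) + 1))
    best = None
    for sub in subsets:
        cost = (sum(f[i] for i in sub) +
                sum(min(c[i, j] for i in sub) for j in clients))
        if best is None or cost < best[0]:
            best = (cost, set(sub))
    return best

f = {'A': 4, 'B': 3}                                      # opening costs f_i
c = {('A', 1): 1, ('A', 2): 5, ('B', 1): 4, ('B', 2): 2}  # assignment costs c_ij
print(facility_location_brute_force(f, c, [1, 2]))        # (cost, open set)
```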

2.2 Current status

This problem is known to be NP-hard. Hence we seek to design approximation algorithms. The best algorithm known is a 1.5-approximation algorithm, due to Byrka [1]. This is close to the best possible, in the sense that the following "inapproximability" result is true: if there is a 1.463-approximation algorithm, then NP ⊆ DTIME(n^{log log n}) (see [2]). Since our focus in this lecture is on the techniques, we will see simpler approximation algorithms that illustrate the approaches, each of which gives only a 3-approximation.


3 The primal-dual approach

We shall follow the general outline behind primal-dual approaches to many problems:

1. Formulate the problem as an integer program,
2. Relax it to a linear program,
3. Look at the dual of the linear program,
4. Devise an algorithm that finds an integral primal-feasible solution and a dual-feasible solution,
5. Show that the solutions are within a small factor of each other, and hence of the optimum.

3.1 IP formulation

Let the variable y_i denote whether the facility i is opened, i.e.,

  y_i = 1 if facility i is opened, and 0 otherwise, for each i ∈ F.

Similarly, let x_ij denote whether the client j is assigned to facility i, i.e.,

  x_ij = 1 if client j is assigned to i, and 0 otherwise, for each i ∈ F and j ∈ D.

So we must have

  y_i ∈ {0, 1} for all i ∈ F,                        (1)

and

  x_ij ∈ {0, 1} for all i ∈ F, j ∈ D.                (2)

Further, we have the condition that each client must be assigned to exactly one facility:

  Σ_{i∈F} x_ij = 1 for all j ∈ D,                    (3)

and the condition that clients can be assigned only to facilities that are actually open, i.e. that x_ij = 1 ⟹ y_i = 1. One way of writing this as a linear relation is:

  y_i − x_ij ≥ 0 for all i ∈ F, j ∈ D.               (4)

Finally, the objective function (cost) is

  Σ_{i∈F} f_i y_i + Σ_{i∈F} Σ_{j∈D} c_ij x_ij.       (5)

The problem of minimizing (5) subject to conditions (1), (2), (3) and (4) is an integer programming problem.


3.2 LP relaxation

The conditions (1) and (2) are not linear constraints, but we can try to relax them to constraints that are linear. We write, for (2), the condition

  x_ij ≥ 0                                           (6)

(we do not have to write x_ij ≤ 1, as that is already forced by (3)), and for (1), we write the condition

  y_i ≥ 0                                            (7)

(as the cost is an increasing function of y_i, the minimization will make sure that y_i ≤ 1, if at all possible). Thus we have the following linear program:

  min  Σ_{i∈F} f_i y_i + Σ_{i∈F} Σ_{j∈D} c_ij x_ij   (8)
  s.t. Σ_{i∈F} x_ij = 1      ∀j ∈ D                  (9)
       y_i − x_ij ≥ 0        ∀i ∈ F, ∀j ∈ D          (10)
       x_ij ≥ 0              ∀i ∈ F, ∀j ∈ D          (11)
       y_i ≥ 0               ∀i ∈ F                  (12)

We cannot expect every vertex of this LP to be 0-1; there can exist instances for which the LP optimum does not correspond to any convex combination of valid facility location integral solutions. Thus the LP does not give a solution directly. One way of using the LP would be to solve it and then round the solution to a valid facility location; this needs some care but can be used to derive an approximation algorithm for the facility location problem. Another possibility is to pursue the primal-dual approach, which is what we shall now do.

3.3 LP dual

Let us look at the dual of the LP. Introducing dual variables v_j for the constraints (9) and w_ij for the constraints (10), we get the dual LP:

  max  Σ_{j∈D} v_j                                   (13)
  s.t. Σ_{j∈D} w_ij ≤ f_i     ∀i ∈ F                 (14)
       v_j − w_ij ≤ c_ij      ∀i ∈ F, ∀j ∈ D         (15)
       w_ij ≥ 0               ∀i ∈ F, ∀j ∈ D         (16)

At the optimal solutions to the primal and dual, the complementary slackness conditions say that:

  y_i > 0        ⟹  Σ_{j∈D} w_ij = f_i               (17)
  x_ij > 0       ⟹  v_j − w_ij = c_ij                (18)
  y_i − x_ij > 0 ⟹  w_ij = 0.                        (19)

If we could find a primal feasible solution and a dual feasible solution that satisfied the complementary slackness conditions, and furthermore the primal solution was integral, then we would have solved the problem. But as we have seen, this is not possible in general, because there might not be an integer solution corresponding to the LP optimum. We interpret the complementary slackness conditions as follows. Client j pays a charge v_j ≥ c_ij if assigned to i (the condition (18)). The surpluses w_ij pay for the cost of opening the facility (the condition (17)). We use this interpretation to guide our primal-dual algorithm.

3.4 Primal-dual algorithm for the facility location problem

We will maintain v_j's and w_ij's that always constitute a dual-feasible solution. Initially, set each v_j = 0 and each w_ij = 0. Start increasing all the v_j's at rate 1. We watch out for 3 possible events:

1. For some i, j, v_j reaches c_ij, so that (18) holds, and (15) is in danger of being violated: in this case, we start increasing w_ij at rate 1 as well, so that v_j − w_ij = c_ij will continue to hold.

2. For some i, Σ_{j∈D} w_ij reaches f_i — "facility i is paid for": in this case, we freeze (stop increasing) all the w_ij's. We also freeze all the v_j's for which w_ij was being increased, namely {j : v_j > c_ij}. Finally, we also freeze those w_i′j for which a v_j has been frozen now, because we no longer need to increase them.

3. For some i, j, v_j reaches c_ij when i is already paid for: in this case, we cannot increase w_ij, so we instead freeze v_j, and also freeze all the w_i′j.

We repeat this process until every v_j is frozen. The procedure we have described is often referred to as a "dual ascent" procedure, as we have only increased dual variables. Suppose we stop with the values (v̄, w̄). We always remain dual-feasible, so Σ_{j∈D} v̄_j when we stop is a lower bound on the optimal value of the LP. We now have to decide how to convert the obtained values into a facility location, i.e. which facilities to open. We will only open a subset of the paid-for facilities. Say facility i is paid for at time t_i. When we terminate, create the graph G = (F ∪ D, E) where E = {(i, j) : w̄_ij > 0}. Define cluster(i) as the set of all facilities that are neighbors of neighbors of i in this graph. Process the paid-for facilities in nondecreasing order of t_i. First, consider the first paid-for facility, i.e. i for which t_i is minimum, and open it. We will not open any other facility in cluster(i). In general, open facility i′ iff it is not already in the cluster of a previously opened facility, i.e. iff i′ ∉ ∪_i cluster(i), where the union is over previously opened facilities i.
Having selected which facilities to open, we assign clients to facilities in the natural way: assign each client to the nearest open facility. We now prove that this algorithm is a 3-approximation algorithm.
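A compact, exact-arithmetic simulation of this dual-ascent procedure is sketched below. It is illustrative only: it assumes a cost c_ij is given for every facility-client pair, and the two-facility instance is made up.

```python
from fractions import Fraction as Fr

def dual_ascent(f, c):
    """Event-driven sketch of dual ascent: all unfrozen v_j rise at rate 1;
    once v_j = c_ij for an unpaid facility i, w_ij rises too; a client
    freezes as soon as v_j >= c_ij for some paid-for facility i."""
    D = sorted({j for (_, j) in c})
    v = {j: Fr(0) for j in D}
    w = {e: Fr(0) for e in c}
    alive = set(D)
    t_paid = {}                        # facility -> time it became fully paid
    t = Fr(0)
    while alive:
        for i in f:                    # mark facilities that are now paid for
            if i not in t_paid and sum(w[i, j] for j in D) >= f[i]:
                t_paid[i] = t
        alive = {j for j in alive      # freeze clients reached by a paid facility
                 if not any(v[j] >= c[i, j] for i in t_paid)}
        if not alive:
            break
        steps = [c[i, j] - v[j] for (i, j) in c
                 if j in alive and v[j] < c[i, j]]        # v_j reaches c_ij
        for i in f:
            if i not in t_paid:
                growers = [j for j in alive if v[j] >= c[i, j]]
                if growers:            # facility i becomes fully paid for
                    steps.append((f[i] - sum(w[i, j] for j in D)) / len(growers))
        dt = min(steps)
        t += dt
        for j in alive:
            v[j] += dt
        for (i, j) in c:               # w grows on tight edges to unpaid i
            if j in alive and i not in t_paid and v[j] - dt >= c[i, j]:
                w[i, j] += dt
    G = {i: {j for j in D if w[i, j] > 0} for i in f}
    opened = []                        # open by payment time, skipping clusters
    for i in sorted(t_paid, key=t_paid.get):
        if all(not (G[i] & G[o]) for o in opened):
            opened.append(i)
    assign = {j: min(opened, key=lambda i: c[i, j]) for j in D}
    cost = sum(f[i] for i in opened) + sum(c[assign[j], j] for j in D)
    return opened, assign, cost, v

# Made-up instance: facilities X, Y; clients a, b.
f = {'X': 1, 'Y': 3}
c = {('X', 'a'): Fr(1), ('X', 'b'): Fr(2), ('Y', 'a'): Fr(2), ('Y', 'b'): Fr(1)}
opened, assign, cost, v = dual_ascent(f, c)
print(opened, dict(assign), cost)      # opens X only, total cost 4
```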

3.5 Analysis of the algorithm

Claim 1 Let O and A be the opening-cost and assigning-cost of the (primal) solution constructed by the algorithm. Then,

  3O + A ≤ 3 Σ_{j∈D} v̄_j.

Proof: Let U be the set of facilities opened by the algorithm, and σ(j) ∈ U be the facility that the client j is assigned to. We need to show that

  3 Σ_{i∈U} f_i + Σ_{j∈D} c_{σ(j)j} ≤ 3 Σ_{j∈D} v̄_j.

For each client j, there are two possible scenarios:

Figure 1: Case (II). If i makes v_j stop increasing via the third event from Section 3.4, there is no edge between i and j in G. Otherwise, (i, j) ∈ G.

(I) j has exactly one open facility, say i = σ(j), in its neighborhood in G.
(II) j has no open facility in its neighborhood in G.

First consider case (I). Since w̄_ij > 0 from the way we construct G, the algorithm froze the variables v̄_j, w̄_ij after the equation c_ij = v̄_j − w̄_ij became tight. Thus, we have c_ij + w̄_ij = v̄_j, and so

  c_ij + 3w̄_ij ≤ 3(c_ij + w̄_ij) = 3v̄_j.             (20)

If we take the summation of (20) over those clients in case (I), we obtain from Σ_j 3w̄_ij = 3f_i (for each open facility i) that

  Σ_{j∈D: case (I)} c_{σ(j)j} + 3 Σ_{i∈U} f_i ≤ 3 Σ_{j∈D: case (I)} v̄_j.

Thus, the opening cost of all facilities is already accounted for. Now consider case (II), where j contributes nothing towards opening facilities. Hence, to complete the proof, it is enough to show that the assigning-cost for j is at most 3v̄_j, i.e. there exists a facility i′ ∈ U such that c_{i′j} ≤ 3v̄_j. Let i be the facility that makes v_j stop increasing, for which it follows that

  c_ij ≤ v̄_j   and   t_i ≤ v̄_j.                      (21)

In the case when i ∈ U, it follows immediately that c_ij ≤ v̄_j ≤ 3v̄_j. Hence assume i ∉ U. Since i is not open (although i is fully paid for), there exists a facility i′ ∈ U such that i ∈ cluster(i′). Thus there exists a client j′ which is connected to both i and i′ in G. Since w̄_{ij′} > 0 and w̄_{i′j′} > 0,

  c_{ij′} ≤ t_i   and   c_{i′j′} ≤ t_{i′}.           (22)

From the triangle inequality, (21), (22) and t_{i′} ≤ t_i ≤ v̄_j (since i was responsible for j freezing), we have

  c_{i′j} ≤ c_{i′j′} + c_{ij′} + c_ij ≤ t_{i′} + t_i + v̄_j ≤ 2t_i + v̄_j ≤ 3v̄_j,

which completes the proof. □

4 The local search based approach

Now we study a different type of approximation algorithm based on local search.

4.1 General paradigm

Suppose we want to minimize the objective function c(x) over the space S of feasible solutions. In the case of the facility location problem, a solution is a subset of facilities and c(x) is the sum of the opening costs and the assigning costs. In a local search based algorithm, we have a neighborhood N : S → 2^S which satisfies the following two conditions:

• v ∈ N(v) for all v ∈ S,
• there exists an efficient algorithm to decide whether c(v) = min_{u∈N(v)} c(u) for a given v and, if not, to find u ∈ N(v) such that c(u) < c(v).

Using this algorithm for searching the neighborhood, the algorithm travels in the space S, iteratively finding a better solution in N(v) than the current solution v ∈ S. It terminates when the current solution v cannot be improved, i.e. v is a locally optimal solution. In a local search based algorithm, one also needs an algorithm for finding an initial feasible solution. We can raise some issues related to the design and analysis of local search algorithms:

Q0: What neighborhood N should we choose?
 - If |N(v)| is large, one can find a better solution in each iteration, but designing an algorithm to efficiently search the neighborhood might be more difficult.

Q1: How good is a locally optimal solution which the algorithm provides?
 - This decides the approximation ratio of the algorithm.

Q2: How many iterations does the algorithm require before finding a local optimum?
 - Using the local search algorithm is one way to find a local optimum; there might be some more direct way, and the complexity of finding a local optimum has been studied (see the discussion about the class PLS in the next lecture).

Consider the Traveling Salesman problem. One possible neighborhood N arises from 2-exchange, where u ∈ N(v) if the tour u can be obtained by removing two edges in v and replacing these with two different edges that reconnect the tour. Therefore |N(v)| = (n choose 2), hence it is enough to check only O(n²) solutions to find a better solution in N(v). Other neighborhoods can also be defined, such as for example k-exchange, in which k edges are replaced. In the problem set, a neighborhood of exponential size is considered.
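The 2-exchange neighborhood can be sketched as follows: removing two edges and reconnecting the tour amounts to reversing the segment between them (a standard 2-opt implementation, assuming a symmetric distance matrix; the square instance is made up):

```python
import math

def tour_length(tour, dist):
    """Length of the closed tour (the last vertex connects back to the first)."""
    return sum(dist[tour[k - 1]][tour[k]] for k in range(len(tour)))

def two_exchange_improve(tour, dist):
    """Scan the O(n^2) 2-exchange neighborhood: reverse a segment of the tour;
    return an improving neighbor if one exists, else None."""
    base = tour_length(tour, dist)
    n = len(tour)
    for a in range(n - 1):
        for b in range(a + 1, n):
            cand = tour[:a] + tour[a:b + 1][::-1] + tour[b + 1:]
            if tour_length(cand, dist) < base:
                return cand
    return None

def local_search_tsp(tour, dist):
    while True:
        nxt = two_exchange_improve(tour, dist)
        if nxt is None:
            return tour                # locally optimal tour
        tour = nxt

# Four corners of a unit square; the crossing tour 0-2-1-3 gets uncrossed.
pts = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
best = local_search_tsp([0, 2, 1, 3], dist)
print(best, tour_length(best, dist))   # the perimeter tour, length 4.0
```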

4.2 Local search algorithm for the facility location problem

Now we explain a local search based approximation algorithm for the facility location problem. The set U of open facilities is enough for describing any solution in our solution space S since, after the open facilities are decided, the optimal assignment follows easily (and efficiently). The simplest neighborhood one can consider is to simply allow the addition of a new facility, the deletion of an open facility, or replacing one open facility by another. More formally, N(U) is defined as follows: U′ ∈ N(U) if U′ = U ∪ {i}, U′ = U \ {i′}, or U′ = U ∪ {i} \ {i′} for some facilities i and i′. Note that |N(U)| = O(n²), which settles the time-complexity issue for finding a better solution in N(U). The following claim settles Q1. We will examine Q2 in the next lecture, albeit not for the facility location problem per se.

Claim 2 Consider a locally optimal solution for the above neighborhood N. Then, its opening cost O and assigning cost A satisfy

  A ≤ A∗ + O∗,                                       (23)
  O ≤ O∗ + 2A∗,                                      (24)

where O∗ and A∗ are the opening cost and the assigning cost of the optimal solution respectively.

Remark 1 Claim 2 guarantees an approximation ratio of 3 for this local-search algorithm since A + O ≤ 3A∗ + 2O∗ ≤ 3(A∗ + O∗) = 3 OPT.

Proof: In this lecture, we will see only the proof of (23) due to time constraints. (The proof of (24) would take longer than the 5 minutes available at this point.) Let U and U∗ be the sets of open facilities in the locally and globally optimal solutions respectively. For a facility i ∈ U∗ \ U, the local optimality of U implies

  f_i + Σ_{j: σ∗(j)=i} ( c_{σ∗(j)j} − c_{σ(j)j} ) ≥ 0,

where σ(j) and σ∗(j) are the open facilities which j is assigned to in U and in U∗ respectively (since adding i and reassigning just the clients for which σ∗(j) is i yields a solution in N(U)). By taking the summation over all i ∈ U∗ \ U, it follows that O∗ + A∗ − A ≥ 0. □

Now consider the time-complexity issue Q2. There exist instances for which this algorithm will take an exponential number of steps. In fact, the negative result for this issue comes from the fact that the facility location problem (with this definition of the neighborhood) is PLS-complete [3]; see the next lecture for more details. Furthermore, it is unlikely that any algorithm (not necessarily based on this iterative local search process) can find a locally optimal solution in polynomial time in the worst case. However, if the algorithm walks to a better solution only when it improves the current solution significantly, by a factor ε, it can be guaranteed that the algorithm terminates in time polynomial in n and 1/ε. Furthermore, one can obtain an ε-version of Claim 2, which leads to a (3 + ε′)-approximation ratio for the algorithm.
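The add/delete/swap local search itself is easy to sketch for tiny instances (illustrative code with a made-up instance; the assignment step simply recomputes nearest open facilities from scratch):

```python
def fl_cost(U, f, c, clients):
    """Opening cost plus nearest-open-facility assignment cost for open set U."""
    return (sum(f[i] for i in U) +
            sum(min(c[i, j] for i in U) for j in clients))

def fl_local_search(f, c, clients, U0):
    """Walk the add / delete / swap neighborhood until locally optimal."""
    U = set(U0)
    while True:
        moves = [U | {i} for i in f if i not in U]
        moves += [U - {i} for i in U if len(U) > 1]
        moves += [(U - {i}) | {k} for i in U for k in f if k not in U]
        better = [m for m in moves
                  if fl_cost(m, f, c, clients) < fl_cost(U, f, c, clients)]
        if not better:
            return U, fl_cost(U, f, c, clients)
        U = better[0]

# Made-up instance: two facilities, two clients; start from the worse facility.
f = {'X': 1, 'Y': 3}
c = {('X', 'a'): 1, ('X', 'b'): 2, ('Y', 'a'): 2, ('Y', 'b'): 1}
U, cost = fl_local_search(f, c, ['a', 'b'], {'Y'})
print(U, cost)   # the swap move replaces Y by X, giving cost 4
```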

References

[1] Jaroslaw Byrka. An optimal bifactor approximation algorithm for the metric uncapacitated facility location problem. In Proceedings of APPROX 2007, 2007.
[2] Sudipto Guha and Samir Khuller. Greedy strikes back: Improved facility location algorithms. Journal of Algorithms, pages 649-657, 1998.
[3] Y. Kochetov and D. Ivanenko. Computationally Difficult Instances for the Uncapacitated Facility Location Problem, volume 32. Springer US, 2005.



18.415/6.854 Advanced Algorithms
November 19, 2008
Approximation Algorithms: MAXCUT
Lecturer: Michel X. Goemans

1 MAX-CUT problem

MAX-CUT Problem: Given a graph G = (V, E) and weights on the edges w : E → R+, find a cut (S : S̄), S ⊆ V, that maximizes w(S : S̄) = Σ_{e∈(S:S̄)} w(e).

MIN-CUT Problem: find a cut (S : S̄) that minimizes w(S : S̄).

There is a polynomial algorithm for the MIN-CUT problem: use the min s − t cut algorithm on each pair of vertices (or, better, for a ﬁxed s), and take the smallest of them. However, the MAX-CUT problem is NP-hard, and we’ll try several ways of designing approximation algorithms for it.

2 Idea #1: Local Search

Algorithm: Start from any cut (S : S̄). Define the neighborhood N(S : S̄) of the cut to be the MOVE neighborhood: all the cuts that result from moving one vertex from one side of the cut to the other side. Consider a locally maximum cut for this neighborhood.

Lemma 1 If (S : S̄) is a local maximum for the MOVE neighborhood, then w(S : S̄) ≥ (1/2) w(E) ≥ (1/2) OPT.

Proof of lemma 1: Look at a vertex i ∈ V. Let C_i be the set of all edges (i, j) ∈ E that are part of the cut (S : S̄) (that is, if i ∈ S then j ∈ S̄ and vice versa). Let A_i be the set of all edges (i, j) ∈ E that are not part of the cut (S : S̄). Since moving any single vertex i to the other side of the cut does not improve the weight of the cut, we know that w(C_i) ≥ w(A_i). Summing over all vertices i, we get:

  Σ_{i∈V} w(C_i) ≥ Σ_{i∈V} w(A_i),

or 2w(S : S̄) ≥ 2w(E \ (S : S̄)). Rearranging, we get 4w(S : S̄) ≥ 2w(E), or

  w(S : S̄) ≥ (1/2) w(E) ≥ (1/2) OPT.  □

Remarks:

(a) The bound of 1/2 cannot be improved for this MOVE neighborhood: consider a k-vertex cycle, where k is a multiple of 4, as the graph G (with unit weights). The best cut will include all edges. However, if we start from a cut in which the edges of the cycle alternate in and out of the cut, we have a locally optimum solution with only k/2 edges in the cut.

(b) The local search algorithm based on the MOVE neighborhood for MAX-CUT takes exponentially many steps in the worst case. This is true even for graphs that are 4-regular (each vertex has exactly 4 neighbors) (Haken and Luby [1]). For 3-regular graphs the algorithm is polynomial (Poljak [4]).

(c) To capture the complexity of local search, Johnson, Papadimitriou and Yannakakis [3] have defined the class PLS (Polynomial Local Search). Members of this class are optimization problems of the form max{c(x) : x ∈ S} together with a neighborhood N : S → 2^S. We say that v ∈ S is a local optimum if c(v) = max{c(x) : x ∈ N(v)}. To be in PLS, we need to have polynomial-time algorithms for (i) finding a feasible solution, (ii) deciding if a solution is feasible and if so computing its cost, and (iii) deciding if a better solution in the neighborhood N(v) of a solution v exists and if so finding one. They introduce a notion of reduction, and this leads to PLS-complete problems, to which any problem in PLS can be reduced. Their notion of reduction implies that if one has a polynomial-time algorithm for finding a local optimum of one PLS-complete problem, then the same is true for all PLS problems. In particular, MAX-CUT with the MOVE neighborhood is PLS-complete [5]. Furthermore, it follows from Johnson et al. [3] that the obvious local search algorithm is not an efficient way of finding a local optimum for a PLS-complete problem; indeed, for any PLS-complete problem, there exist instances for which the local search algorithm of repeatedly finding an improved solution takes exponential time. The result of Haken and Luby above is thus just a special case. Still, this does not preclude other ways of finding a local optimum.
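A minimal sketch of the MOVE local search, checked against the w(E)/2 bound of Lemma 1 on the 4-cycle of remark (a) (starting from the empty cut, so it may reach the global optimum rather than the bad local optimum):

```python
def cut_weight(S, edges):
    """Total weight of edges with endpoints on different sides of the cut."""
    return sum(w for (u, v, w) in edges if (u in S) != (v in S))

def move_local_search(n, edges, S=None):
    """Flip single vertices while some flip strictly increases the cut weight."""
    S = set() if S is None else set(S)
    improved = True
    while improved:
        improved = False
        for v in range(n):
            T = S ^ {v}                      # move v to the other side
            if cut_weight(T, edges) > cut_weight(S, edges):
                S, improved = T, True
    return S

# 4-cycle with unit weights, w(E) = 4; Lemma 1 guarantees a cut of weight >= 2.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1)]
S = move_local_search(4, edges)
print(sorted(S), cut_weight(S, edges))
```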

3 Idea #2: Random Cut

Algorithm: There are 2^{|V|} possible cuts. Sample a cut randomly using a uniform distribution over all possible cuts in the graph: ∀v ∈ V, Pr(v ∈ S) = 1/2, independently for all vertices v ∈ V.

Lemma 2 This randomized algorithm gives a cut with expected weight ≥ (1/2) OPT.

Proof of lemma 2:

  E[w(S : S̄)] = E[ Σ_{e∈E} w(e) I(e ∈ (S : S̄)) ] = Σ_{e∈E} w(e) · Pr(e ∈ (S : S̄)) = Σ_{e∈E} w(e) · (1/2) = (1/2) w(E).  □

Using the method of conditional expectations, we can transform this randomized algorithm into a deterministic algorithm. The basic idea is to use the following identity for a random variable f and event A:

  E[f] = E[f | A] Pr(A) + E[f | Ā] Pr(Ā) = E[f | A] Pr(A) + E[f | Ā](1 − Pr(A)) ≤ max{ E[f | A], E[f | Ā] }.

In our setting, we consider the vertices in a specific order, say v_1, v_2, ..., and suppose we have already decided/conditioned on the position (i.e. whether or not they are in S) of v_1, ..., v_{i−1}. Now, condition on whether v_i ∈ S. Letting f = w(S : S̄), we get:

  E[f | {v_1, ..., v_{i−1}} ∩ S = C_{i−1}] ≤ max( E[f | {v_1, ..., v_{i−1}} ∩ S = C_{i−1}, v_i ∈ S], E[f | {v_1, ..., v_{i−1}} ∩ S = C_{i−1}, v_i ∉ S] ).

Both terms in the max can be easily computed, and we can decide to put v_i on the side of the cut which gives the maximum, i.e. we set C_i to be either C_{i−1} or C_{i−1} ∪ {v_i} in such a way that:

  E[f | {v_1, ..., v_{i−1}} ∩ S = C_{i−1}] ≤ E[f | {v_1, ..., v_i} ∩ S = C_i].

When we have processed all vertices, we get a cut (C_n : C̄_n) such that

  (1/2) w(E) ≤ E[f] ≤ w(C_n : C̄_n),

and this provides a deterministic 0.5-approximation algorithm. Examining this derandomized version more closely, we notice that we place v_i on the side of the cut that maximizes the total weight between v_i and the previous vertices {v_1, v_2, ..., v_{i−1}}. This is therefore a simple greedy algorithm.

Remarks:

(a) The performance guarantee of the randomized algorithm is no better than 0.5; just consider the complete graph on n vertices with unit weights. Also, the performance guarantee of the greedy algorithm is no better than 0.5 in the worst case.
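The greedy (derandomized) algorithm can be sketched directly; the vertex order is simply 0, ..., n−1 and the triangle instance is illustrative:

```python
def greedy_maxcut(n, edges):
    """Conditional-expectation derandomization: each vertex joins the side
    that maximizes the weight crossing to the already-placed vertices."""
    side = {}                        # vertex -> True (in S) or False (in S-bar)
    for v in range(n):
        to_S = sum(w for (a, b, w) in edges
                   if v in (a, b) and side.get(b if a == v else a) is True)
        to_T = sum(w for (a, b, w) in edges
                   if v in (a, b) and side.get(b if a == v else a) is False)
        side[v] = to_T >= to_S       # joining S cuts the edges going to S-bar
    return {v for v in side if side[v]}

# Unit-weight triangle, w(E) = 3; the greedy cut has weight at least w(E)/2.
edges = [(0, 1, 1), (0, 2, 1), (1, 2, 1)]
S = greedy_maxcut(3, edges)
w_cut = sum(w for (a, b, w) in edges if (a in S) != (b in S))
print(S, w_cut)
```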

4 Idea #3: LP relaxation

Algorithm: Start from an integer-LP formulation of the problem:

  max  Σ_{e∈E} w(e) x_e
  s.t. x_e ∈ {0, 1}                                  ∀e ∈ E
       Σ_{e∈F} x_e + Σ_{e∈C\F} (1 − x_e) ≤ |C| − 1   ∀ cycle C ⊆ E, ∀F ⊆ C, |F| odd
  ⇔    Σ_{e∈F} x_e − Σ_{e∈C\F} x_e ≤ |F| − 1         ∀ cycle C ⊆ E, ∀F ⊆ C, |F| odd

Since we have a variable x_e for each edge (if x_e = 1 then e ∈ (S : S̄)), we need the second type of constraints to guarantee that the x_e's correspond to a legal cut. The validity of these constraints comes from the fact that any cycle and any cut must intersect in an even number of edges. Next, we relax this integer program into an LP:

  max  Σ_{e∈E} w(e) x_e
  s.t. 0 ≤ x_e ≤ 1                                   ∀e ∈ E
       Σ_{e∈F} x_e − Σ_{e∈C\F} x_e ≤ |F| − 1         ∀ cycle C ⊆ E, ∀F ⊆ C, |F| odd.

This is a relaxation of the maximum cut problem, and thus provides an upper bound on the value of the optimum cut. We could try to solve this linear program and devise a scheme to "round" the possibly fractional solution to a cut.

Remarks:

(a) This LP can be solved in polynomial time. One possibility is to use the ellipsoid algorithm, as the separation problem over these inequalities can be solved in polynomial time (this is not trivial). Another possibility is to view the feasible region of the above linear program as the projection of a polyhedral set Q ⊆ R^{n²} with O(n³) constraints; again, this is not obvious.

(b) If the graph G is planar, then all extreme points of this linear program are integral and correspond to cuts. We can therefore find the maximum cut in a planar graph in polynomial time (there is also a simpler algorithm working on the planar dual of the graph).

(c) There exist instances for which OPT/LP ∼ 1/2 (i.e., there exist graphs G = (V, E) with w(e) = 1 for which OPT ≤ n(1/2 + ε) while LP ≥ n(1 − ε)), which means that any rounding algorithm we could come up with will not guarantee a factor better than 1/2.

5 Idea #4: SDP relaxation

The idea is to use semidefinite programming to get a more useful relaxation of the maximum cut problem. This is due to Goemans and Williamson [2]. Instead of defining variables on the edges as we did in the previous section, let's use variables on the vertices to denote which side of the cut a given vertex is on. This leads to the following quadratic integer formulation of the maximum cut problem:

  max  Σ_{(i,j)∈E} w(i, j) (1 − y_i y_j)/2
  s.t. y_i ∈ {−1, 1}   ∀i ∈ V.

Here we have defined a variable y_i for each vertex i ∈ V such that y_i = 1 if i ∈ S and y_i = −1 otherwise. We know that an edge (i, j) is in the cut (S : S̄) iff y_i y_j = −1, and this explains the quadratic term in the objective function. We can rewrite the objective function in a slightly more convenient way using the Laplacian of the graph. The Laplacian matrix L is defined as follows:

  l_ij = 0                       if i ≠ j, (i, j) ∉ E,
  l_ij = −w(i, j)                if i ≠ j, (i, j) ∈ E,
  l_ii = Σ_{k: k≠i} w(i, k),

that is, the off-diagonal elements are minus the weights, and the diagonal elements correspond to the sum of the weights of the edges incident to the corresponding vertex. Using the Laplacian matrix, we can rewrite the objective function in the following way:

  y^T L y = Σ_{i=1}^n Σ_{j=1}^n y_i y_j l_ij
          = Σ_{i=1}^n y_i² Σ_{k≠i} w(i, k) − 2 Σ_{(i,j)∈E} y_i y_j w(i, j)
          = 2w(E) − 2 Σ_{(i,j)∈E} y_i y_j w(i, j)
          = 4 ( Σ_{(i,j)∈E} w(i, j) (1 − y_i y_j)/2 ),

and thus

  Σ_{(i,j)∈E} w(i, j) (1 − y_i y_j)/2 = (1/4) y^T L y.
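The Laplacian identity relating y^T L y to the cut weight can be verified numerically on random weights and random ±1 vectors (an illustrative check using plain Python lists):

```python
import random

def laplacian(n, wfun):
    """Laplacian of the weighted graph on n vertices with weights wfun(i, j)."""
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                L[i][j] = -wfun(i, j)
                L[i][i] += wfun(i, j)
    return L

n = 6
random.seed(0)
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = random.random()
L = laplacian(n, lambda i, j: W[i][j])

for _ in range(100):                      # random y in {-1, +1}^n
    y = [random.choice((-1, 1)) for _ in range(n)]
    quad = sum(y[i] * L[i][j] * y[j] for i in range(n) for j in range(n))
    cut = sum(W[i][j] * (1 - y[i] * y[j]) / 2
              for i in range(n) for j in range(i + 1, n))
    assert abs(quad / 4 - cut) < 1e-9     # (1/4) y^T L y equals the cut weight
print("identity verified")
```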

Thus the maximum cut value is thus equal to 1 max{ y T LY : y ∈ {0, 1}n }. 4 If the optimization was over all y ∈ Rn with ||y||22 = n then we would get that 1 n max{ y T LY : y ∈ Rn , ||y||2 = n} = λmax (L), 4 4 where λmax (L) is the maximum eigenvalue of the matrix L. This shows that OP T ≤ n4 λmax (L); this is an eigenvalue bound introduced by Delorme and Poljak. Using semideﬁnite programming, we will get a slightly better bound. Using the Frobenius inner product, we can again reformulate the objective function as: 1 T 1 y Ly = L • (yy T ), 4 4 or as

(1/4) L • Y

if we define Y = yy^T. Observe that Y ⪰ 0, Y has all 1's on its diagonal, and its rank is equal to 1. It is easy to see that the converse is also true: if Y ⪰ 0, rank(Y) = 1 and Y_ii = 1 for all i, then Y = yy^T where y ∈ {−1, 1}^n. Thus we can reformulate the problem as:

max  (1/4) L • Y
s.t. rank(Y) = 1,
     Y_ii = 1 for all i ∈ V,
     Y ⪰ 0.

This is almost a semidefinite program, except that the rank condition is not allowed. By removing the condition that rank(Y) = 1, we relax the problem to a semidefinite program, and we get the following SDP:

SDP = max  (1/4) L • Y
      s.t. Y_ii = 1 for all i ∈ V,
           Y ⪰ 0.

Obviously, by removing the condition that rank(Y) = 1 we only enlarge the set over which we maximize, and therefore the value of this semidefinite program (simply denoted by SDP) is an upper bound on the value of the maximum cut problem. We can solve this semidefinite program to arbitrary precision using the algorithms we described earlier in the class: either the ellipsoid algorithm or the interior-point algorithms for conic programming. Remember that semidefinite programs are better behaved if they satisfy a regularity condition (e.g., they then satisfy strong duality). Our semidefinite programming relaxation of MAXCUT is particularly simple and indeed satisfies both the primal and dual regularity conditions:

(a) Primal regularity condition: there exists Y ≻ 0 with Y_ii = 1 for all i. This condition is obviously satisfied (consider Y = I).


(b) Dual regularity condition: First consider the dual problem

min  (1/4) Σ_{i∈V} z_i
s.t. diag(z₁, z₂, …, z_n) − L ⪰ 0,

where z_i ∈ R for all i ∈ V. The regularity condition is that there exist z_i's such that diag(z₁, …, z_n) − L ≻ 0. This is for example satisfied if, for all i, z_i > λ_max(L).

Remark: If we add the condition that z₁ = z₂ = … = z_n to the dual, then the smallest value the z_i's can take is λ_max(L), and we derive that

OPT ≤ SDP ≤ (n/4) λ_max(L),

and therefore this SDP bound improves upon the eigenvalue bound. We will start the next lecture by proving the following theorem.

Theorem 3 ([2]) For all w ≥ 0, we have that OPT/SDP ≥ 0.87856.

In order to prove this theorem, we will propose an algorithm which derives a cut from the solution to the semidefinite program. To describe this algorithm, we first need some preliminaries. From the Cholesky decomposition, we know that:

Y ⪰ 0  ⇔  ∃ V ∈ R^{k×n}, k = rank(Y) ≤ n, s.t. Y = V^T V
       ⇔  ∃ v₁, …, v_n s.t. Y_ij = v_i^T v_j, with v_i ∈ R^n.

Therefore, we can rewrite the SDP as a 'vector program':

max  Σ_{(i,j)∈E} w(i,j) (1 − v_i^T v_j)/2
s.t. ‖v_i‖ = 1 for all i ∈ V,
     v_i ∈ R^n for all i ∈ V.

To be continued...

References

[1] A. Haken and M. Luby, "Steepest descent can take exponential time for symmetric connection networks", Complex Systems, 1988.
[2] M.X. Goemans and D.P. Williamson, "Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming", J. ACM, 42, 1115–1145, 1995.
[3] D.S. Johnson, C.H. Papadimitriou and M. Yannakakis, "How easy is local search", Journal of Computer and System Sciences, 37, 79–100, 1988.
[4] S. Poljak, "Integer Linear Programs and Local Search for Max-Cut", SIAM Journal on Computing, 24, 822–839, 1995.
[5] A.A. Schäffer and M. Yannakakis, "Simple local search problems that are hard to solve", SIAM Journal on Computing, 20, 56–87, 1991.


18.415/6.854 Advanced Algorithms

November 21, 2008

Lecture 19
Lecturer: Michel X. Goemans

1 Introduction

In this lecture, we revisit MAXCUT and describe a randomized γ (≈ .87856)-approximation algorithm. We also explore SPARSEST-CUT, an NP-hard problem for which no constant factor approximation is known. We begin to describe an O(log k) approximation using multicommodity flows; here k is the number of commodities. To relate the optimal values of SPARSEST-CUT and multicommodity flow, we introduce metrics and finite metric spaces.

2 Revisiting MAXCUT

Recall the MAXCUT problem: given a graph G = (V, E) and weights w : E → R⁺ (we could assume that G is the complete graph and that the weights are 0 on the original non-edges), maximize w(S : S̄) (= Σ_{i∈S, j∈S̄} w_ij) over S ⊂ V. MAXCUT can be formulated as the integer program

max  Σ_{(i,j)∈E} w_ij (1 − x_i x_j)/2
s.t. x_i ∈ {±1} for all i.

The prior lecture described a 1/2-approximation algorithm and an upper bound on the solution to the above optimization, via reduction to a semidefinite program.
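The quadratic objective above can be checked directly: an edge (i, j) contributes w_ij exactly when x_i and x_j have opposite signs. A small brute-force sketch (the 5-edge instance is a made-up example, not from the lecture):

```python
import itertools

def cut_weight(edges, x):
    # x maps each vertex to +1 or -1; an edge (i, j) is cut iff x[i]*x[j] == -1,
    # which is exactly when (1 - x[i]*x[j])/2 == 1
    return sum(w * (1 - x[i] * x[j]) / 2 for (i, j), w in edges.items())

# Hypothetical weighted instance: a 4-cycle plus a chord.
edges = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.0, (0, 3): 1.0, (0, 2): 3.0}
n = 4

# Brute force over all +/-1 assignments to find OPT (feasible only for tiny n).
best = max(
    cut_weight(edges, dict(enumerate(signs)))
    for signs in itertools.product([1, -1], repeat=n)
)
print(best)  # 6.0, attained e.g. by S = {2}
```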

2.1 SDP Relaxation of MAXCUT

In the SDP relaxation, we replaced the x_i with unit vectors in the sphere S^{n−1} := {x ∈ R^n : ‖x‖ = 1}. Thus, the goal of the relaxed MAXCUT was to find

max  Σ_{(i,j)∈E} w_ij (1 − v_i^T v_j)/2
s.t. v_i ∈ S^{n−1} for all i.

Though it is not immediately clear that this represents a semidefinite program, it can be reformulated as follows:

max  Σ_{(i,j)} w_ij (1 − Y_ij)/2
s.t. Y_ii = 1 for all i,
     Y ⪰ 0.


Figure 1: For the 5-cycle, the optimum vectors end up being in a lower-dimensional space (of dimension 2), see the left figure. The angle between any two consecutive vectors is 4π/5 and the total SDP value is 5(1 − cos(4π/5))/2 = 4.52···. Taking a random hyperplane through the origin gives the cut (S : S̄), see the right figure.

Given a solution to the SDP in the form of unit vectors v_i, we would like to find a feasible S giving as large a cut as possible. Ideally, vertices i and j should be separated by the cut when (1 − v_i^T v_j)/2 is large, i.e., when v_i and v_j are far apart on the sphere. Here is a way to do this. Choosing a hyperplane through the origin divides the vectors into two groups, and we let S be the set of vertices whose vectors lie in one of the two halfspaces; the vectors on each side of the hyperplane correspond to S and S̄. As an example, we illustrate the vectors for a cycle of length 5 in Figure 1. Which hyperplane should we choose? Well, the optimum vectors are definitely not unique; any rotation of them (orthonormal transformation) will also provide an optimum solution, since the objective function depends only on the inner products v_i^T v_j. Therefore we should not have a preferred direction for the hyperplane.

3 MAXCUT γ-Approximation Algorithm

This discussion provides the intuition behind the following randomized algorithm, due to Goemans and Williamson ([1]):

1. Choose a unit vector r ∈ S^{n−1} uniformly.
2. Let S = {i ∈ V : r^T v_i ≥ 0}.

Remark 1 In the case n = 2, it is easy to pick a uniform r, by taking θ ∈ [0, 2π) uniformly, whence r = (cos θ, sin θ)^T. For a general n, we should find r ∈ S^{n−1} by selecting each component independently from a Gaussian distribution, and then normalizing so that ‖r‖ = 1.

Theorem 1 The Goemans-Williamson algorithm is a randomized γ-approximation algorithm for MAXCUT, where

γ = min_{−1≤x≤1} (2 cos⁻¹ x)/(π(1 − x))  (≈ .87856).

Proof: "OPT" and "SDP" will denote the optimal values of the MAXCUT instance and of its SDP relaxation. We show E[w(S : S̄)] ≥ γ · SDP ≥ γ · OPT.

By linearity of expectation, we have:

E[w(S : S̄)] = E[ Σ_{(i,j)} w_ij · 1[(i,j) ∈ (S : S̄)] ] = Σ_{(i,j)} w_ij Pr[(i,j) ∈ (S : S̄)].

If we were in dimension 2, then v_i and v_j are separated by the line orthogonal to r if and only if this line falls between v_i and v_j, and this occurs with probability ∠(v_i, v_j)/π (where ∠(v_i, v_j) denotes the angle between v_i and v_j). The same is also true in higher dimensions. Indeed, let p denote the projection of r onto the 2-dimensional space F spanned by v_i and v_j. We have

r^T v_i = p^T v_i,   r^T v_j = p^T v_j,

implying that v_i and v_j are separated by the partition defined by r if and only if they are separated by the partition defined by p. But p/‖p‖ is uniform over the unit circle in F. Therefore,

Pr[(i,j) ∈ (S : S̄)] = ∠(v_i, v_j)/π,

and, using the fact that v_i and v_j are unit vectors (and thus v_i^T v_j = cos ∠(v_i, v_j)):

Pr[(i,j) ∈ (S : S̄)] = cos⁻¹(v_i^T v_j)/π.

So, we get a closed-form formula for the expected weight of the cut produced:

E[w(S : S̄)] = Σ_{(i,j)} w_ij cos⁻¹(v_i^T v_j)/π.

On the other hand, we know that

SDP = Σ_{(i,j)} w_ij (1 − v_i^T v_j)/2.

Since the w_ij are non-negative, E[w(S : S̄)]/SDP is at least the smallest ratio over all pairs (v_i, v_j):

E[w(S : S̄)]/SDP ≥ min_{−1≤x≤1} (cos⁻¹(x)/π) / ((1 − x)/2) =: γ  (≈ 0.87856).

Several remarks are in order.

Remark 2 The analysis is tight in the sense that, for any ε > 0, there exist instances such that OPT/SDP ≤ γ + ε [2].

Remark 3 It is possible to derandomize the Goemans-Williamson algorithm (and achieve a performance guarantee of γ); still, in practice, the fact that one can output many cuts is useful, as one can then exploit the variance of the weight of the cut.

Remark 4 No approximation algorithm achieving better than γ is currently known.

Remark 5 Approximating MAXCUT within 16/17 (≈ .94117) + ε for any ε > 0 is NP-hard [3]. Approximating MAXCUT within γ + ε for any ε > 0 is UGC-hard; that is, an efficient algorithm doing so would imply the falsity of the Unique Games Conjecture.

Remark 6 It can be shown that the SDP relaxation above always has an optimal solution in dimension r where r(r+1)/2 ≤ n (i.e. r ≤ √(2n)).
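The two rounding steps above are straightforward to implement once the SDP has been solved. A sketch (no SDP solver here; for the 5-cycle of Figure 1 the optimal vectors are known to be 2-dimensional with angle 4π/5 between consecutive vertices, so we hard-code them):

```python
import math
import random

def random_hyperplane_cut(vectors):
    """Round unit vectors v_i to a cut: draw a Gaussian r and put i in S
    iff r . v_i >= 0 (Goemans-Williamson rounding). Only the direction of r
    matters, so there is no need to normalize it."""
    dim = len(next(iter(vectors.values())))
    r = [random.gauss(0.0, 1.0) for _ in range(dim)]
    return {i for i, v in vectors.items()
            if sum(rk * vk for rk, vk in zip(r, v)) >= 0}

def ratio_per_edge(vi, vj):
    # Pr[edge cut] / SDP contribution = (angle/pi) / ((1 - vi.vj)/2)
    dot = sum(a * b for a, b in zip(vi, vj))
    return (math.acos(max(-1.0, min(1.0, dot))) / math.pi) / ((1 - dot) / 2)

# Optimal SDP vectors for the 5-cycle: consecutive angle 4*pi/5 (Figure 1).
vectors = {k: (math.cos(4 * math.pi * k / 5), math.sin(4 * math.pi * k / 5))
           for k in range(5)}
S = random_hyperplane_cut(vectors)
ratio = ratio_per_edge(vectors[0], vectors[1])
print(ratio)  # ~0.8845, slightly above the worst-case gamma ~0.87856
```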

4 SPARSEST-CUT and Multicommodity-Cut

We now consider the problem of identifying a sparse cut in a graph: one which is as small as possible relative to the number of edges which could exist between the two sides of the partition. The latter quantity is maximized by balancing the vertices across the partition. Hence, we seek S ⊂ V minimizing w(S : S̄)/|S × S̄|. A generalization of SPARSEST-CUT is the multicommodity cut problem, in which we have, in addition to a capacitated G = (V, E), some k commodities, each associated with a "demand" f_i and a source and sink s_i, t_i ∈ V. (The idea is that we want to ship f_i units of commodity i from s_i to t_i.) We seek a cut (S : S̄) of minimum capacity relative to the demand across it, i.e.,

min_{S} u(S : S̄) / Σ_{i : (s_i,t_i)∈(S:S̄)} f_i.

We will write β for the objective in this expression, and denote its optimum by β*. We recover SPARSEST-CUT by taking u = w and creating a commodity of demand 1 for each pair of vertices. As another special case, when k = 1, we are minimizing u(S : S̄) over cuts separating s and t, so we have the min s–t cut problem (in an undirected graph).
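For tiny instances, β* can be computed by brute force directly from the definition, enumerating all cuts. A sketch (the 4-cycle instance is a made-up example):

```python
from itertools import combinations

def beta_star(n, capacity, commodities):
    """Brute-force the optimal multicommodity cut value
        min over S of u(S : S~) / (sum of demands f_i separated by S)
    for a small graph; capacity maps frozenset({i, j}) -> u(e)."""
    best = float("inf")
    for size in range(1, n):
        for S in combinations(range(n), size):
            S = set(S)
            u_cut = sum(u for e, u in capacity.items() if len(e & S) == 1)
            demand = sum(f for (s, t, f) in commodities
                         if (s in S) != (t in S))
            if demand > 0:
                best = min(best, u_cut / demand)
    return best

# Hypothetical instance: a 4-cycle with unit capacities and a unit-demand
# commodity between each pair of opposite vertices.
cap = {frozenset(e): 1.0 for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
demands = [(0, 2, 1.0), (1, 3, 1.0)]
print(beta_star(4, cap, demands))  # 1.0, attained e.g. by S = {0, 1}
```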

4.1 Concurrent multicommodity flow

Let us now discuss a problem which is in a sense dual to the multicommodity cut. In concurrent multicommodity flow, we are given G = (V, E) with k commodities and capacity constraints on each edge e ∈ E, and we seek the maximum α such that we can send αf_i units of flow from s_i to t_i for all i simultaneously, without violating the capacity constraint on any edge. Let α* denote the optimal value. It is easy to see how to solve concurrent multicommodity flow by linear programming.

The multicommodity cut and flow problems are related by α* ≤ β*. Indeed, if we can send αf_i from s_i to t_i for all i, then u(S : S̄) must be at least α Σ_{i:(s_i,t_i)∈(S:S̄)} f_i for every cut, so

β = u(S : S̄) / Σ_{i:(s_i,t_i)∈(S:S̄)} f_i ≥ α

for all feasible β and α. This is a "weak duality"-type condition. If k = 1, we have equality, by the max s–t flow min s–t cut theorem (one can show that the theorem for directed graphs implies it also for undirected graphs). It is non-obvious that we have α* = β* for k = 2 as well. In general, however, we do not have equality. In Figure 2, we show an example of a graph with a relatively small number of commodities (4) for which α* is strictly less than β*. In this graph, all capacities have value 1.

For this graph, β* = 1. Consider the multicommodity cut given by the dashed line. For this cut, and any similar cuts, the sum of the capacities across the cut is u(S : S̄) = 3 and the amount of demand that needs to go through it is Σ_{i:(s_i,t_i)∈(S:S̄)} f_i = 3 also. If we choose a cut for which the capacities sum to 2 instead, the sum of the demands will also be 2. Therefore, β* = 1.

What is α* though? There are k = 4 commodities in this graph, and yet a maximum of 3 units of flow can be pushed across a cut at one time. Since s₂ and t₂ are on the same side of the cut, you might think that α* might be able to reach 1. However, since each s_i is at least two edges away from its t_i and there are 4 commodities, if α* = 1 then the sum of the flow on all the edges of the graph would have to be (4)(2)(1) = 8. Yet there are only 6 edges, each with capacity 1. This shows that α* ≤ 3/4.

So what IS the relationship between α* and β* in general?

Theorem 2  β*/α* = O(log k).

Figure 2: An Example Graph where α∗ < β ∗ . Remark 7 Computing β ∗ is NP-hard. However—as we will see in the upcoming √ lecture — we can get a O(log k) approximation using the LP we have for α∗ , and a tighter O( log k) approximation using an SDP. To prove the above result, we introduce metric spaces.

5

Finite Metric Spaces

Definition 1 Let X be an arbitrary set, and d a function X × X → R. (X, d) is a metric space if the following properties hold for all x, y, z ∈ X:

1. d(x, y) ≥ 0 (Nonnegativity)
2. d(x, y) = d(y, x) (Symmetry)
3. d(x, y) + d(y, z) ≥ d(x, z) (Triangle Inequality)

For simplicity, we will deal only with finite metric spaces (i.e. |X| is finite).

Definition 2 Let X, Y be sets with associated metrics d, ℓ. For c ≥ 1, we say that (X, d) embeds into (Y, ℓ) with distortion c if there is a mapping φ : X → Y such that for any x, y ∈ X,

d(x, y) ≤ ℓ(φ(x), φ(y)) ≤ c·d(x, y).

If c = 1, the embedding is called isometric. This distortion measure is useful when we can transform a problem defined on one metric into another metric that is easier to deal with. This is precisely what we will do in the context of multicommodity cuts and flows.

The most familiar metric spaces are the n-dimensional Euclidean spaces, where d(x, y) := ‖x − y‖₂ = (Σ_i (x_i − y_i)²)^{1/2}. Generalizing gives the family of ℓ_p^n spaces, where we work over the set R^n and d(x, y) := ‖x − y‖_p = (Σ_i |x_i − y_i|^p)^{1/p}. One can show that in the limit as p → ∞, this expression tends to max_i |x_i − y_i|. This space is denoted ℓ_∞^n.

Suppose (X, d) is isometrically embeddable into ℓ₁ (that is, ℓ₁^n for some n). Is d isometrically embeddable into ℓ₂ as well? Not necessarily. Here we claim that the ℓ₂-embeddable metrics are only a subset of the ℓ₁-embeddable metrics, which in turn are a subset of the ℓ_∞-embeddable metrics. In fact, we put forth the following lemma:

Lemma 3 Any finite metric space (V, d) is isometrically embeddable in ℓ_∞^{|V|}.

Proof: For notational purposes, let V = {1, 2, …, n}. The mapping φ : V → R^{|V|} is given by

φ(v) = (d(1, v), d(2, v), …, d(n, v)).

Using properties of metrics, we have

d(u, v) = |d(u, u) − d(u, v)| ≤ max_{i∈V} |d(i, u) − d(i, v)| = ‖φ(u) − φ(v)‖_∞ = ℓ_∞(φ(u), φ(v)).

On the other hand, the triangle inequality gives

(φ(u) − φ(v))_i = d(i, u) − d(i, v) ≤ d(u, v),
(φ(v) − φ(u))_i = d(i, v) − d(i, u) ≤ d(u, v)

for all i, so ℓ_∞(φ(u), φ(v)) = max_{i∈V} |(φ(u) − φ(v))_i| ≤ d(u, v).
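The proof's embedding φ(v) = (d(1,v), …, d(n,v)) is easy to check numerically. A sketch on a small made-up shortest-path metric:

```python
def linf_embedding(points, dist):
    # Map v -> (d(1, v), ..., d(n, v)) as in the proof of Lemma 3.
    return {v: tuple(dist[u][v] for u in points) for v in points}

def linf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

# Hypothetical metric: shortest paths on the path 0-1-2 with edge lengths 2, 3.
dist = {0: {0: 0, 1: 2, 2: 5},
        1: {0: 2, 1: 0, 2: 3},
        2: {0: 5, 1: 3, 2: 0}}
points = [0, 1, 2]
phi = linf_embedding(points, dist)

# The embedding is isometric: linf(phi(u), phi(v)) == d(u, v) for all pairs.
ok = all(linf(phi[u], phi[v]) == dist[u][v] for u in points for v in points)
print(ok)  # True
```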

Remark 8 The ℓ2 -embeddable ﬁnite metrics are ℓ1 -embeddable. The proof for this will be revisited in the next lecture. For now we return to the Multicommodity-Cut problem, and how metrics can help us get an approximation algorithm for it.

6 Back to multicommodity cut

In the notation of metric spaces, we have the following. ("M ≤ M′" means "M is isometrically embeddable in M′".)

Theorem 4

α* = min_{ℓ : (V,ℓ) ≤ ℓ∞}  [Σ_{e=(i,j)∈E} u(e) ℓ(i,j)] / [Σ_{i=1}^k f_i ℓ(s_i, t_i)],

β* = min_{ℓ : (V,ℓ) ≤ ℓ₁}  [Σ_{e=(i,j)∈E} u(e) ℓ(i,j)] / [Σ_{i=1}^k f_i ℓ(s_i, t_i)].

(Note that the only diﬀerence between these two expressions is the class of metrics in which we permit (V, ℓ) to reside. Thus, since α∗ minimizes over a larger space, we have α∗ ≤ β ∗ immediately—as we expect.) In the following lecture, we show an algorithm to compute β ∗ approximately, making use of the above.

References

[1] M.X. Goemans and D.P. Williamson, "Improved Approximation Algorithms for Maximum Cut and Satisfiability Problems Using Semidefinite Programming", J. ACM, 42, 1115–1145, 1995.
[2] U. Feige and G. Schechtman, "On the optimality of the random hyperplane rounding technique for MAX CUT", Algorithms, 2000.
[3] J. Håstad, "Some optimal inapproximability results", J. ACM, 48, 798–869, 2001.



18.415/6.854 Advanced Algorithms

December 1, 2008

Lecture 22
Lecturer: Michel X. Goemans

In this lecture, we introduce Seidel's algorithm [3] to solve linear programs with n constraints in dimension d, when the dimension is small. The expected running time of Seidel's algorithm is O(d!n), i.e. it is strongly polynomial for fixed dimension d (strongly, since it does not depend on the size of the input coefficients). Then, we use Seidel's algorithm to develop a randomized convex-hull algorithm in an arbitrary dimension d, which is the best possible when d ≥ 4.

1 Linear Programming in Fixed Dimension

In this section, we ﬁx the dimension d. We wish to ﬁnd a strongly-polynomial time algorithm to solve linear programming.

1.1 Seidel's Algorithm

Let H be a set of n inequalities. Each inequality corresponds to a half-space h determined by a hyperplane. Let LP(H) be the linear program that minimizes c^T x subject to the constraints:

x ∈ ∩_{h∈H} h,  x ∈ R^d.

To make the description of the algorithm simpler, we make the following two assumptions:

1. Bounded: the feasible region is bounded, i.e. there exists M such that, for any feasible x, −M ≤ xi ≤ M for all i = 1, 2, . . . , d.

This assumption can be enforced by fictitiously imposing a large bounding box, and whenever one of the inequalities of this bounding box is tight at the optimum, we know that the linear program is unbounded.

2. Non-degenerate: the intersection of any d + 1 hyperplanes is empty. In 2-D, non-degeneracy means that there do not exist three lines meeting at the same point. If H does not meet this assumption, we can use standard tricks to make it non-degenerate; this can be handled by so-called lexicographic perturbation.

These two assumptions imply that for any H′ ⊆ H, LP(H′) has a unique solution x(H′). Seidel's algorithm actually applies to a more general class of problems than linear programming, but here we'll focus on linear programming. What is actually needed in the generalization is that the unique solution x(H′) is defined by a basis:

Definition 1 A subset B ⊆ H′ is called a basis of the linear program LP(H′) if x(B) = x(H′) and B is minimal.

Seidel's algorithm solves the linear program H incrementally as follows. Choose h ∈ H uniformly at random. Solve the linear program with h removed, and get a solution x. If the solution x satisfies h, then return x. If the solution x does not satisfy h, we impose the condition that h is satisfied at equality, eliminate one variable, and then solve the resulting linear program with d − 1 variables and n − 1 inequalities. The correctness of this algorithm was proved in the last lecture. In Seidel's algorithm, we can stop the recursion when we have either n constraints in d = 1 variable (which takes O(n) time to solve), or 1 constraint in d variables (which takes O(d) time to optimize over our fictitious bounding box).
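A minimal sketch of Seidel's recursion in dimension d = 2, following the description above (random order, a fictitious bounding box [−M, M]², and elimination of one variable when the new constraint is violated). Function names and tolerances are ours, not from the lecture:

```python
import random

def solve_1d(cons, c, lo=-1e18, hi=1e18):
    # minimize c*t subject to a*t <= b for each (a, b) in cons
    for a, b in cons:
        if a > 1e-12:
            hi = min(hi, b / a)
        elif a < -1e-12:
            lo = max(lo, b / a)
        elif b < -1e-9:
            return None          # 0*t <= b < 0: infeasible
    if lo > hi:
        return None
    return lo if c > 0 else hi

def on_boundary(prior, c, a, b):
    # minimize c.x over the prior constraints with a.x = b tight: eliminate
    # the variable with the larger |a| coefficient to get a 1-D LP
    swap = abs(a[0]) < abs(a[1])
    if swap:
        a, c = (a[1], a[0]), (c[1], c[0])
        prior = [((p[1], p[0]), q) for p, q in prior]
    one_d = [(p[1] - p[0] * a[1] / a[0], q - p[0] * b / a[0]) for p, q in prior]
    t = solve_1d(one_d, c[1] - c[0] * a[1] / a[0])
    if t is None:
        return None
    x = [(b - a[1] * t) / a[0], t]
    return x[::-1] if swap else x

def seidel_2d(constraints, c, M=1e6):
    # minimize c.x s.t. a.x <= b, inside a fictitious bounding box [-M, M]^2
    box = [((1.0, 0.0), M), ((-1.0, 0.0), M), ((0.0, 1.0), M), ((0.0, -1.0), M)]
    cons = list(constraints)
    random.shuffle(cons)
    x = [-M if c[0] > 0 else M, -M if c[1] > 0 else M]  # optimum of the box
    for i, (a, b) in enumerate(cons):
        if a[0] * x[0] + a[1] * x[1] <= b + 1e-9:
            continue                    # x still satisfies the new constraint
        # new constraint violated: the optimum lies on its boundary
        x = on_boundary(box + cons[:i], c, a, b)
        if x is None:
            return None                 # infeasible
    return x

random.seed(0)
# maximize x + y (minimize -x - y) s.t. x + 2y <= 4, 3x + y <= 6
x = seidel_2d([((1.0, 2.0), 4.0), ((3.0, 1.0), 6.0)], (-1.0, -1.0))
print(x)  # optimum at the intersection: [1.6, 1.2]
```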

1.2 Analysis of Running Time

Let T(d, n) be the expected running time of Seidel's algorithm on an instance with n inequalities and d variables. To find a recursive relation for T(d, n), note that we first recursively solve an LP with n − 1 inequalities and d variables, which takes time T(d, n − 1). If the solution x satisfies the removed constraint h (which takes O(d) time to check), we are done and simply return the d coordinates of x. If x does not satisfy h, we first reduce the LP to only d − 1 variables in O(dn) time (it takes O(d) time to eliminate one variable in each constraint) using the constraint h, and then solve the LP with n − 1 inequalities and d − 1 variables in T(d − 1, n − 1) time. The probability that x does not satisfy h is d/n, since the optimal solution is determined by exactly d inequalities and we have selected an inequality uniformly at random. This is the important step in the analysis and is known as backward analysis. By the analysis above, we have

T(d, n) = T(d, n − 1) + O(d) + (d/n)(O(dn) + T(d − 1, n − 1))
        = T(d, n − 1) + (d/n) T(d − 1, n − 1) + O(d²).

The base cases are T(1, n) = O(n) and T(d, 1) = O(d). Using this recursive relation, we can prove by induction on d + n that

Claim 1  T(d, n) = O( (Σ_{1≤i≤d} i²/i!) d! n ) = O(d! n).

Proof: The base cases are satisfied. We need to check the induction step. Suppose that

T(d, n − 1) = O( (Σ_{1≤i≤d} i²/i!) d! (n − 1) ),
T(d − 1, n − 1) = O( (Σ_{1≤i≤d−1} i²/i!) (d − 1)! (n − 1) ).

Since

(Σ_{1≤i≤d} i²/i!) d!(n − 1) + (d/n) (Σ_{1≤i≤d−1} i²/i!) (d − 1)!(n − 1) + d² ≤ (Σ_{1≤i≤d} i²/i!) d! n,

the claim also holds for T(d, n). The second equality in the claim follows from the fact that Σ_{i=1}^∞ i²/i! is finite. □

Thus, we have shown a strongly polynomial time algorithm to solve linear programs in a fixed small dimension d.

1.3 Improvement (Matousek, Sharir, Welzl [2])

Although the expected running time of Seidel's algorithm is strongly polynomial in n, it increases exponentially as d increases (more precisely, the dependence on d is 2^{O(d log d)}). In this subsection, we briefly introduce an improvement to Seidel's algorithm which gives a subexponential bound in d. The algorithm LP(H, C) takes as input a candidate set C (that plays the role of a basis), and returns x as well as a basis B. Initially, we call LP(H, C) with C = ∅.

The algorithm proceeds as follows. If H = C, then return C. If H ≠ C, choose h uniformly at random from H − C. We recursively call LP(H − {h}, C) and get a basis B. If h is satisfied by the solution defined by B, then return B. Otherwise, we call LP(H, basis(B, h)), where basis(B, h) denotes an optimal basis for LP(B ∪ {h}).

Claim 2 The expected running time is

O( e^{ 2√(d log(n/√d)) + O(√d) + O(log n) } ).

When d is fixed, the running time is polynomial in n. When n is fixed, the running time is e^{O(√d)}, subexponential in d. Using a trick due to Clarkson (through random sampling), one can show that linear programs with n inequalities in d dimensions can be solved in

O( d²n + e^{O(√(d log d))} )

time. This is the best bound currently known that is independent of the size of the entries. See Goldwasser [1] for a discussion.

2 Convex Hull

We are given n points x₁, …, x_n ∈ R^d. Let P be the convex hull of x₁, …, x_n. For d = 2 and d = 3, P can be found in O(n log n) time. In the previous lecture, we showed several algorithms that solve the 2-dimensional convex hull problem in O(n log n) time. Throughout this section, we assume that the points x₁, …, x_n are in general position, meaning that no d + 1 points lie on the same hyperplane. If that's not the case, a standard perturbation argument can be used.

2.1 Outputs of Convex Hull Algorithms

In dimension 2, it is sufficient to output the vertices of the convex hull in counterclockwise order. In this subsection, we introduce what the output is for a general d.

Definition 2 For any 0 ≤ k < d, a k-face of a d-dimensional polytope P is a face of P with dimension k. A (d − 1)-face is called a facet. A (d − 2)-face is called a ridge. A 1-face is called an edge. A 0-face is called a vertex.

Definition 3 A simplicial polytope is a polytope in which every face is a simplex.

Since the points x₁, …, x_n are in general position, the convex hull P is a simplicial polytope. The convex hull algorithm outputs a facet graph F(P). The vertices of F(P) are all facets of P. The edges of F(P) correspond to the ridges of P, each connecting the two facets sharing that ridge (Figure 1). For general d, one can show that the number of facets of P is O(n^{⌊d/2⌋}). Since the convex hull algorithm needs to output all the facets of P, the running time of any such algorithm is at least Ω(n^{⌊d/2⌋}).

2.2 Convex Hull Algorithms

Clarkson and Shor '89 developed a randomized algorithm to compute the convex hull in O(n log n + n^{⌊d/2⌋}) expected time. Chazelle '93 developed a deterministic algorithm running in O(n log n + n^{⌊d/2⌋}) time. These algorithms are optimal by the analysis in the previous subsection.

Figure 1: The figure on the left is part of a 3-dimensional simplicial polytope with four vertices labeled x₁, x₂, x₃, x₄. On the right is the corresponding facet graph, where the faces x₁x₂x₃, x₂x₃x₄, and the edge x₂x₃ are labeled.

We will illustrate Seidel's algorithm [3], which has running time O(n² + n^{⌊d/2⌋}). For d = 2 and d = 3, Seidel's algorithm takes time O(n²), which is not optimal. But for larger d, Seidel's algorithm is optimal, and is considerably simpler.

We take a random permutation x₁, x₂, …, x_n of the points. Let P_i be the convex hull of the points x₁, …, x_i. Initially P_{d+1} = conv(x₁, …, x_{d+1}) is a d-dimensional simplex, and F(P_{d+1}) is the complete graph on d + 1 points. We incrementally compute P_{d+2}, …, P_n. To do this, we need the following definitions.

Definition 4 A facet F of a polytope P is visible from a point x_i if the supporting hyperplane of F separates x_i from P. Otherwise, F is called obscured.

Definition 5 A ridge of a polytope P is called visible from a point x_i if both facets it connects are visible, and obscured if both facets are obscured. A ridge is called a horizon ridge if one of the facets it connects is visible and the other is obscured.

To compute the convex hull P_i when adding a new point x_i, Seidel's algorithm performs the following four steps.

Step 1 Find one visible facet F if one exists. If there is no visible facet, we are done. This step can be done using linear programming in O(d!i) time. Indeed, we would like to find a hyperplane a^T x ≤ b (where the unknowns are a ∈ R^d and b) such that a^T x_i = b and a^T x_j ≤ b for j = 1, …, i − 1. Any extreme solution will correspond to a new facet and to a horizon ridge. One of the two facets incident to this horizon ridge is visible.

Step 2 Find all visible facets. Determine all horizon ridges. Delete all visible facets and all visible ridges.

This can be done by depth-first search (DFS), since the visible facets and the obscured facets are separated by horizon ridges. In terms of running time, we charge the deletion time of the facets to when the facets were created.

Step 3 Construct all new facets. Each horizon ridge corresponds to a new facet containing the point x_i and the ridge (Figure 4).

Step 4 Each new facet contains d ridges. Generate all these new ridges. Every new ridge R is a sequence of d − 1 points a₁ < a₂ < … < a_{d−1}. Then match corresponding ridges using radix sort to construct the facet graph.

Figure 2: In 3-D, ridges are just edges.

Figure 3: The visible ridges and the invisible ridges are separated by horizon ridges.


Figure 4: In the ﬁgure on the top, the shaded regions are visible facets. In the ﬁgure on the bottom, visible facets are removed and new facets are added.


The expected running time of Seidel's algorithm to compute the convex hull is O(n² + n^{⌊d/2⌋}). Indeed, the running time is

O( Σ_{i=d+2}^n (i + N_i) ),

where N_i is the number of facets created at step i. One has that

E[N_i] = E[number of facets of P_i containing x_i] ≤ (d/i) · O(i^{⌊d/2⌋}) = O(d · i^{⌊d/2⌋−1}),

since each facet contains d of the i points and x_i is equally likely to be any of them (backward analysis), giving the required time bound.

References [1] M. Goldwasser, “A survey of linear programming in randomized subexponential time”, ACM SIGACT News, 26, 96–104, 1995. [2] J. Matousek, M. Sharir, and E. Welzl, “A subexponential bound for linear programming”, Algorithmica, 16, 498–516, 1996. [3] R. Seidel, “Small-dimensional linear programming and convex hulls made easy”, Discrete & Computational Geometry, 6, 423–434, 1991.



18.415/6.854 Advanced Algorithms

December 3, 2008

Lecture 23
Lecturer: Michel X. Goemans

1 Voronoi Diagrams

1.1 Introduction

Suppose we are given a set P of points in the Euclidean plane, and we are interested in the following problem: given a point x, find the closest point of P to x. One approach is to divide the plane into regions, one for each p_i ∈ P, consisting of the points x that are closest to p_i. Finding these regions in two dimensions is the problem of constructing the Voronoi diagram. One application of this structure is to compute the minimum spanning tree of a complete graph on n vertices in the Euclidean plane in time O(n log n).

1.2 Definitions

We will focus on the two-dimensional case. We are given a set P = {p1 , p2 , . . . , pn } ⊆ R2 and we want to partition the plane into regions which correspond to points which are closest to a speciﬁc point.

Figure 1: Voronoi Diagram (solid lines) for four points p1 , p2 , p3 , p4 .


Definition 1 (Voronoi Cell) Given a set of points P = {p₁, p₂, …, p_n} ⊆ R², a Voronoi cell V(p_i) is defined by:

V(p_i) = {x : d(p_i, x) < d(p_j, x) for all j ≠ i}.

Another way to define a Voronoi cell is via h(p_i, p_j), the halfplane containing p_i defined by the bisector of p_i and p_j. A cell is then defined as:

V(p_i) = ∩_{j≠i} h(p_i, p_j).

This implies that every cell is convex and is a (convex) polygonal region with at most n − 1 sides.

Definition 2 (Voronoi Diagram) A Voronoi diagram is a collection of Voronoi cells that covers R².

1.3 Motivation

Why is a Voronoi diagram useful? If the points represent fire stations, the Voronoi cells represent the partition of the plane into regions that are closer to each fire station. More generally, given a point in the plane, it is useful to know which point from a given set is closest to it. Of course, this also requires a data structure to answer the point location problem: given x, find the Voronoi cell that contains it. We will only learn how to construct the Voronoi diagram, not how to build a query data structure for it. Having such a diagram is useful for many problems. For example, a Voronoi diagram allows computation of the Euclidean minimum spanning tree on a set of points in O(n log n) time; see the problem set.

1.4 Properties

The Voronoi cells are all disjoint and their closures cover the entire plane. The Voronoi diagram will consist of edges (possibly semi-infinite, extending to infinity) and vertices where 3 or more of these edges meet; these vertices are equidistant from 3 or more points of P. One can characterize the vertices and the edges in the following way:

Lemma 1
1. A point q ∈ R² is a vertex of the Voronoi diagram ⇐⇒ there exists an empty circle (i.e. its interior is empty) centered at q having at least 3 points of P on its boundary.
2. Part of the bisector between p_i and p_j is an edge of the Voronoi diagram ⇐⇒ there exists an empty circle centered at a point q having precisely p_i and p_j (and no other point) on its boundary.

We look now at how 'complex' a Voronoi diagram can be. We know that each cell is delimited by at most n − 1 sides (edges), but in the lemma below, we show that collectively all cells do not have too many edges and vertices.

Lemma 2 For a Voronoi diagram with n points, the following relations hold:
• The number of vertices of the Voronoi diagram is n_v ≤ 2n − 5.
• The number of edges of the Voronoi diagram is n_e ≤ 3n − 6.

Figure 2: To prove Lemma 2 we add a point q∞ to the Voronoi diagram (solid lines), and connect all of the infinite edges to this point (shown in dotted lines).

Proof: We can view the Voronoi diagram as a planar graph G, with some edges extending out to infinity. We add a point q∞ representing 'infinity' and connect the edges that extend to infinity to this point, as shown in Figure 2. Note that the resulting graph G′ is still planar. The number of vertices in G′ is n_v + 1, the number of edges is n_e, and the number of faces is n. By Euler's formula, we have n_v + 1 − n_e + n = 2. Since every vertex has at least 3 edges incident to it, we obtain, by summing the degrees over all vertices, that:

Σ_{vertices v} d(v) = 2n_e ≥ 3(n_v + 1).

Combining this with Euler's formula, we get 2(nv + 1) + 2n ≥ 4 + 3(nv + 1), or 2n − 5 ≥ nv. Using this in Euler's formula, we now get ne = nv − 1 + n ≤ 3n − 6. □

2 Computation of Voronoi Diagrams

2.1 Introduction

There are two primary algorithms we want to introduce. Both will be shown to compute the Voronoi diagram in time O(n log n). First, we can reduce the computation of the Voronoi diagram to that of a convex hull in R³, which is computable in time O(n log n); this is our first algorithm. Second, we will review the sweep line algorithm of Fortune [1].

2.2 Convex Hull

Figure 3: Projection of a point onto a paraboloid in R³. To use the convex hull to compute the Voronoi diagram, this projection is done for all points in the set of points for which we want to compute the Voronoi diagram.

Suppose we have a set P ⊆ R² and we want to compute the corresponding Voronoi diagram. Let us consider the set P′ = {(xi, yi, xi² + yi²) : (xi, yi) ∈ P}. This projection onto the paraboloid is shown in Figure 3.

Consider the set of planes tangent to the paraboloid at each point of P′. The intersection of the upper half spaces of these planes gives a polyhedral set Q whose projection back to R² gives the Voronoi diagram in the following sense: the projection of the facets (resp. edges, vertices) of Q gives the Voronoi cells (resp. edges, vertices) of the Voronoi diagram. This computation can be done in O(n log n) time since this calculation is the geometric dual of the convex hull computation. If, instead, we were to compute the convex hull of P′ (rather than the half spaces tangent to the paraboloid at P′) and project it back to R², we would obtain a straight-line drawing on P (dual to the Voronoi diagram) known as the Delaunay triangulation; see the problem set.
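The key identity behind this reduction is worth checking by hand: the vertical gap between the paraboloid and the tangent plane at a lifted site p, evaluated above a query point q, equals the squared planar distance from q to p. Hence the highest tangent plane above q belongs to the nearest site, which is exactly why the upper envelope of the tangent planes projects to the Voronoi diagram. A quick numeric sketch (names are ours):

```python
def tangent_plane_value(p, q):
    """Value at q = (x, y) of the plane tangent to the paraboloid
    z = x^2 + y^2 at the lifted point (a, b, a^2 + b^2), where p = (a, b)."""
    a, b = p
    x, y = q
    return 2 * a * x + 2 * b * y - (a * a + b * b)

# Paraboloid height above q minus tangent-plane height at q
# equals the squared distance from q to p:
p, q = (1.0, 2.0), (4.0, 6.0)
gap = (q[0] ** 2 + q[1] ** 2) - tangent_plane_value(p, q)
sq_dist = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
print(gap, sq_dist)  # 25.0 25.0
```

The identity follows by expanding: (x² + y²) − (2ax + 2by − a² − b²) = (x − a)² + (y − b)².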

2.3 Sweep Line Algorithm

The idea of a sweep line algorithm is to advance a line (in 2D) or a plane (in 3D) through space, processing events as they occur. We will construct the Voronoi diagram as we sweep the line from top to bottom, and at any instant we will only need to consider points at or above the sweep line. We cannot construct the entire diagram above the sweep line, but we can construct pieces of it. Consider a single point pi above the line: some points of the plane are assuredly closer to pi than to any point below the sweep line, namely the points above the parabola C(pi) defined by the points equidistant from pi and the sweep line. We can find the parabola associated with each of the points. Any point of the plane that lies above some parabola can be correctly assigned to its Voronoi cell.
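The parabola C(p) has a closed form: setting the distance to p equal to the distance to the line y = ℓ and solving gives y = ((x − px)² + py² − ℓ²) / (2(py − ℓ)). A small sketch (function names are ours, not from the notes) computes C(p) and the lower envelope:

```python
import math

def parabola_y(p, sweep_y, x):
    """y on the parabola C(p): points equidistant from the site p and the
    horizontal sweep line y = sweep_y (requires p[1] > sweep_y)."""
    px, py = p
    return ((x - px) ** 2 + py ** 2 - sweep_y ** 2) / (2 * (py - sweep_y))

def beach_line_y(sites, sweep_y, x):
    """Lower envelope of the parabolae at abscissa x."""
    return min(parabola_y(p, sweep_y, x) for p in sites if p[1] > sweep_y)

# A point of C(p) is indeed equidistant from p and the sweep line:
p, ell, x = (2.0, 3.0), 0.0, 5.0
y = parabola_y(p, ell, x)
print(math.isclose(math.dist((x, y), p), y - ell))  # True
```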

Figure 4: A set of parabolae C(pi) associated with four points pi. Parabolae are denoted with thin lines, the beach line with a thick line, and the associated sweep line with a thick dashed line.

Definition 3 (Beach line) We define the beach line as the lower envelope of the parabolae C(pi) over all points pi above the sweep line. A beach line is shown in Figure 4.

Definition 4 (Breakpoint) A breakpoint q is a point on the beach line that belongs to at least two parabolae.
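The defining property of a breakpoint, its equidistance from the two sites and the sweep line, can be verified numerically. For simplicity this sketch takes two sites at the same height, where by symmetry the parabolae cross at the midpoint abscissa (the general case solves a quadratic instead); names are ours:

```python
import math

def parabola_y(p, sweep_y, x):
    """Height of the parabola C(p) at abscissa x."""
    px, py = p
    return ((x - px) ** 2 + py ** 2 - sweep_y ** 2) / (2 * (py - sweep_y))

def breakpoint_same_height(pi, pj, sweep_y):
    """Breakpoint between two sites at equal height: the parabolae cross
    at the midpoint abscissa by symmetry."""
    x = (pi[0] + pj[0]) / 2
    return (x, parabola_y(pi, sweep_y, x))

pi, pj, ell = (0.0, 2.0), (4.0, 2.0), 0.0
q = breakpoint_same_height(pi, pj, ell)
# q is equidistant from pi, pj, and the sweep line:
print(math.dist(q, pi), math.dist(q, pj), q[1] - ell)  # 2.0 2.0 2.0
```

This is exactly the condition d(q, pi) = d(q, sweep) = d(q, pj) used below to place breakpoints on Voronoi edges.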

Figure 5: Sample beach line illustrating multiple breakpoints originating from the same parabola.

The beach line is a series of segments of parabolae. A breakpoint q corresponding to the parabolae C(pi) and C(pj) must be equidistant from both pi and pj, since we know that d(q, pi) = d(q, sweep) = d(q, pj). Furthermore, no other point of P is closer to q. Thus, by Lemma 1, q is part of an edge of the Voronoi diagram, namely part of the bisector between pi and pj. An example is shown in Figure 6. We will keep track of which pi the breakpoints are associated with, in order. Note that a beach line can have several segments from the same parabola, as illustrated in Figure 5.

2.3.1 Events

As we sweep the line, we are not going to keep track of the precise location of the beach line (as it constantly changes); we will just keep track of the points pi corresponding to the parabola segments of the beach line, from left to right. Several events can happen that modify this sequence of points pi.

1. A 'site event' occurs when the sweep line goes through a new point pl. This results in the addition of an arbitrarily narrow parabola around pl to the beach line. A sample site event is shown in Figure 7. If the new parabola intersects the segment associated with pj, we can write the change in the sequence of points as:

pi pj pk → pi pj pl pj pk

Note that there is exactly one site event per point, so there are n in total. Notice that each such addition increases the number of segments by 2, as shown above (the new segment for pl, plus the segment of pj that is split in two).

Figure 6: Illustration of points q on an edge of a Voronoi diagram as constructed by a moving sweep line.

We'll see that this is the only way of creating a new segment in the beach line, which implies that the total number of segments in the beach line is at most 2n − 1 (1 segment for the first site event, and 2 more for each subsequent site event).

2. A 'circle event' occurs when lowering the sweep line causes a segment to disappear from the beach line. This boundary case is illustrated in Figure 8, which can be compared to Figure 6 to show the effect of a moving sweep line. When a segment disappears, we have discovered a new vertex of the Voronoi diagram. Indeed, when a circle event occurs, the three closest points are equidistant to the vertex, and thus we have a vertex by Lemma 1. The center of the circle is determined by p1, p2 and p3 (corresponding to 3 consecutive segments of the beach line), and the circle event happens when the sweep line is tangent to the circle (below it). When a circle event happens, the beach line is modified in the following way: p1 p2 p3 → p1 p3.

Claim 3 The only way for the beach line to change is through a site event or a circle event. In other words, these are the only ways to create and remove segments.

We will not formally prove this; it is intuitive.
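The sweep-line position at which a circle event fires is the line tangent to the circle through p1, p2, p3 from below, i.e. y = (center y) − (radius). A sketch using the standard circumcenter formula (names are ours, not from the notes):

```python
import math

def circumcenter(p1, p2, p3):
    """Center of the circle through three non-collinear points
    (standard determinant formula)."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy)

def circle_event_y(p1, p2, p3):
    """Sweep-line position at which the circle event for three consecutive
    beach-line segments fires: tangent to their circle from below."""
    c = circumcenter(p1, p2, p3)
    return c[1] - math.dist(c, p1)

# Three points on the unit circle: center (0, 0), radius 1, event at y = -1.
print(circle_event_y((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)))  # -1.0
```

When this event is popped from the queue, the circumcenter becomes a new Voronoi vertex, per Lemma 1.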

Figure 7: Site event. Parabolae shown with thin lines and the beach line shown as a thick line.

2.3.2 Data Structures

In order to construct the diagram, we will describe three data structures.

1. Event queue: We maintain a priority queue containing events. The key of an event is its y-coordinate. For a site event, this is the y-coordinate of the associated point. For a circle event, it is the position of the sweep line when it is tangent to the circle from below. We first insert the n site events into the priority queue, as we know the y-coordinates of all the points.

Consider moving the line down and processing events as they occur. Circle events are defined by looking at three consecutive segments of the beach line. Every time we introduce a new segment into the beach line, as happens in a site event, we potentially create two new circle events (potentially, since three consecutive segments create a circle event only if the 3 points are distinct). We may also need to delete some circle events. Consider the addition shown in Figure 7: we will have removed the potential circle event pi pj pk and added the potential circle events pi pj pl and pj pl pk. Note that the deleted event can be thought of as a fake event, because it was removed before it really happened and was processed. Still, such a circle event was added to the event queue and then removed. There is at most one deleted (fake) circle event for each site event processed.

Notice that the number of real circle events is equal to the number of vertices of the Voronoi diagram, nv ≤ 2n − 5. Any circle event that is processed is real, and leads to a segment of the beach line disappearing. In terms of Figures 6 and 8, we would take p1 p2 p3 to p1 p3.
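The event queue can be sketched with a binary heap plus lazy deletion for the fake circle events. This is only a skeleton of the bookkeeping described above (class and method names are ours; the full algorithm also maintains the beach-line BST and the diagram itself):

```python
import heapq

class EventQueue:
    """Priority queue of sweep events, keyed on y (largest y first)."""
    def __init__(self, sites):
        # heapq is a min-heap, so negate y to pop the highest event first
        self._heap = [(-y, 'site', (x, y)) for (x, y) in sites]
        heapq.heapify(self._heap)
        self._removed = set()  # lazily deleted ("fake") circle events

    def add_circle_event(self, y, triple):
        heapq.heappush(self._heap, (-y, 'circle', triple))

    def remove_circle_event(self, y, triple):
        self._removed.add((-y, 'circle', triple))  # lazy deletion

    def pop(self):
        """Next real event, skipping lazily deleted ones, or None."""
        while self._heap:
            ev = heapq.heappop(self._heap)
            if ev not in self._removed:
                return ev
        return None

q = EventQueue([(0.0, 3.0), (2.0, 5.0)])
print(q.pop()[2])  # (2.0, 5.0): the highest site is processed first
```

Each site or real circle event adds and removes O(1) queue entries, so with O(n) events the queue contributes O(n log n) total time, as claimed.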

Figure 8: Circle event. Parabolae shown with thin lines, the Voronoi diagram with thick lines, and the sweep line with a thick dashed line.

In general, we can write this as "go from pi pj pk pl pm to pi pj pl pm". We may need to delete up to two circle events corresponding to the lost segment and add two new events corresponding to the new order. In this example, we are deleting the circle events pi pj pk and pk pl pm and adding pi pj pl and pj pl pm (or a subset of them if some of the indices are equal). We are always adding and deleting a constant number of events (for each site event and real circle event), so the total number of additions and deletions to the priority queue is linear. Since we must process O(n) events corresponding to O(n) priority queue operations, the total runtime will be O(n log n).

2. Beach line encoding: We keep track of the points corresponding to the parabola segments constituting the beach line, and of the breakpoints pi pj, by creating a binary search tree in which points are leaves and internal nodes are breakpoints. Note that this is an extension of the standard binary search tree because we have two different types of nodes (parabola segments and breakpoints). This prevents us from directly using a splay tree, since the splay action permutes the leaves and branches of the tree. One way to deal with this is to forget about parabola segments and keep track only of the breakpoints (as pairs of points), keyed from left to right. When a site event occurs, we need to be able to locate its x value in the beach line. To use a binary search tree, we thus need to be able to perform binary comparisons to determine if the desired x value is to the left or right

of a breakpoint. Given a breakpoint as an ordered pair (pi, pj) and a sweep line, we can easily compute the x position of the breakpoint and decide if we must move to the right or to the left. In a circle event, we have three parabola segments and must remove the middle one; this is a delete operation. Thus there are a constant number of BST operations per circle or site event. Using a BST with amortized cost O(log n) per operation, maintaining the beach line therefore takes O(n log n) time.

3. Voronoi diagram: Let us replace each edge (shared by 2 cells) of the Voronoi diagram with two corresponding directed half-edges which are 'twins' of each other. Each half-edge corresponds to one of the two cells, and each is oriented counterclockwise (with respect to its cell). For each half-edge, we define pointers:

• to its twin,
• to the next half-edge on the cell,
• to the previous half-edge on the cell.

From a given vertex we can follow the half-edges around a cell; by calling twin, we can move between cells, and we can for example enumerate all half-edges incident to a vertex. Let us consider how to modify this structure upon processing site (Figure 7) and circle (Figure 8) events. In a site event, the two new breakpoints are equidistant from pj and pl and trace out an edge of the Voronoi diagram; this creates two new half-edges. In a circle event, we link the half-edges that meet at the new vertex. Thus there are a linear number of operations on this data structure as well.

In summary, the first structure requires a linear number of operations, each taking O(log n) time; similarly for the second data structure, with a balanced BST. The last one requires constant time per event, for a linear number of events. Hence the total time to construct a Voronoi diagram is O(n log n).
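The three pointers per half-edge can be sketched as minimal records; this is an illustrative fragment of the doubly-connected edge list described above (names are ours, not from the notes):

```python
# Minimal half-edge records with the three pointers described above.
class HalfEdge:
    __slots__ = ('twin', 'next', 'prev', 'cell')

    def __init__(self, cell):
        self.twin = self.next = self.prev = None
        self.cell = cell  # the site whose cell this half-edge bounds

def make_edge(cell_a, cell_b):
    """Create the two twin half-edges for one Voronoi edge, one per cell."""
    h1, h2 = HalfEdge(cell_a), HalfEdge(cell_b)
    h1.twin, h2.twin = h2, h1
    return h1, h2

h1, h2 = make_edge('p_i', 'p_j')
print(h1.twin is h2 and h2.twin is h1)  # True
```

Walking `next` pointers traverses one cell counterclockwise; following `twin` at a shared edge crosses into the neighboring cell, which is how all half-edges incident to a vertex can be enumerated.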
We can show this is optimal: the Voronoi diagram of the set of points P = {(xi, ±1)} solves the problem of sorting the xi, hence constructing the diagram must take Ω(n log n) time in a comparison-based model. Note that we use ±1 since we have assumed throughout that we are not in the purely degenerate case in which all points are collinear; one can show that this is indeed the only case in which the Voronoi diagram has infinite lines and no vertices.


References

[1] S. Fortune. A sweepline algorithm for Voronoi diagrams. In SCG '86: Proceedings of the Second Annual Symposium on Computational Geometry, pages 313–322, New York, NY, USA, 1986. ACM.
