In graph theory, the metric k-center problem is a combinatorial optimization problem studied in theoretical computer science. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality.
Let $(\mathcal{X}, d)$ be a metric space, where $\mathcal{X}$ is a set and $d$ is a metric. Given a set $V \subseteq \mathcal{X}$ of points and a parameter $k$, the goal is to find a subset $\mathcal{C} \subseteq V$ with $|\mathcal{C}| = k$ that minimizes the cover radius

$r_{\mathcal{C}}(V) = \max_{v \in V} d(v, \mathcal{C})$,

where $d(v, \mathcal{C}) = \min_{c \in \mathcal{C}} d(v, c)$ is the distance from $v$ to its closest point in $\mathcal{C}$. That is, every point in a cluster is within distance at most $r_{\mathcal{C}}(V)$ of its respective center.
The k-Center Clustering problem can also be defined on a complete undirected graph G = (V, E) as follows:
Given a complete undirected graph G = (V, E) with distances $d(v_i, v_j) \in \mathbb{N}$ satisfying the triangle inequality, find a subset $C \subseteq V$ with $|C| = k$ while minimizing

$\max_{v \in V} \min_{c \in C} d(v, c)$.
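As a concrete illustration of this objective, the following Python sketch evaluates the cover radius of a candidate center set. The Euclidean metric, the sample points, and the function name cover_radius are assumptions made for the example, not part of the problem statement.

```python
import math

def cover_radius(points, centers, dist):
    """r_C(V): the largest distance from any point of V to its closest center in C."""
    return max(min(dist(v, c) for c in centers) for v in points)

# Illustrative instance: five points in the plane with the Euclidean metric.
V = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (8.0, 0.0), (9.0, 0.0)]
C = [(1.0, 0.0), (8.0, 0.0)]               # a candidate set of k = 2 centers
print(cover_radius(V, C, math.dist))       # 1.0: every point is within distance 1 of a center
```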
In a complete undirected graph G = (V, E), sort the edges in non-decreasing order of their distances, $d(e_1) \leq d(e_2) \leq \cdots \leq d(e_m)$, and let $G_i = (V, E_i)$, where $E_i = \{e_1, e_2, \ldots, e_i\}$. The k-center problem is equivalent to finding the smallest index i such that $G_i$ has a dominating set of size at most k.
Although Dominating Set is NP-complete, the k-center problem remains NP-hard. This is because the optimality of a given feasible solution for the k-center problem can be verified through the Dominating Set reduction only if we already know the size of the optimal solution (i.e. the smallest index i such that $G_i$ has a dominating set of size at most k), which is precisely the hard part of the problem. A Turing reduction can, however, get around this issue by trying all values of i.
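To make the equivalence and the Turing-reduction idea tangible, here is a hedged brute-force Python sketch (exponential time, usable only on tiny instances): edges are added in non-decreasing order of distance and each Gi is tested for a dominating set of size k. The function names and the edge-list input format are illustrative assumptions.

```python
from itertools import combinations

def has_dominating_set(n, adj, k):
    """Brute force: does some set D of k vertices dominate V (every vertex in D or adjacent to D)?
    Checking subsets of size exactly k suffices, since padding a dominating set preserves domination."""
    for D in combinations(range(n), k):
        dominated = set(D)
        for u in D:
            dominated |= adj[u]
        if len(dominated) == n:
            return True
    return False

def smallest_feasible_index(n, weighted_edges, k):
    """Return (i, d(e_i)) for the smallest i such that G_i has a dominating set of size at most k.
    weighted_edges lists (u, v, d(u, v)) for every vertex pair; vertices are 0..n-1 and k <= n."""
    edges = sorted(weighted_edges, key=lambda e: e[2])   # d(e_1) <= d(e_2) <= ... <= d(e_m)
    adj = [set() for _ in range(n)]                      # adjacency lists of G_i
    for i, (u, v, w) in enumerate(edges, start=1):
        adj[u].add(v)
        adj[v].add(u)
        if has_dominating_set(n, adj, k):
            return i, w          # d(e_i) is exactly the optimal k-center radius
    return None
```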
A simple greedy approximation algorithm that achieves an approximation factor of 2 builds $\mathcal{C}$ in $k$ iterations using a farthest-first traversal. Pick an arbitrary point $\bar{c}_1$ as the first center, forming $C_1$; for every point $v \in V$ compute its distance $d_1[v]$ to $\bar{c}_1$; then pick the point $\bar{c}_2$ with the highest distance from $\bar{c}_1$ as the next center and denote the expanded center set by $C_2$. Continue in this way, always adding the point farthest from the current set of centers, until $k$ centers have been chosen. Selecting the $i$-th center and updating the distances takes $O(n)$ time, and there are $k$ such iterations, so the algorithm runs in $O(nk)$ time overall.
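The description above translates almost line-for-line into the following Python sketch; the function name greedy_k_center, the pluggable dist argument, and the arbitrary choice of points[0] as the first center are assumptions of the illustration.

```python
def greedy_k_center(points, k, dist):
    """Farthest-first traversal: repeatedly add the point farthest from the current centers.
    Assumes 1 <= k <= len(points); ties are broken arbitrarily."""
    centers = [points[0]]                                  # an arbitrary first center
    # d[j] is the distance from points[j] to its closest chosen center so far
    d = [dist(v, centers[0]) for v in points]
    while len(centers) < k:
        farthest = max(range(len(points)), key=lambda j: d[j])
        centers.append(points[farthest])                   # the point farthest from the centers
        d = [min(d[j], dist(points[j], centers[-1])) for j in range(len(points))]
    return centers
```

Each iteration performs one O(n) pass to pick the farthest point and another to update the distances, matching the O(nk) bound stated above. On the illustrative instance from the earlier snippet, for example, greedy_k_center(V, 2, math.dist) returns a center set with cover radius 2.0, within the factor-2 bound of the optimum 1.0.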
The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.
Given a set of $n$ points $V \subseteq \mathcal{X}$ belonging to a metric space $(\mathcal{X}, d)$, the greedy $k$-center algorithm computes a set $K$ of $k$ centers, such that $K$ is a 2-approximation to the optimal $k$-center clustering of $V$, i.e.

$r_K(V) \leq 2\, r^{opt}(V, k)$.[1]

This theorem can be proven by considering the following two cases.
Case 1: Every cluster of $\mathcal{C}_{opt}$ contains exactly one point of $K$.

Consider a point $v \in V$, let $\bar{c}$ be the center it belongs to in the optimal clustering $\mathcal{C}_{opt}$, and let $\bar{k}$ be the center of $K$ lying in $\Pi(\mathcal{C}_{opt}, \bar{c})$, the cluster of $\mathcal{C}_{opt}$ centered at $\bar{c}$. Then

$d(v, \bar{c}) = d(v, \mathcal{C}_{opt}) \leq r^{opt}(V, k)$ and $d(\bar{k}, \bar{c}) = d(\bar{k}, \mathcal{C}_{opt}) \leq r^{opt}$,

so by the triangle inequality

$d(v, \bar{k}) \leq d(v, \bar{c}) + d(\bar{c}, \bar{k}) \leq 2\, r^{opt}$.
Case 2: Two centers $\bar{u}$ and $\bar{k}$ of $K$ lie in the same cluster $\Pi(\mathcal{C}_{opt}, \bar{c})$ of the optimal clustering, for some $\bar{c} \in \mathcal{C}_{opt}$; by the pigeonhole principle, this is the only other possibility.

Assume, without loss of generality, that $\bar{k}$ was added to $K$ by the greedy algorithm later than $\bar{u}$, say in the $i$-th iteration, so that $\bar{u} \in C_{i-1}$. Since the greedy algorithm always chooses the point farthest away from the current set of centers, $\bar{k}$ is the point of $V$ farthest from $C_{i-1}$, and since $C_{i-1} \subseteq K$, adding further centers can only shrink the cover radius. Therefore

\begin{align} r_K(V) \leq r_{C_{i-1}}(V) &= d(\bar{k}, C_{i-1}) \\ &\leq d(\bar{k}, \bar{u}) \\ &\leq d(\bar{k}, \bar{c}) + d(\bar{c}, \bar{u}) \\ &\leq 2\, r^{opt}(V, k). \end{align}
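Assuming the cover_radius and greedy_k_center sketches from the earlier snippets are in scope, the bound can be sanity-checked on a small random instance by comparing the greedy radius with the optimum found by exhaustive search over all k-subsets:

```python
import itertools, math, random

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(12)]
k = 3

greedy_r = cover_radius(pts, greedy_k_center(pts, k, math.dist), math.dist)
opt_r = min(cover_radius(pts, C, math.dist) for C in itertools.combinations(pts, k))
print(greedy_r, opt_r, greedy_r / opt_r)   # the ratio never exceeds 2 on metric instances
```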
Another algorithm with the same approximation factor takes advantage of the fact that the k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k: it scans the graphs Gi in order of increasing i, computes a maximal independent set of the squared graph Gi² for each, and returns the first such independent set whose size is at most k. It is not possible to find an approximation algorithm with an approximation factor of 2 - ε for any ε > 0, unless P = NP. Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP.
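A hedged sketch of this bottleneck approach, written here in the squared-graph form usually attributed to Hochbaum and Shmoys, is given below; the function name, the edge-list input format, and the particular greedy construction of the maximal independent set are choices made for the illustration, not a canonical implementation.

```python
def bottleneck_2_approx(n, weighted_edges, k):
    """weighted_edges lists (u, v, d(u, v)) for every vertex pair; vertices are 0..n-1, k >= 1.
    Scan the thresholds d(e_1) <= d(e_2) <= ...; for each G_i greedily build a maximal
    independent set of the squared graph G_i^2 and return it once it has at most k
    vertices. Its cover radius is then at most 2 * d(e_i) <= 2 * OPT."""
    edges = sorted(weighted_edges, key=lambda e: e[2])
    adj = [set() for _ in range(n)]                          # adjacency lists of G_i
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
        # neighbours in G_i^2: all vertices within two hops in G_i
        sq_adj = [adj[x].union(*(adj[y] for y in adj[x])) - {x} for x in range(n)]
        mis, blocked = [], set()                             # greedy maximal independent set
        for x in range(n):
            if x not in blocked:
                mis.append(x)
                blocked |= {x} | sq_adj[x]
        if len(mis) <= k:
            return mis                                       # the chosen centers
    return []    # not reached when all vertex pairs are listed and k >= 1
```

This naive version rebuilds Gi² from scratch after every edge insertion, so it is meant only to mirror the description above rather than to be efficient.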
It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 - ε for any ε > 0, when using k as the parameter.[2] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP.[3] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme.[4] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.[5]
Har-Peled, Sariel (2011). Geometric Approximation Algorithms. Boston, MA, USA: American Mathematical Society. ISBN 978-0821849118.