Metric k-center explained

In graph theory, the metric -center problem is a combinatorial optimization problem studied in theoretical computer science. Given cities with specified distances, one wants to build warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of vertices for which the largest distance of any point to its closest vertex in the -set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality.

Formal definition

Let

(X,d)

be a metric space where

X

is a set and

d

is a metric
A set

V\subseteql{X}

, is provided together with a parameter

k

. The goal is to find a subset

l{C}\subseteqV

with

|l{C}|=k

such that the maximum distance of a point in

V

to the closest point in

l{C}

is minimized. The problem can be formally defined as follows:
For a metric space (

l{X}

,d),

V\subseteql{X}

, and a parameter

k

.

l{C}\subseteqV

of

k

points.

rl{C}(V)=\underset{v\inV}{max}

d(v,

l{C}

)

That is, every point in a cluster is in distance at most

rl{C}(V)

from its respective center. [1]

The k-Center Clustering problem can also be defined on a complete undirected graph G = (VE) as follows:
Given a complete undirected graph G = (VE) with distances d(vivj) ∈ N satisfying the triangle inequality, find a subset C ⊆ V with |C| = k while minimizing:

maxvmincd(v,c)

Computational complexity

In a complete undirected graph G = (VE), if we sort the edges in non-decreasing order of the distances: d(e1) ≤ d(e2) ≤ ... ≤ d(em) and let Gi = (V, Ei), where Ei = . The k-center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k.

Although Dominating Set is NP-complete, the k-center problem remains NP-hard. This is clear, since the optimality of a given feasible solution for the k-center problem can be determined through the Dominating Set reduction only if we know in first place the size of the optimal solution (i.e. the smallest index i such that Gi has a dominating set of size at most k), which is precisely the difficult core of the NP-Hard problems. Although a Turing reduction can get around this issue by trying all values of k.

Approximations

A simple greedy algorithm

A simple greedy approximation algorithm that achieves an approximation factor of 2 builds

l{C}

using a farthest-first traversal in k iterations. This algorithm simply chooses the point farthest away from the current set of centers in each iteration as the new center. It can be described as follows:

\bar{c}1

into

C1

v\inV

compute

d1[v]

from

\bar{c}1

\bar{c}2

with highest distance from

\bar{c}1

.

C2

. Continue this till k centers are found

Running time

l{O}(n)

time.

l{O}(nk)

time.

Proving the approximation factor

The solution obtained using the simple greedy algorithm is a 2-approximation to the optimal solution. This section focuses on proving this approximation factor.

Given a set of n points

V\subseteql{X}

, belonging to a metric space (

l{X}

,d), the greedy K-center algorithm computes a set K of k centers, such that K is a 2-approximation to the optimal k-center clustering of V.

i.e.

rK(V)\leq2ropt(V,it{k})

[1]

This theorem can be proven using two cases as follows,

Case 1: Every cluster of

l{C}opt

contains exactly one point of

K

v\inV

\bar{c}

be the center it belongs to in

l{C}opt

\bar{k}

be the center of

K

that is in

\Pi(l{C}opt,\bar{c})

d(v,\bar{c})=d(v,l{C}opt)\leqropt(V,k)

d(\bar{k},\bar{c})=d(\bar{k},l{C}opt)\leqropt

d(v,\bar{k})\leqd(v,\bar{c})+d(\bar{c},\bar{k})\leq2ropt


Case 2: There are two centers

\bar{k}

and

\bar{u}

of

K

that are both in

\Pi(l{C}opt,\bar{c})

, for some

\bar{c}\inl{C}opt

(By pigeon hole principle, this is the only other possibility)

\bar{u}

was added later to the center set

K

by the greedy algorithm, say in ith iteration.

\bar{k}\inl{C}i-1

and,

\begin{align} rK(V)\leq

l{C
r
i-1
}(\mathbf)&=d(\bar,\mathcal_)\\ &\leq d(\bar,\bar)\\ &\leq d(\bar,\bar)+d(\bar,\bar)\\ &\leq 2r^ \end[1]

Another 2-factor approximation algorithm

Another algorithm with the same approximation factor takes advantage of the fact that the k-Center problem is equivalent to finding the smallest index i such that Gi has a dominating set of size at most k and computes a maximal independent set of Gi, looking for the smallest index i that has a maximal independent set with a size of at least k.It is not possible to find an approximation algorithm with an approximation factor of 2 - ε for any ε > 0, unless P = NP.Furthermore, the distances of all edges in G must satisfy the triangle inequality if the k-center problem is to be approximated within any constant factor, unless P = NP.

Parameterized approximations

It can be shown that the k-Center problem is W[2]-hard to approximate within a factor of 2 - ε for any ε > 0, when using k as the parameter.[2] This is also true when parameterizing by the doubling dimension (in fact the dimension of a Manhattan metric), unless P=NP.[3] When considering the combined parameter given by k and the doubling dimension, k-Center is still W[1]-hard but it is possible to obtain a parameterized approximation scheme.[4] This is even possible for the variant with vertex capacities, which bound how many vertices can be assigned to an opened center of the solution.[5]

See also

References

  1. Book: Sariel Har-Peled

    . Har-peled, Sariel . Sariel Har-Peled . Geometric Approximation Algorithms . 2011 . 978-0821849118 . American Mathematical Society . Boston, MA, USA.

  2. Feldmann . Andreas Emil . 2019-03-01 . Fixed-Parameter Approximations for k-Center Problems in Low Highway Dimension Graphs . Algorithmica . en . 81 . 3 . 1031–1052 . 10.1007/s00453-018-0455-0 . 46886829 . 1432-0541.
  3. Book: Feder . Tomás . Greene . Daniel . Proceedings of the twentieth annual ACM symposium on Theory of computing - STOC '88 . Optimal algorithms for approximate clustering . 1988-01-01 . https://doi.org/10.1145/62212.62255 . New York, NY, USA . Association for Computing Machinery . 434–444 . 10.1145/62212.62255 . 978-0-89791-264-8. 658151 .
  4. Feldmann . Andreas Emil . Marx . Dániel . 2020-07-01 . The Parameterized Hardness of the k-Center Problem in Transportation Networks . Algorithmica . en . 82 . 7 . 1989–2005 . 10.1007/s00453-020-00683-w . 3532236 . 1432-0541.
  5. Book: Feldmann . Andreas Emil . Vu . Tung Anh . Graph-Theoretic Concepts in Computer Science . Generalized $$k$$-Center: Distinguishing Doubling and Highway Dimension . 2022 . Bekos . Michael A. . Kaufmann . Michael . https://link.springer.com/chapter/10.1007/978-3-031-15914-5_16 . Lecture Notes in Computer Science . 13453 . en . Cham . Springer International Publishing . 215–229 . 10.1007/978-3-031-15914-5_16 . 2209.00675 . 978-3-031-15914-5.