Data: | Graph |
Time: | O(E\sqrtV) |
Class: | Graph algorithm |
Space: | O(V) |
In computer science, the Hopcroft–Karp algorithm (sometimes more accurately called the Hopcroft–Karp–Karzanov algorithm) is an algorithm that takes a bipartite graph as input and produces a maximum-cardinality matching as output — a set of as many edges as possible with the property that no two edges share an endpoint. It runs in
O(|E|\sqrt{|V|})
E
V
|E|=\Omega(|V|)
O(|V|2.5)
O(|E|log|V|)
The algorithm was discovered by and independently by . As in previous methods for matching such as the Hungarian algorithm and the work of, the Hopcroft–Karp algorithm repeatedly increases the size of a partial matching by finding augmenting paths. These paths are sequences of edges of the graph, which alternate between edges in the matching and edges out of the partial matching, and where the initial and final edge are not in the partial matching. Finding an augmenting path allows us to increment the size of the partial matching, by simply toggling the edges of the augmenting path (putting in the partial matching those that were not, and vice versa). Simpler algorithms for bipartite matching, such as the Ford–Fulkerson algorithm‚ find one augmenting path per iteration: the Hopcroft-Karp algorithm instead finds a maximal set of shortest augmenting paths, so as to ensure that only
O(\sqrt{|V|})
O(|V|)
O(|E|\sqrt{|V|})
The Hopcroft–Karp algorithm can be seen as a special case of Dinic's algorithm for the maximum-flow problem.
A vertex that is not the endpoint of an edge in some partial matching
M
If
M
P
M
M ⊕ P
|M|+1
Conversely, suppose that a matching
M
P
M ⊕ M*
M*
M
M*
P
P
M
M
M*
M*
M
M*
M
P
M*
M
M
An augmenting path in a matching problem is closely related to the augmenting paths arising in maximum flow problems, paths along which one may increase the amount of flow between the terminals of the flow. It is possible to transform the bipartite matching problem into a maximum flow instance, such that the alternating paths of the matching problem become augmenting paths of the flow problem. It suffices to insert two vertices, source and sink, and insert edges of unit capacity from the source to each vertex in
U
V
U
V
The algorithm may be expressed in the following pseudocode.
Input: Bipartite graph
G(U\cupV,E)
Output: Matching
M\subseteqE
M\leftarrow\empty
repeat
lP\leftarrow\{P1,P2,...,Pk\}
M\leftarrowM ⊕ (P1\cupP2\cup...\cupPk)
until
lP=\empty
In more detail, let
U
V
G
U
V
M
U
U
U
V
k
V
V
k
F
v
F
k
F
U
F
U
U
O(|E|)
U
V
M
The algorithm terminates when no more augmenting paths are found in the breadth first search part of one of the phases.
Each phase consists of a single breadth first search and a single depth-first search. Thus, a single phase may be implemented in
O(|E|)
\sqrt{|V|}
|V|
|E|
O(|E|\sqrt{|V|})
Each phase increases the length of the shortest augmenting path by at least one: the phase finds a maximal set of augmenting paths of the given length, so any remaining augmenting path must be longer. Therefore, once the initial
\sqrt{|V|}
\sqrt{|V|}
\sqrt{|V|}
\sqrt{|V|}
M
\sqrt{|V|}
\sqrt{|V|}
Since the algorithm performs a total of at most
2\sqrt{|V|}
O(|E|\sqrt{|V|})
In many instances, however, the time taken by the algorithm may be even faster than this worst case analysis indicates. For instance, in the average case for sparse bipartite random graphs, (improving a previous result of) showed that with high probability all non-optimal matchings have augmenting paths of logarithmic length. As a consequence, for these graphs, the Hopcroft–Karp algorithm takes
O(log|V|)
O(|E|log|V|)
For sparse graphs, the Hopcroft–Karp algorithm continues to have the best known worst-case performance, but for dense graphs (
|E|=\Omega(|V|2)
O\left(|V|1.5\sqrt{
|E| | |
log|V| |
Several authors have performed experimental comparisons of bipartite matching algorithms. Their results in general tend to show that the Hopcroft–Karp method is not as good in practice as it is in theory: it is outperformed both by simpler breadth-first and depth-first strategies for finding augmenting paths, and by push-relabel techniques.[2]
The same idea of finding a maximal set of shortest augmenting paths works also for finding maximum cardinality matchings in non-bipartite graphs, and for the same reasons the algorithms based on this idea take
O(\sqrt{|V|})
/* G = U ∪ V ∪ where U and V are the left and right sides of the bipartite graph and NIL is a special null vertex */ function BFS is for each u in U do if Pair_U[u] = NIL then Dist[u] := 0 Enqueue(Q, u) else Dist[u] := ∞ Dist[NIL] := ∞ while Empty(Q) = false do u := Dequeue(Q) if Dist[u] < Dist[NIL] then for each v in Adj[u] do if Dist[Pair_V[v]] = ∞ then Dist[Pair_V[v]] := Dist[u] + 1 Enqueue(Q, Pair_V[v]) return Dist[NIL] ≠ ∞ function DFS(u) is if u ≠ NIL then for each v in Adj[u] do if Dist[Pair_V[v]] = Dist[u] + 1 then if DFS(Pair_V[v]) = true then Pair_V[v] := u Pair_U[u] := v return true Dist[u] := ∞ return false return true function Hopcroft–Karp is for each u in U do Pair_U[u] := NIL for each v in V do Pair_V[v] := NIL matching := 0 while BFS = true do for each u in U do if Pair_U[u] = NIL then if DFS(u) = true then matching := matching + 1 return matching
Let the vertices of our graph be partitioned in U and V, and consider a partial matching, as indicated by the Pair_U and Pair_V tables that contain the one vertex to which each vertex of U and of V is matched, or NIL for unmatched vertices. The key idea is to add two dummy vertices on each side of the graph: uDummy connected to all unmatched vertices in U and vDummy connected to all unmatched vertices in V. Now, if we run a breadth-first search (BFS) from uDummy to vDummy then we can get the paths of minimal length that connect currently unmatched vertices in U to currently unmatched vertices in V. Note that, as the graph is bipartite, these paths always alternate between vertices in U and vertices in V, and we require in our BFS that when going from V to U, we always select a matched edge. If we reach an unmatched vertex of V, then we end at vDummy and the search for paths in the BFS terminate. To summarize, the BFS starts at unmatched vertices in U, goes to all their neighbors in V, if all are matched then it goes back to the vertices in U to which all these vertices are matched (and which were not visited before), then it goes to all the neighbors of these vertices, etc., until one of the vertices reached in V is unmatched.
Observe in particular that BFS marks the unmatched nodes of U with distance 0, then increments the distance every time it comes back to U. This guarantees that the paths considered in the BFS are of minimal length to connect unmatched vertices of U to unmatched vertices of V while always going back from V to U on edges that are currently part of the matching. In particular, the special NIL vertex, which corresponds to vDummy, then gets assigned a finite distance, so the BFS function returns true iff some path has been found. If no path has been found, then there are no augmenting paths left and the matching is maximal.
If BFS returns true, then we can go ahead and update the pairing for vertices on the minimal-length paths found from U to V: we do so using a depth-first search (DFS). Note that each vertex in V on such a path, except for the last one, is currently matched. So we can explore with the DFS, making sure that the paths that we follow correspond to the distances computed in the BFS. We update along every such path by removing from the matching all edges of the path that are currently in the matching, and adding to the matching all edges of the path that are currently not in the matching: as this is an augmenting path (the first and last edges of the path were not part of the matching, and the path alternated between matched and unmatched edges), then this increases the number of edges in the matching. This is same as replacing the current matching by the symmetric difference between the current matching and the entire path..
Note that the code ensures that all augmenting paths that we consider are vertex disjoint. Indeed, after doing the symmetric difference for a path, none of its vertices could be considered again in the DFS, just because the Dist[Pair_V[v]] will not be equal to Dist[u] + 1 (it would be exactly Dist[u]).
Also observe that the DFS does not visit the same vertex multiple times. This is thanks to the following lines: Dist[u] = ∞ return falseWhen we were not able to find any shortest augmenting path from a vertex u, then the DFS marks vertex u by setting Dist[u] to infinity, so that these vertices are not visited again.
One last observation is that we actually don't need uDummy: its role is simply to put all unmatched vertices of U in the queue when we start the BFS. As for vDummy, it is denoted as NIL in the pseudocode above.