Galactic algorithm explained
A galactic algorithm is one with record-breaking theoretical (asymptotic) performance, but which is never used in practice. Typical reasons are that the performance gains only appear for problems that are so large they never occur, or the algorithm's complexity outweighs a relatively small gain in performance. Galactic algorithms were so named by Richard Lipton and Ken Regan,[1] because they will never be used on any data sets on Earth.
Possible use cases
Even if they are never used in practice, galactic algorithms may still contribute to computer science:
- An algorithm, even if impractical, may show new techniques that may eventually be used to create practical algorithms.
- Available computational power may catch up to the crossover point, so that a previously impractical algorithm becomes practical.
- An impractical algorithm can still demonstrate that conjectured bounds can be achieved, or that proposed bounds are wrong, and hence advance the theory of algorithms. As Lipton notes, even an algorithm that is never run can change what researchers believe about a problem and shape future work on it.[1] Similarly, a hypothetical large but polynomial-time algorithm for the Boolean satisfiability problem, although unusable in practice, would settle the P versus NP problem, considered the most important open problem in computer science and one of the Millennium Prize Problems.[2]
Examples
Integer multiplication
An example of a galactic algorithm is the fastest known way to multiply two numbers,[3] which is based on a 1729-dimensional Fourier transform.[4] It needs $O(n \log n)$ bit operations, but as the constants hidden by the big O notation are large, it is never used in practice. However, it also shows why galactic algorithms may still be useful. The authors state: "we are hopeful that with further refinements, the algorithm might become practical for numbers with merely billions or trillions of digits."[4]
Matrix multiplication
The first improvement over brute-force matrix multiplication (which needs $n^3$ multiplications) was the Strassen algorithm: a recursive algorithm that needs $O(n^{2.807})$ multiplications. This algorithm is not galactic and is used in practice. Further extensions of this, using sophisticated group theory, are the Coppersmith–Winograd algorithm and its slightly better successors, needing $O(n^{2.373})$ multiplications. These are galactic – "We nevertheless stress that such improvements are only of theoretical interest, since the huge constants involved in the complexity of fast matrix multiplication usually make these algorithms impractical."
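As an illustration of the recursive idea, here is a minimal sketch of Strassen's algorithm, assuming square matrices whose side length is a power of two and using numpy only for array arithmetic; the cutoff value is an arbitrary choice for the example. Real implementations likewise switch to ordinary multiplication below some cutoff, since the recursion only pays off for sufficiently large blocks.

```python
# Strassen's recursion: 7 sub-multiplications per level instead of 8, which is
# what yields the O(n^2.807) bound. Assumes n is a power of two.
import numpy as np

def strassen(A: np.ndarray, B: np.ndarray, cutoff: int = 64) -> np.ndarray:
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                      # fall back to the O(n^3) method
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:m, :m] = M1 + M4 - M5 + M7
    C[:m, m:] = M3 + M5
    C[m:, :m] = M2 + M4
    C[m:, m:] = M1 - M2 + M3 + M6
    return C

A, B = np.random.rand(128, 128), np.random.rand(128, 128)
print(np.allclose(strassen(A, B), A @ B))   # True
```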
Communication channel capacity
Claude Shannon showed a simple but asymptotically optimal code that can reach the theoretical capacity of a communication channel. It requires assigning a random code word to every possible $n$-bit message, then decoding by finding the closest code word. If $n$ is chosen large enough, this beats any existing code and can get arbitrarily close to the capacity of the channel. Unfortunately, any $n$ big enough to beat existing codes is also completely impractical.[5] These codes, though never used, inspired decades of research into more practical algorithms that today can achieve rates arbitrarily close to channel capacity.[6]
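The sketch below illustrates the random-coding scheme at toy scale, assuming a binary symmetric channel; the parameters k, n, and the flip probability are made-up values for the example. The exponentially large codebook and the exhaustive nearest-codeword decoding are exactly what make the scheme impractical at the block lengths needed to approach capacity.

```python
# Toy random coding over a binary symmetric channel: one random codeword per
# k-bit message, decoding by nearest Hamming distance over all 2^k codewords.
import random

random.seed(1)

def random_codebook(k: int, n: int) -> list[list[int]]:
    return [[random.randint(0, 1) for _ in range(n)] for _ in range(2 ** k)]

def transmit(codeword: list[int], flip_prob: float) -> list[int]:
    return [bit ^ (random.random() < flip_prob) for bit in codeword]

def decode(received: list[int], codebook: list[list[int]]) -> int:
    hamming = lambda cw: sum(a != b for a, b in zip(cw, received))
    return min(range(len(codebook)), key=lambda i: hamming(codebook[i]))

k, n, p = 4, 64, 0.1                 # 16 messages, 64-bit codewords, 10% bit flips
book = random_codebook(k, n)
message = 11
print(decode(transmit(book[message], p), book))   # 11 with very high probability
```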
Sub-graphs
The problem of deciding whether a graph $G$ contains $H$ as a minor is NP-complete in general, but where $H$ is fixed, it can be solved in polynomial time. The running time for testing whether $H$ is a minor of $G$ in this case is $O(n^2)$,[7] where $n$ is the number of vertices in $G$ and the big O notation hides a constant that depends superexponentially on $H$. The constant is greater than $2\uparrow\uparrow(2\uparrow\uparrow(2\uparrow\uparrow(h/2)))$ in Knuth's up-arrow notation, where $h$ is the number of vertices in $H$.[8] Even the case of $h = 4$ cannot be reasonably computed, as the constant is greater than 2 tetrated by 65536, that is, $2\uparrow\uparrow 65536$, a tower of 65536 twos.
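To give a feel for how fast the hidden constant grows, here is a small sketch of Knuth's up-arrow notation; the function name is arbitrary and only tiny inputs can actually be evaluated.

```python
# Knuth's up-arrow notation made concrete: knuth(a, 1, b) = a**b, and each extra
# arrow iterates the previous operation. knuth(2, 2, b) is tetration 2↑↑b; already
# 2↑↑5 = 2^65536 has 19,729 decimal digits, and the hidden constant in the
# minor-testing bound exceeds 2↑↑(2↑↑(2↑↑(h/2))).
def knuth(a: int, arrows: int, b: int) -> int:
    if arrows == 1:
        return a ** b
    result = 1
    for _ in range(b):                   # a ↑^k b = a ↑^(k-1) (a ↑^k (b-1))
        result = knuth(a, arrows - 1, result)
    return result

print(knuth(2, 2, 3))    # 2↑↑3 = 2^(2^2) = 16
print(knuth(2, 2, 4))    # 2↑↑4 = 2^16 = 65536
# knuth(2, 2, 5) = 2^65536 is already far too large to be useful as a constant.
```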
Cryptographic breaks
In cryptography jargon, a "break" is any attack faster in expectation than brute force – i.e., performing one trial decryption for each possible key. For many cryptographic systems, breaks are known, but are still practically infeasible with current technology. One example is the best attack known against 128-bit AES, which takes only about $2^{126}$ operations.[9] Despite being impractical, theoretical breaks can provide insight into vulnerability patterns, and sometimes lead to discovery of exploitable breaks.
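The following toy sketch shows what "one trial decryption for each possible key" means, using a made-up 16-bit key and a simple XOR cipher; real ciphers and key sizes are of course far beyond this, which is why even an attack costing about $2^{126}$ operations against AES-128 remains out of reach.

```python
# Toy brute force: one trial decryption per candidate key, 2^16 trials at most.
# A "break" is any attack whose expected cost beats this exhaustive search.
def xor_encrypt(plaintext: bytes, key: int) -> bytes:
    key_bytes = key.to_bytes(2, "big")
    return bytes(b ^ key_bytes[i % 2] for i, b in enumerate(plaintext))

def brute_force(ciphertext: bytes, known_prefix: bytes):
    for key in range(2 ** 16):                      # try every possible key
        if xor_encrypt(ciphertext, key).startswith(known_prefix):
            return key
    return None

secret_key = 0xBEEF
ct = xor_encrypt(b"attack at dawn", secret_key)
print(hex(brute_force(ct, b"attack")))              # recovers 0xbeef
```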
Traveling salesman problem
For several decades, the best known approximation to the traveling salesman problem in a metric space was the very simple Christofides algorithm, which produced a path at most 50% longer than the optimum. (Many other algorithms could usually do much better, but could not provably do so.) In 2020, a newer and much more complex algorithm was discovered that can beat this by roughly $10^{-34}$ percent.[10] Although no one will ever switch to this algorithm for its very slight worst-case improvement, it is still considered important because "this minuscule improvement breaks through both a theoretical logjam and a psychological one".[11]
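For intuition about tree-based approximation, the sketch below implements the simpler MST-doubling ("double tree") heuristic on a made-up point set, which in a metric space yields a tour at most 100% longer than optimal; Christofides additionally computes a minimum-weight perfect matching on the odd-degree vertices of the tree to tighten the guarantee to 50%. Ties and efficiency are ignored for brevity.

```python
# Double-tree heuristic: build a minimum spanning tree, walk it, skip repeats.
import math

points = [(0, 0), (1, 3), (4, 1), (5, 4), (2, 5), (6, 0)]
dist = lambda a, b: math.dist(points[a], points[b])

# Prim's algorithm: grow a minimum spanning tree over the complete metric graph.
n = len(points)
in_tree, adj = {0}, {i: [] for i in range(n)}
while len(in_tree) < n:
    u, v = min(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
               key=lambda e: dist(*e))
    adj[u].append(v)
    adj[v].append(u)
    in_tree.add(v)

# Preorder walk of the tree, skipping repeats ("shortcutting"), then close the tour.
tour, seen, stack = [], set(), [0]
while stack:
    u = stack.pop()
    if u not in seen:
        seen.add(u)
        tour.append(u)
        stack.extend(adj[u])
tour.append(0)
print(tour, sum(dist(a, b) for a, b in zip(tour, tour[1:])))
```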
Hutter search
A single algorithm, "Hutter search", can solve any well-defined problem in an asymptotically optimal time, barring some caveats. It works by searching through all possible algorithms (by runtime), while simultaneously searching through all possible proofs (by length of proof), looking for a proof of correctness for each algorithm. Since the proof of correctness is of finite size, it "only" adds a constant and does not affect the asymptotic runtime. However, this constant is so big that the algorithm is entirely impractical.[12][13] For example, if the shortest proof of correctness of a given algorithm is 1000 bits long, the search will examine at least $2^{999}$ other potential proofs first.
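The scheduling idea behind such a search can be sketched as a dovetailing loop that interleaves every candidate with every time budget, so no single candidate blocks progress; the indices below are placeholders, and actually enumerating, running, and proof-checking real programs is where the enormous constant comes from.

```python
# Dovetailing schedule behind Levin/Hutter-style search: every candidate
# (program, proof) index is revisited with ever-larger step budgets.
from itertools import count

def dovetail():
    """Yield (candidate_index, step_budget) pairs covering all candidates."""
    for phase in count(1):
        for candidate in range(phase):
            yield candidate, 2 ** (phase - candidate)   # later candidates get fewer steps

schedule = dovetail()
print([next(schedule) for _ in range(10)])
# [(0, 2), (0, 4), (1, 2), (0, 8), (1, 4), (2, 2), (0, 16), (1, 8), (2, 4), (3, 2)]
```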
Hutter search is related to Solomonoff induction, which is a formalization of Bayesian inference. All computable theories (as implemented by programs) which perfectly describe previous observations are used to calculate the probability of the next observation, with more weight put on the shorter computable theories. Again, the search over all possible explanations makes this procedure galactic.
Optimization
Simulated annealing, when used with a logarithmic cooling schedule, has been proven to find the global optimum of any optimization problem. However, such a cooling schedule results in entirely impractical runtimes, and is never used.[14] Nevertheless, knowing that this ideal algorithm exists has led to practical variants that are able to find very good (though not provably optimal) solutions to complex optimization problems.[15]
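A minimal sketch of simulated annealing with the logarithmic schedule $T_k = c/\log(k+1)$ appears below; the objective function, constant $c$, and step size are arbitrary choices for illustration, and practical codes replace the schedule with faster (e.g. geometric) cooling at the cost of the guarantee.

```python
# Simulated annealing with logarithmic cooling on a 1-D test function.
import math, random

def anneal(f, x0: float, steps: int, c: float = 1.0, step_size: float = 0.5) -> float:
    x, best = x0, x0
    for k in range(1, steps + 1):
        T = c / math.log(k + 1)                    # logarithmic cooling schedule
        candidate = x + random.uniform(-step_size, step_size)
        delta = f(candidate) - f(x)
        if delta < 0 or random.random() < math.exp(-delta / T):
            x = candidate                          # accept downhill, or uphill with prob e^(-delta/T)
        if f(x) < f(best):
            best = x
    return best

rastrigin_1d = lambda x: 10 + x * x - 10 * math.cos(2 * math.pi * x)  # many local minima
print(anneal(rastrigin_1d, x0=4.3, steps=200000))  # should approach the global minimum at 0
```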
Minimum spanning trees
The expected linear time MST algorithm is able to discover the minimum spanning tree of a graph in expected time $O(m + n)$, where $m$ is the number of edges and $n$ is the number of nodes of the graph.[16] However, the constant factor that is hidden by the big O notation is huge enough to make the algorithm impractical. An implementation is publicly available,[17] and given the experimentally estimated implementation constants, it would only be faster than Borůvka's algorithm for graphs in which $m + n > 9 \cdot 10^{151}$.[18]
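For comparison, here is a short sketch of Borůvka's algorithm, the simple practical baseline mentioned above; the edge list is a made-up example, the graph is assumed connected, and ties in edge weights are ignored for brevity.

```python
# Borůvka's algorithm: each round, every component picks its cheapest outgoing
# edge and the chosen components are merged. Runs in O(m log n) time.
def boruvka(n: int, edges: list[tuple[float, int, int]]) -> list[tuple[float, int, int]]:
    parent = list(range(n))
    def find(x):                        # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst = []
    while len(mst) < n - 1:
        cheapest = {}                   # cheapest outgoing edge per component
        for e in edges:
            w, u, v = e
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or cheapest[r][0] > w:
                    cheapest[r] = e
        for w, u, v in set(cheapest.values()):
            ru, rv = find(u), find(v)
            if ru != rv:                # contract the two components
                parent[ru] = rv
                mst.append((w, u, v))
    return mst

edges = [(4, 0, 1), (1, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
print(boruvka(4, edges))                # total weight 6: edges (1,0,2), (2,1,3), (3,1,2)
```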
Hash tables
Researchers have found an algorithm that achieves the provably best-possible[19] asymptotic performance in terms of time-space tradeoff.[20] But it remains purely theoretical: "Despite the new hash table's unprecedented efficiency, no one is likely to try building it anytime soon. It's just too complicated to construct."[21] and "in practice, constants really matter. In the real world, a factor of 10 is a game ender."[22]
Notes and References
- Lipton, Richard J.; Regan, Kenneth W. (2013). "David Johnson: Galactic Algorithms". In People, Problems, and Proofs: Essays from Gödel's Lost Letter: 2010. Springer Berlin Heidelberg. pp. 109–112. ISBN 9783642414220. https://books.google.com/books?id=eLC9BAAAQBAJ&pg=PA109
- Fortnow, L. (2009). "The status of the P versus NP problem". Communications of the ACM. 52 (9): 78–86. doi:10.1145/1562164.1562186. S2CID 5969255.
- Harvey, David; van der Hoeven, Joris (March 2019). "Integer multiplication in time O(n log n)". HAL. hal-02070778.
- Harvey, David (9 April 2019). "We've found a quicker way to multiply really big numbers". The Conversation. Retrieved 9 March 2023.
- Hardesty, Larry (January 19, 2010). "Explained: The Shannon limit". MIT News Office.
- "Capacity-approaching codes" (Chapter 13 of Principles of Digital Communication II). 2005.
- Kawarabayashi, Ken-ichi; Kobayashi, Yusuke; Reed, Bruce (2012). "The disjoint paths problem in quadratic time". Journal of Combinatorial Theory, Series B. 102 (2): 424–435. doi:10.1016/j.jctb.2011.07.004.
- Johnson, David S. (1987). "The NP-completeness column: An ongoing guide (edition 19)". Journal of Algorithms. 8 (2): 285–303. doi:10.1016/0196-6774(87)90043-5. CiteSeerX 10.1.1.114.3864.
- Tao, Biaoshuai; Wu, Hongjun (2015). Information Security and Privacy. Lecture Notes in Computer Science. 9144. pp. 39–56. doi:10.1007/978-3-319-19962-7_3. ISBN 978-3-319-19961-0.
- Karlin, Anna R.; Klein, Nathan; Oveis Gharan, Shayan (September 1, 2020). "A (Slightly) Improved Approximation Algorithm for Metric TSP". arXiv:2007.01409 [cs.DS].
- Klarreich, Erica (8 October 2020). "Computer Scientists Break Traveling Salesperson Record". Quanta Magazine.
- Hutter, Marcus (2002-06-14). "The Fastest and Shortest Algorithm for All Well-Defined Problems". arXiv:cs/0206022.
- Gagliolo, Matteo (2007-11-20). "Universal search". Scholarpedia. 2 (11): 2575. doi:10.4249/scholarpedia.2575. Bibcode:2007SchpJ...2.2575G. ISSN 1941-6016.
- Liang, Faming; Cheng, Yichen; Lin, Guang (2014). "Simulated stochastic approximation annealing for global optimization with a square-root cooling schedule". Journal of the American Statistical Association. 109 (506): 847–863. doi:10.1080/01621459.2013.872993. S2CID 123410795.
- Ingber, Lester (1993). "Simulated annealing: Practice versus theory". Mathematical and Computer Modelling. 18 (11): 29–57. doi:10.1016/0895-7177(93)90204-C. CiteSeerX 10.1.1.15.1046.
- Karger, David R.; Klein, Philip N.; Tarjan, Robert E. (1995-03-01). "A randomized linear-time algorithm to find minimum spanning trees". Journal of the ACM. 42 (2): 321–328. doi:10.1145/201019.201022. ISSN 0004-5411.
- Thiesen, Francisco. "A C++ implementation for an Expected Linear-Time Minimum Spanning Tree Algorithm (Karger–Klein–Tarjan + Hagerup Minimum Spanning Tree Verification as a sub-routine)". GitHub. Retrieved 2022-11-19.
- Geiman Thiesen, Francisco. "Expected Linear-Time Minimum Spanning Trees". franciscothiesen.github.io. Retrieved 2022-11-13.
- Li, Tianxiao; Liang, Jingxun; Yu, Huacheng; Zhou, Renfei. "Tight Cell-Probe Lower Bounds for Dynamic Succinct Dictionaries".
- Bender, Michael; Farach-Colton, Martin; Kuszmaul, John; Kuszmaul, William; Liu, Mingmou (4 Nov 2021). "On the Optimal Time/Space Tradeoff for Hash Tables". arXiv:2111.00602 [cs].
- "Scientists Find Optimal Balance of Data Storage and Time".
- Kuszmaul, William, as quoted in "Scientists Find Optimal Balance of Data Storage and Time".