Bailey's FFT algorithm explained

Bailey's FFT (also known as the 4-step FFT) is a high-performance algorithm for computing the fast Fourier transform (FFT). This variation of the Cooley–Tukey FFT algorithm was originally designed for systems with hierarchical memory, which is common in modern computers, and it was the first FFT algorithm in the so-called "out-of-core" class. The algorithm treats the samples as a two-dimensional matrix (hence yet another name, the matrix FFT algorithm) and performs short FFTs on the columns and rows of the matrix, with a correction multiplication by "twiddle factors" in between.

The algorithm is named after David H. Bailey's article "FFTs in external or hierarchical memory", published in 1989. In that article Bailey credits the algorithm to W. M. Gentleman and G. Sande, who had published their paper "Fast Fourier Transforms: for fun and profit"[1] some twenty years earlier, in 1966. The algorithm can be considered a radix-$\sqrt{n}$ FFT decomposition.
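The decomposition follows from splitting the DFT indices. In the derivation below the notation is chosen here for illustration (it is not taken from Bailey's paper): the transform length is n = n_1 n_2, \omega_m = e^{-2\pi i/m}, the input index is j = n_2 j_1 + j_2 and the output index is k = k_1 + n_1 k_2, with 0 ≤ j_1, k_1 < n_1 and 0 ≤ j_2, k_2 < n_2. The radix-$\sqrt{n}$ case is n_1 = n_2 = \sqrt{n}.

```latex
% Expand the DFT kernel; \omega_n^{n j_1 k_2} = 1 drops out.
\omega_n^{(n_2 j_1 + j_2)(k_1 + n_1 k_2)}
  = \omega_{n_1}^{j_1 k_1}\,\omega_n^{j_2 k_1}\,\omega_{n_2}^{j_2 k_2}

% Hence, with the input stored row-major as A[j_1][j_2] = x[n_2 j_1 + j_2]:
X[k_1 + n_1 k_2]
  = \sum_{j_2=0}^{n_2-1}
    \Big( \underbrace{\omega_n^{j_2 k_1}}_{\text{twiddle factor}}
          \underbrace{\sum_{j_1=0}^{n_1-1} x[n_2 j_1 + j_2]\,\omega_{n_1}^{j_1 k_1}}_{\text{length-}n_1\text{ FFT of column } j_2}
    \Big)\,\omega_{n_2}^{j_2 k_2}
```

The inner sum is a short FFT down each column, the factor \omega_n^{j_2 k_1} is the twiddle correction, and the outer sum is a short FFT along each row. Because output element X[k_1 + n_1 k_2] ends up at row k_1, column k_2, reading the final matrix column by column yields the result in natural order.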

Here is a brief overview of how the "4-step" version of the Bailey FFT algorithm works:

1. The data (in natural order) is first arranged into a matrix.
2. Each column of the matrix is then independently transformed using a standard FFT algorithm.
3. Each element of the matrix is multiplied by a correction coefficient (a twiddle factor).
4. Each row of the matrix is then independently transformed using a standard FFT algorithm.
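The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Bailey's implementation: the function names `dft` and `four_step_fft` are chosen here, and the short column and row transforms use a naive O(m²) DFT for brevity where a real implementation would call an optimized FFT. The n = n1 × n2 samples are laid out row by row, so that a[r][c] = x[n2·r + c].

```python
import cmath

def dft(x):
    """Naive O(m^2) DFT, standing in for the 'standard FFT' of each short transform."""
    m = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
            for k in range(m)]

def four_step_fft(x, n1, n2):
    """Four-step FFT of n1 * n2 samples (illustrative sketch, not tuned)."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: natural-order data as an n1 x n2 matrix, a[r][c] = x[n2*r + c].
    a = [[x[n2 * r + c] for c in range(n2)] for r in range(n1)]
    # Step 2: length-n1 FFT down each of the n2 columns.
    for c in range(n2):
        col = dft([a[r][c] for r in range(n1)])
        for r in range(n1):
            a[r][c] = col[r]
    # Step 3: twiddle factors exp(-2*pi*i*r*c/n); r now indexes frequency k1.
    for r in range(n1):
        for c in range(n2):
            a[r][c] *= cmath.exp(-2j * cmath.pi * r * c / n)
    # Step 4: length-n2 FFT along each of the n1 rows, giving a[k1][k2].
    a = [dft(row) for row in a]
    # Read out column by column: X[k1 + n1*k2] = a[k1][k2].
    return [a[k1][k2] for k2 in range(n2) for k1 in range(n1)]
```

For a transform of length 1024, for example, the radix-$\sqrt{n}$ choice is n1 = n2 = 32, so every short transform has only 32 points; the sketch also accepts non-square splits such as 3 × 5.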

The result, in natural order, is then read out column by column. Because the FFTs in steps 2 and 4 run along columns and rows respectively, these steps (and the final read-out) may include a matrix transpose that rearranges the elements into a layout convenient for processing. The algorithm resembles a two-dimensional FFT; its three-dimensional (and higher) extensions are known as the 5-step FFT, the 6-step FFT, and so on.
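One way to add the transposes is sketched below, under the same assumptions as a naive-DFT illustration (the names `dft`, `transpose` and `four_step_fft_transposed` are hypothetical helpers chosen here). The matrix is transposed before each batch of short FFTs so that every transform runs over a contiguous row, the access pattern that suits cached or out-of-core memory, and a final transpose puts the output in natural order.

```python
import cmath

def dft(x):
    """Naive O(m^2) DFT standing in for a tuned short FFT."""
    m = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m))
            for k in range(m)]

def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*mat)]

def four_step_fft_transposed(x, n1, n2):
    """Four-step FFT in which every short FFT runs along a contiguous row."""
    n = n1 * n2
    assert len(x) == n
    # Row-major n1 x n2 layout, transposed so t[c][r] = x[n2*r + c]:
    # each original column is now a contiguous row.
    t = transpose([list(x[n2 * r:n2 * (r + 1)]) for r in range(n1)])
    # Column FFTs (length n1), now performed row by row: t[c][k1].
    t = [dft(row) for row in t]
    # Twiddle correction exp(-2*pi*i*c*k1/n).
    t = [[v * cmath.exp(-2j * cmath.pi * c * k1 / n) for k1, v in enumerate(row)]
         for c, row in enumerate(t)]
    # Transpose back, then the row FFTs (length n2): a[k1][k2].
    a = [dft(row) for row in transpose(t)]
    # Final transpose so that flattening row by row gives X[k1 + n1*k2].
    return [v for row in transpose(a) for v in row]
```

In memory-bound settings the transposes themselves are usually the costly part, which is why practical implementations block or fuse them rather than materializing each one as this sketch does.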

Bailey's FFT is typically used to compute DFTs of large datasets, such as those arising in scientific and engineering applications. It is very efficient and has been used to compute FFTs of datasets with billions of elements (applied to the number-theoretic transform, it was used to process datasets on the order of 10¹² elements in the mid-2000s).
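The same decomposition carries over to the number-theoretic transform, where complex roots of unity are replaced by roots of unity modulo a prime. The sketch below is illustrative only and is not the code used in the cited computations: it assumes the NTT-friendly prime 998244353 (= 119 · 2²³ + 1) with primitive root 3, uses naive O(m²) transforms for the short rows and columns, and folds the twiddle multiplication into the column pass. The names `ntt`, `root_of_unity` and `four_step_ntt` are hypothetical.

```python
P = 998244353  # assumed prime; P - 1 = 2**23 * 7 * 17
G = 3          # assumed primitive root modulo P

def root_of_unity(m):
    """Primitive m-th root of unity mod P (m must divide P - 1)."""
    assert (P - 1) % m == 0
    return pow(G, (P - 1) // m, P)

def ntt(x):
    """Naive O(m^2) number-theoretic transform, standing in for a fast one."""
    m = len(x)
    w = root_of_unity(m)
    return [sum(x[j] * pow(w, j * k, P) for j in range(m)) % P for k in range(m)]

def four_step_ntt(x, n1, n2):
    """Four-step NTT of n1 * n2 values modulo P (illustrative sketch)."""
    n = n1 * n2
    w = root_of_unity(n)  # root_of_unity(n1) == w**n2 mod P, as the split requires
    a = [[x[n2 * r + c] % P for c in range(n2)] for r in range(n1)]
    for c in range(n2):
        # Length-n1 transform down column c, then twiddle w**(r*c) (fused).
        col = ntt([a[r][c] for r in range(n1)])
        for r in range(n1):
            a[r][c] = col[r] * pow(w, r * c, P) % P
    # Length-n2 transform along each row, then read out column by column.
    a = [ntt(row) for row in a]
    return [a[k1][k2] for k2 in range(n2) for k1 in range(n1)]
```

Since all arithmetic is exact modulo P, the four-step result matches the direct transform exactly rather than up to rounding error.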

Sources

Bailey, D. H. (March 1989). "FFTs in external or hierarchical memory". Proceedings of the 1989 ACM/IEEE conference on Supercomputing - Supercomputing '89. ACM Press. Vol. 4, no. 1, pp. 23–35. doi:10.1145/76263.76288. ISBN 0897913418. S2CID 52809390. https://www.davidhbailey.com/dhbpapers/fftq.pdf
Frigo, M.; Johnson, S. G. (February 2005). "The Design and Implementation of FFTW3". Proceedings of the IEEE. 93 (2): 216–231. doi:10.1109/JPROC.2004.840301. ISSN 0018-9219. Bibcode 2005IEEEP..93..216F. S2CID 6644892. CiteSeerX 10.1.1.66.3097.
Hart, William B.; Tornaría, Gonzalo; Watkins, Mark (2010). "Congruent Number Theta Coefficients to 10¹²". Algorithmic Number Theory. Lecture Notes in Computer Science. Vol. 6197. Springer Berlin Heidelberg. pp. 186–200. doi:10.1007/978-3-642-14518-6_17. ISBN 978-3-642-14517-9. ISSN 0302-9743, 1611-3349. http://wrap.warwick.ac.uk/41654/1/WRAP_Hart_0584144-ma-270913-congruent.pdf
Al Na'mneh, Rami; Pan, W. David (March 2007). "Five-step FFT algorithm with reduced computational complexity". Information Processing Letters. 101 (6): 262–267. doi:10.1016/j.ipl.2006.10.009. ISSN 0020-0190.
Arndt, Jörg (1 October 2010). "The Matrix Fourier Algorithm (MFA)". Matters Computational: Ideas, Algorithms, Source Code. Springer Science & Business Media. pp. 438–439. ISBN 978-3-642-14764-7. OCLC 1005788763. https://books.google.com/books?id=HsRHS6u7e80C&pg=PA438

Notes and References

[1] Gentleman, W. M.; Sande, G. (1966). "Fast Fourier Transforms: For Fun and Profit". Fall Joint Computer Conference, November 7–10, 1966, San Francisco, California. AFIPS Conference Proceedings, Volume 29. pp. 563–578.
