Intel Advisor
Developer: Intel Developer Products
Latest Release Version: 2021.4
Latest Release Date: [1]
Operating System: Windows and Linux (UI-only on macOS)
Genre: Profiler
License: Free, with optional commercial support
Intel Advisor (also known as "Advisor XE", "Vectorization Advisor" or "Threading Advisor") is a design-assistance and analysis tool for SIMD vectorization, threading, memory use, and GPU offload optimization. The tool supports the C, C++, Data Parallel C++ (DPC++), Fortran and Python languages. It is available on Windows and Linux in the form of a standalone GUI tool, a Microsoft Visual Studio plug-in, or a command-line interface.[2] It supports OpenMP (and use with MPI). The Intel Advisor user interface is also available on macOS.
Intel Advisor is available for free as a stand-alone tool or as part of the Intel oneAPI Base Toolkit. Optional paid commercial support is available for the oneAPI Base Toolkit.
Vectorization is the use of Single Instruction, Multiple Data (SIMD) instructions (such as Intel Advanced Vector Extensions and Intel Advanced Vector Extensions 512) to operate on multiple data elements in parallel within a single CPU core. This can greatly increase performance by reducing loop overhead and making better use of the multiple math units in each core.
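As a minimal sketch of the idea (illustrative code, not taken from Intel Advisor itself), the scalar loop below processes one element per iteration, while the AVX variant uses 256-bit SIMD instructions to process eight single-precision values per iteration:

    #include <immintrin.h>  // AVX intrinsics

    // Scalar version: one addition per loop iteration.
    void add_scalar(const float* a, const float* b, float* c, int n) {
        for (int i = 0; i < n; ++i)
            c[i] = a[i] + b[i];
    }

    // AVX version: eight additions per iteration (assumes n is a multiple of 8).
    void add_avx(const float* a, const float* b, float* c, int n) {
        for (int i = 0; i < n; i += 8) {
            __m256 va = _mm256_loadu_ps(a + i);              // load 8 floats
            __m256 vb = _mm256_loadu_ps(b + i);              // load 8 floats
            _mm256_storeu_ps(c + i, _mm256_add_ps(va, vb));  // add and store 8 results
        }
    }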
Intel Advisor helps find the loops that will benefit from better vectorization and identifies where it is safe to force compiler vectorization.[3] It supports analysis of scalar, SSE, AVX, AVX2 and AVX-512-enabled code produced by the auto-vectorizers of the Intel, GNU and Microsoft compilers. It also supports analysis of "explicitly" vectorized code that uses OpenMP 4.x and newer, as well as code written using C vector intrinsics or assembly language.[4][5]
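For example, explicit vectorization with the OpenMP 4.x simd construct looks as follows (a standard OpenMP sketch, not Advisor-specific code):

    // The programmer asserts that iterations are independent, so the
    // compiler vectorizes the loop without its own dependency analysis.
    void saxpy(float a, const float* x, float* y, int n) {
        #pragma omp simd
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }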
Intel Advisor automates the Roofline Performance Model first proposed at Berkeley[6] and extended at the University of Lisbon.[7]
Advisor "Roofline Analysis" helps to identify if given loop/function is memory or CPU bound. It also identifies under optimized loops that can have a high impact on performance if optimized.[8] [9] [10] [11]
Intel Advisor also provides an automated memory-level roofline implementation that is closer to the classical Roofline model. The classical Roofline is especially instrumental for high-performance computing applications that are DRAM-bound. The Advisor memory-level roofline analyzes cache data and evaluates data traffic between different memory levels to provide guidance for improvement.[12]
Intel Advisor roofline analysis supports code running on CPUs or GPUs.[13][14] It also supports integer-based applications, which are heavily used in machine learning, big data, database and financial applications such as cryptocurrency processing.[15]
Software architects add code annotations that describe a proposed threading design; the annotations are understood by Advisor but ignored by the compiler. Advisor then projects the scalability of the threading and checks for synchronization errors. The Advisor Threading "Suitability" feature helps predict and compare the parallel SMP scalability and performance losses of different possible threading designs. Typical Suitability reports are shown in the Suitability CPU screenshot on the right. Advisor Suitability also provides dataset-size (iteration-space) modeling capabilities and a breakdown of performance penalties, exposing the negative impact of load imbalance, parallel runtime overhead and lock contention.[16]
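A minimal sketch of such annotations is shown below, using the macro names documented for Intel's annotation API (advisor-annotate.h); the surrounding function and workload are hypothetical, and exact macro signatures may vary between Advisor versions:

    #include <advisor-annotate.h>
    #include <math.h>

    // The annotations mark a proposed parallel decomposition; they do not
    // change program behavior, but Advisor's Suitability and Dependencies
    // analyses recognize them when the program runs under the tool.
    void process_items(double* data, int n) {
        ANNOTATE_SITE_BEGIN(process_site);          // candidate parallel region (site)
        for (int i = 0; i < n; ++i) {
            ANNOTATE_ITERATION_TASK(process_task);  // model each iteration as a task
            data[i] = sqrt(data[i]) * 0.5;          // placeholder per-element work
        }
        ANNOTATE_SITE_END();                        // end of the annotated site
    }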
Advisor added a GPU offload performance modeling feature in the 2021 release. It collects application performance characteristics on a baseline platform and builds an analytical performance model for the target (modeled) platform.
This provides an estimate of the speedup on the target GPU, estimates of the overheads for offloading, data transfer and scheduling of region execution, and pinpoints performance bottlenecks.[17][18][19] This information can guide the offload strategy: selecting which regions to offload and anticipating the code restructuring needed to make them GPU-ready.
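Conceptually, such an analytical model can be summarized as follows (a simplified sketch under loose assumptions, not the exact formulas used by Advisor): the modeled time of an offloaded region is bounded by the slowest target resource plus the offload overheads, and the projected speedup is the ratio to the measured baseline time.

    T_{\mathrm{offload}} \approx \max\left(T_{\mathrm{compute}},\, T_{\mathrm{memory}}\right) + T_{\mathrm{data\ transfer}} + T_{\mathrm{scheduling}}
    \qquad \mathrm{speedup} \approx T_{\mathrm{baseline}} / T_{\mathrm{offload}}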
Intel Advisor is used by Schlumberger,[20] Sandia National Laboratories, and others[21] for design and parallel-algorithm research, and its Vectorization Advisor capabilities are known to be used by LRZ and ICHEC,[22] Daresbury Laboratory,[23] and Pexip.[24]
The step-by-step workflow is also used in academia for educational purposes.[25]