| oneAPI | |
|---|---|
| Operating system | Cross-platform |
| Platform | Cross-platform |
| Genre | Open-source software specification for parallel programming |
oneAPI is an open standard, adopted by Intel, for a unified application programming interface (API) intended to be used across different computing accelerator (coprocessor) architectures, including GPUs, AI accelerators, and field-programmable gate arrays (FPGAs). It is intended to eliminate the need for developers to maintain separate code bases, programming languages, tools, and workflows for each architecture.[1] [2] [3] [4]
oneAPI competes with other GPU computing stacks: CUDA by Nvidia and ROCm by AMD.
The oneAPI specification extends existing developer programming models to enable multiple hardware architectures through a data-parallel language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. It builds upon industry standards and provides an open, cross-platform developer stack.[5] [6]
DPC++[7] [8] is a programming language implementation of oneAPI, built upon the ISO C++ and Khronos Group SYCL standards.[9] DPC++ is an implementation of SYCL with extensions that are proposed for inclusion in future revisions of the SYCL standard, including: unified shared memory, group algorithms, and sub-groups.[10] [11] [12]
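The extensions above can be illustrated with a short sketch. The following is a minimal SYCL 2020-style program using one of the listed extensions, unified shared memory; it assumes a SYCL/DPC++ compiler such as Intel's `icpx -fsycl` and is an illustrative sketch, not a definitive example:

```cpp
#include <sycl/sycl.hpp>
#include <cstdio>

int main() {
  sycl::queue q;  // binds to a default device (GPU, CPU, ...)

  constexpr size_t n = 8;
  // Unified shared memory: a single allocation visible to host and device.
  int *data = sycl::malloc_shared<int>(n, q);
  for (size_t i = 0; i < n; ++i) data[i] = static_cast<int>(i);

  // Data-parallel kernel: one work-item per element.
  q.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
    data[i] *= 2;
  }).wait();

  for (size_t i = 0; i < n; ++i) std::printf("%d ", data[i]);
  std::printf("\n");

  sycl::free(data, q);
}
```

Because the same lambda runs unmodified on whichever device the queue selects, this pattern is what lets one code base target multiple accelerator architectures.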
The set of APIs spans several domains, including libraries for linear algebra, deep learning, machine learning, video processing, and others.
| Library name | Short name | Description |
|---|---|---|
| oneAPI DPC++ Library | oneDPL | Algorithms and functions to speed DPC++ kernel programming |
| oneAPI Math Kernel Library | oneMKL | Math routines including matrix algebra, FFT, and vector math |
| oneAPI Data Analytics Library | oneDAL | Machine learning and data analytics functions |
| oneAPI Deep Neural Network Library | oneDNN | Neural network functions for deep learning training and inference |
| oneAPI Collective Communications Library | oneCCL | Communication patterns for distributed deep learning |
| oneAPI Threading Building Blocks | oneTBB | Threading and memory management template library |
| oneAPI Video Processing Library | oneVPL | Real-time video encode, decode, transcode, and processing |
The source code of parts of the above libraries is available on GitHub.[13]
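As an illustration of these library APIs, the following is a minimal sketch using oneTBB's `parallel_reduce`, which splits a range across worker threads and combines the partial results (assumes oneTBB is installed and the program is linked with `-ltbb`):

```cpp
#include <tbb/parallel_reduce.h>
#include <tbb/blocked_range.h>
#include <vector>
#include <numeric>
#include <functional>
#include <cstdio>

int main() {
  std::vector<int> v(1000);
  std::iota(v.begin(), v.end(), 1);  // 1, 2, ..., 1000

  // Each worker sums a sub-range; std::plus combines the partial sums.
  long sum = tbb::parallel_reduce(
      tbb::blocked_range<size_t>(0, v.size()), 0L,
      [&](const tbb::blocked_range<size_t>& r, long acc) {
        for (size_t i = r.begin(); i != r.end(); ++i) acc += v[i];
        return acc;
      },
      std::plus<long>());

  std::printf("%ld\n", sum);  // 1 + 2 + ... + 1000 = 500500
}
```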
The oneAPI documentation also lists the "Level Zero" API, which defines the low-level direct-to-metal interfaces, and a set of ray-tracing components with their own APIs.
oneAPI Level Zero,[14] [15] [16] the low-level hardware interface, defines a set of capabilities and services that a hardware accelerator needs to interface with compiler runtimes and other developer tools.
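A minimal sketch of how a runtime or tool might enumerate hardware through Level Zero (assumes the Level Zero loader and headers are installed; error handling abbreviated for brevity):

```cpp
#include <cstdio>
#include <vector>
#include <level_zero/ze_api.h>

int main() {
  // Initialize the Level Zero driver stack.
  if (zeInit(0) != ZE_RESULT_SUCCESS) {
    std::fprintf(stderr, "Level Zero initialization failed\n");
    return 1;
  }

  // Enumerate drivers (typically one per vendor implementation).
  uint32_t driverCount = 0;
  zeDriverGet(&driverCount, nullptr);
  std::vector<ze_driver_handle_t> drivers(driverCount);
  zeDriverGet(&driverCount, drivers.data());

  for (ze_driver_handle_t driver : drivers) {
    // Enumerate the accelerator devices exposed by each driver.
    uint32_t deviceCount = 0;
    zeDeviceGet(driver, &deviceCount, nullptr);
    std::vector<ze_device_handle_t> devices(deviceCount);
    zeDeviceGet(driver, &deviceCount, devices.data());

    for (ze_device_handle_t device : devices) {
      ze_device_properties_t props{};
      props.stype = ZE_STRUCTURE_TYPE_DEVICE_PROPERTIES;
      zeDeviceGetProperties(device, &props);
      std::printf("Device: %s\n", props.name);
    }
  }
  return 0;
}
```

Compiler runtimes and profilers build on this same enumeration layer before submitting work to a device.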
Intel has released oneAPI production toolkits that implement the specification and add CUDA code migration, analysis, and debug tools.[17] [18] [19] These include the Intel oneAPI DPC++/C++ Compiler,[20] Intel Fortran Compiler, Intel VTune Profiler[21] and multiple performance libraries.
Codeplay has released an open-source layer[22] [23] [24] to allow oneAPI and SYCL/DPC++ to run atop Nvidia GPUs via CUDA.
The University of Heidelberg has developed a SYCL/DPC++ implementation for both AMD and Nvidia GPUs.[25]
Huawei released a DPC++ compiler for its Ascend AI chipset.
Fujitsu has created an open-source ARM version of the oneAPI Deep Neural Network Library (oneDNN)[26] for their Fugaku CPU.
The Unified Acceleration Foundation (UXL) is a technology consortium working on the continuation of the oneAPI initiative, with the goal of creating an open-standard accelerator software ecosystem, along with related open standards and specification projects, through working groups and special interest groups (SIGs). The effort aims to compete with Nvidia's CUDA. The main companies behind it are Intel, Google, ARM, Qualcomm, Samsung, Imagination, and VMware.[27]