DeepSpeed
Author: Microsoft Research
Developer: Microsoft
Latest release version: v0.14.4
Programming languages: Python, CUDA, C++
Genre: Software library
License: Apache License 2.0
DeepSpeed is an open source deep learning optimization library for PyTorch.[1]
The library is designed to reduce computing power and memory use and to train large distributed models with better parallelism on existing computer hardware.[2][3] DeepSpeed is optimized for low-latency, high-throughput training. It includes the Zero Redundancy Optimizer (ZeRO) for training models with 1 trillion or more parameters.[4] Features include mixed precision training; single-GPU, multi-GPU, and multi-node training; and custom model parallelism. The DeepSpeed source code is licensed under the Apache License 2.0 and is available on GitHub.[5]
The DeepSpeed team has claimed up to 6.2x higher throughput, 2.8x faster convergence, and 4.6x less communication.[6]
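A minimal sketch of how a PyTorch training loop is typically adapted to DeepSpeed, with mixed precision and ZeRO enabled through the library's JSON-style configuration, is shown below. The toy model, batch size, and hyperparameter values are illustrative assumptions rather than values taken from the cited sources.

```python
# Illustrative sketch of DeepSpeed usage with PyTorch. The model, batch size,
# and hyperparameters are assumptions for demonstration purposes only.
import torch
import torch.nn as nn
import deepspeed

# A small stand-in model; in practice this would be a large transformer.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))

# Configuration enabling fp16 mixed precision and ZeRO stage 2
# (partitioned optimizer states and gradients), two features noted above.
ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in an engine that manages distributed
# training, optimizer state partitioning, and mixed-precision details.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# Training loop on random data: the engine replaces the usual
# loss.backward() / optimizer.step() calls.
for step in range(10):
    inputs = torch.randn(32, 1024, device=model_engine.device, dtype=torch.half)
    labels = torch.randint(0, 10, (32,), device=model_engine.device)
    loss = nn.functional.cross_entropy(model_engine(inputs), labels)
    model_engine.backward(loss)
    model_engine.step()
```

Scripts written this way are typically started with DeepSpeed's command-line launcher (for example, `deepspeed train.py`), which spawns one process per GPU and sets up the distributed environment across nodes.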