DeepSpeed explained

DeepSpeed
Author: Microsoft Research
Developer: Microsoft
Latest Release Version: v0.14.4
Programming Language: Python, CUDA, C++
Genre: Software library
License: Apache License 2.0

DeepSpeed is an open source deep learning optimization library for PyTorch.[1]

Library

The library is designed to reduce computing power and memory use and to train large distributed models with better parallelism on existing computer hardware.[2][3] DeepSpeed is optimized for low-latency, high-throughput training. It includes the Zero Redundancy Optimizer (ZeRO) for training models with 1 trillion or more parameters.[4] Features include mixed-precision training; single-GPU, multi-GPU, and multi-node training; and custom model parallelism. The DeepSpeed source code is licensed under the Apache License 2.0 and available on GitHub.[5]
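
A minimal sketch of how these pieces fit together in user code, assuming DeepSpeed is installed and the script is started with the deepspeed launcher; the toy model, batch size, learning rate, and ZeRO stage below are illustrative choices, not values taken from the cited sources.

```python
# Illustrative sketch: wrapping a toy PyTorch model with a DeepSpeed engine.
# The model, config values, and data are placeholders, not a reference setup.
import torch
import deepspeed

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},          # mixed-precision training
    "zero_optimization": {"stage": 2},  # ZeRO: partition optimizer states and gradients
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(1024, 1024)  # stand-in for a much larger model

# deepspeed.initialize returns an engine that handles distributed setup,
# mixed precision, and ZeRO partitioning behind a PyTorch-like interface.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

inputs = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
targets = torch.randn(8, 1024, device=engine.device, dtype=torch.half)

loss = torch.nn.functional.mse_loss(engine(inputs), targets)
engine.backward(loss)  # loss scaling and gradient handling per the config
engine.step()          # optimizer step plus ZeRO bookkeeping
```

Such a script is typically launched with the deepspeed command-line launcher (for example, deepspeed train.py), which sets up the process group so the same code runs on a single GPU, multiple GPUs, or multiple nodes.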

The DeepSpeed team has claimed up to 6.2x higher throughput, 2.8x faster convergence, and 4.6x less communication.[6]

Notes and References

  1. "Microsoft Updates Windows, Azure Tools with an Eye on The Future". PCMag UK. May 22, 2020.
  2. Yegulalp, Serdar. "Microsoft speeds up PyTorch with DeepSpeed". InfoWorld. February 10, 2020.
  3. "Microsoft unveils 'fifth most powerful' supercomputer in the world". Neowin. June 18, 2023.
  4. "Microsoft trains world's largest Transformer language model". February 10, 2020.
  5. "microsoft/DeepSpeed". GitHub. July 10, 2020.
  6. "DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression". Microsoft Research. May 24, 2021. Retrieved June 19, 2021.