In software development, distributed version control (also known as distributed revision control) is a form of version control in which the complete codebase, including its full history, is mirrored on every developer's computer.[1] Compared to centralized version control, this enables automatic management branching and merging, speeds up most operations (except pushing and pulling), improves the ability to work offline, and does not rely on a single location for backups.[1] [2] Git, the world's most popular version control system, is a distributed version control system.
In 2010, software development author Joel Spolsky described distributed version control systems as "possibly the biggest advance in software development technology in the [past] ten years".[3]
Distributed version control systems (DVCS) use a peer-to-peer approach to version control, as opposed to the client–server approach of centralized systems. Distributed revision control synchronizes repositories by transferring patches from peer to peer. There is no single central version of the codebase; instead, each user has a working copy and the full change history.
Advantages of DVCS (compared with centralized systems) include:
Disadvantages of DVCS (compared with centralized systems) include:
Some originally centralized systems now offer some distributed features. Team Foundation Server and Visual Studio Team Services now host centralized and distributed version control repositories via hosting Git.
Similarly, some distributed systems now offer features that mitigate the issues of checkout times and storage costs, such as the Virtual File System for Git developed by Microsoft to work with very large codebases,[7] which exposes a virtual file system that downloads files to local storage only as they are needed.
The distributed model is generally better suited for large projects with partly independent developers, such as the Linux kernel project, because developers can work independently and submit their changes for merge (or rejection). This flexibility allows adopting custom source code contribution workflows, such as the integrator workflow, which is the most widely used. Unlike the centralized model where developers must serialize their work to avoid problems with different versions, in the distributed model, developers can clone the entire history of the code to their local machines. They commit their changes to their local repositories first, creating 'change sets,' before pushing them to the master repository. This approach enables developers to work locally and disconnected, making it more convenient for distributed teams.[8]
In a truly distributed project, such as Linux, every contributor maintains their own version of the project, with different contributors hosting their own respective versions and pulling in changes from other users as needed, resulting in a general consensus emerging from multiple different nodes. This also makes the process of "forking" easy, as all that is required is one contributor stop accepting pull requests from other contributors and letting the codebases gradually grow apart.
This arrangement, however, can be difficult to maintain, resulting in many projects choosing to shift to a paradigm in which one contributor is the universal "upstream", a repository from whom changes are almost always pulled. Under this paradigm, development is somewhat recentralized, as every project now has a central repository that is informally considered as the official repository, managed by the project maintainers collectively. While distributed version control systems make it easy for new developers to "clone" a copy of any other contributor's repository, in a central model, new developers always clone the central repository to create identical local copies of the code base. Under this system, code changes in the central repository are periodically synchronized with the local repository, and once the development is done, the change should be integrated into the central repository as soon as possible.
Organizations utilizing this centralize pattern often choose to host the central repository on a third party service like GitHub, which offers not only more reliable uptime than self-hosted repositories, but can also add centralized features like issue trackers and continuous integration.
Contributions to a source code repository that uses a distributed version control system are commonly made by means of a pull request, also known as a merge request.[9] The contributor requests that the project maintainer pull the source code change, hence the name "pull request". The maintainer has to merge the pull request if the contribution should become part of the source base.[10]
The developer creates a pull request to notify maintainers of a new change; a comment thread is associated with each pull request. This allows for focused discussion of code changes. Submitted pull requests are visible to anyone with repository access. A pull request can be accepted or rejected by maintainers.[11]
Once the pull request is reviewed and approved, it is merged into the repository. Depending on the established workflow, the code may need to be tested before being included into official release. Therefore, some projects contain a special branch for merging untested pull requests.[12] Other projects run an automated test suite on every pull request, using a continuous integration tool, and the reviewer checks that any new code has appropriate test coverage.
The first open-source DVCS systems included Arch, Monotone, and Darcs. However, open source DVCSs were never very popular until the release of Git and Mercurial.
BitKeeper was used in the development of the Linux kernel from 2002 to 2005.[13] The development of Git, now the world's most popular version control system,[14] was prompted by the decision of the company that made BitKeeper to rescind the free license that Linus Torvalds and some other Linux kernel developers had previously taken advantage of.