Software Heritage Explained

Founders:Roberto Di Cosmo, Stefano Zacchiroli
Type:Nonprofit
Headquarters:Inria
Location:Rocquencourt, France
Scientific Advisors:Gérard Berry, Jean-François Abramatic, Julia Lawall, Serge Abiteboul
Affiliations:Inria
Staff:13

Software Heritage is a non-profit organization that provides a service for archiving and referencing historical and contemporary software, with a focus on human-readable source code. The site was unveiled in 2016 by Inria[1] and is supported by UNESCO.[2][3][4] The project itself is structured as a nonprofit multistakeholder initiative.

Overview

The stated mission of Software Heritage is to collect, preserve and share all software that is publicly available in source code form, with the goal of building a common, shared infrastructure at the service of industry, research, culture and society as a whole.[5]

Software source code is collected by crawling code hosting platforms, such as GitHub, GitLab.com or Bitbucket, and package archives, such as npm or PyPI. It is then ingested into a special data structure, a Merkle DAG, that is the core of the archive.[6] Each artifact in the archive is associated with an identifier called a SWHID.[7] In 2023, the expansion of SWHID was changed from Software Heritage identifier to software hash identifier.
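As a rough illustration of how a hash-based identifier works, the sketch below derives a SWHID-style identifier for a single file's contents. It assumes that content objects are hashed the same way Git hashes blobs (SHA-1 over a short header followed by the raw bytes) and that the result is prefixed with `swh:1:cnt:`. The function name `content_swhid` is hypothetical and is not part of any Software Heritage library. Directories, revisions and the other object types in the Merkle DAG are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return a SWHID-style identifier for a file's raw bytes (illustrative helper)."""
    # Assumption: content objects are hashed like Git blobs,
    # i.e. SHA-1 over the header "blob <length>\0" followed by the bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest


if __name__ == "__main__":
    a = content_swhid(b"hello world\n")
    b = content_swhid(b"hello world\n")
    c = content_swhid(b"hello world!\n")
    print(a)
    print(a == b)  # identical bytes always map to the same identifier
    print(a == c)  # any change in the bytes yields a different identifier
```

Because the identifier depends only on the bytes, the same file found on several hosting platforms maps to a single node. This is the property that lets a Merkle DAG store each unique source file once.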

To improve the chances of preserving the Software Heritage archive over the long term, a mirror program was established in 2018. As of October 2020, ENEA[8] and FossID[9] had joined it.

History

Development of Software Heritage began at Inria under the direction of computer scientists Roberto Di Cosmo and Stefano Zacchiroli in early 2015,[10] and the project was officially announced to the public on June 30, 2016.[11]

In 2017 Inria signed an agreement with UNESCO for the long-term preservation of software source code and for making it widely available, in particular through the Software Heritage initiative.[12]

In June 2018, the Software Heritage Archive was opened at UNESCO headquarters.

On July 4, 2018, Software Heritage was included in the French National Plan for Open Science.[13]

In October 2018, the strategy and vision underlying the mission of Software Heritage were published in Communications of the ACM.

In November 2018, a group of forty international experts met at the invitation of Inria and UNESCO,[14] which led to the publication in February 2019 of Paris Call: Software Source Code as Heritage for Sustainable Development.[15]

In November 2019, Inria signed an agreement with GitHub to improve the archival process for GitHub-hosted projects in the Software Heritage archive.[16]

As of October 2020, Software Heritage’s repository held over 143 million software projects in an archive of over 9.1 billion unique source files.

Funding

Software Heritage is a non-profit organization funded largely by donations from sponsors, which include private companies, public bodies and academic institutions.[17]

Software Heritage also seeks funding for third parties interested in contributing to its mission. A grant from NLNet[18] funded the work of Octobus[19] and Tweag[20] that led to the rescue of 250,000 Mercurial repositories phased out from Bitbucket.[21]

A grant from the Alfred P. Sloan Foundation funds experts to develop new connectors that expand the coverage of the Software Heritage Archive.[22]

Development and community

The Software Heritage infrastructure is built transparently and collaboratively. All the software developed in the process is released as free and open-source software.[23] An ambassador program was announced in December 2020 with the stated goal of growing the community of users and contributors.[24]

Awards

In 2016, Software Heritage received the best community project award at Paris Open Source Summit 2016.[25][26]

In 2019, Software Heritage received the Academic Initiative award from Pôle Systematic.[27]

Notes and References

  1. Web site: Collect, organise, preserve and share the Software Heritage of mankind. Software Heritage. 26 July 2016. 30 June 2016.
  2. Web site: UNESCO. Software Heritage. 14 November 2019 . 2 November 2020.
  3. News: Brown. Paul. Software Heritage: Creating a safe haven for software. 26 July 2016. Boing Boing. 30 June 2016.
  4. News: Jost. Clémence. Open source: lancement de Software Heritage, la plus grande bibliothèque de codes source de la planète. 27 July 2016. Archimag. 1 July 2016.
  5. News: Abramatic. Jean-François. Di Cosmo. Roberto. Zacchiroli. Stefano. Building the Universal Archive of Source Code. 2 November 2020. Communications of the ACM. 1 October 2018.
  6. Web site: Software Heritage Archive. 2 November 2020.
  7. Web site: Software Heritage Persistent Identifiers. Software Heritage. 2 November 2020.
  8. Web site: At ENEA the first institutional mirror of Software Heritage. ENEA. 2 November 2020. Archived 16 November 2020: https://web.archive.org/web/20201116125254/https://www.enea.it/en/news-enea/news/technology-at-enea-the-first-european-institutional-mirror-of-software-heritage.
  9. Web site: FossID establishes first independent mirror of world's largest source code archive. FossID. 6 December 2018. 2 November 2020. Archived 23 September 2020: https://web.archive.org/web/20200923055251/https://fossid.com/2018/12/06/fossid-establishes-first-independent-mirror-of-worlds-largest-source-code-archive/.
  10. News: Moody. Lyn. Software Heritage, the "Library of Alexandria of software," launches today. 26 July 2016. Ars Technica. 30 June 2016.
  11. News: Brogan. Jacob. Introducing Software Heritage, the Library of Alexandria for Code. 26 July 2016. Slate. 30 June 2016.
  12. UNESCO. Discours de la Directrice générale de l'UNESCO, Irina Bokova, à l'occasion de la signature de l'accord entre l'UNESCO et INRIA portant sur la préservation et le partage du patrimoine logiciel. 3 April 2020. UNESCO. Paris, France. 3 November 2020. Bokova, IG, Director-General, 2009-2017.
  13. Web site: National Plan for Open Science. Ouvrir La Science. 2 November 2020. Archived 1 July 2021: https://web.archive.org/web/20210701104417/https://cache.media.enseignementsup-recherche.gouv.fr/file/Recherche/50/1/SO_A4_2018_EN_01_leger_982501.pdf.
  14. Experts call for greater recognition of software source code as heritage for sustainable development. 16 November 2020. UNESCO. Paris, France. 2 November 2020.
  15. Web site: Paris Call on software source code as heritage for sustainable development. Paris. UNESCO. February 2019. 2 November 2020.
  16. Web site: GitHub Archive Program. November 2019. 2 November 2020.
  17. Web site: Software Heritage Sponsors. 2 November 2020.
  18. Web site: NLNet Software Heritage grant. 2 November 2020.
  19. Web site: Augmenting Software Heritage archiving capabilities. 2 November 2020.
  20. Web site: Long-term reproducibility with Nix and Software HERITAGE. 2 November 2020.
  21. Web site: Announcing the Mercurial public Bitbucket archive. 2 November 2020.
  22. Web site: Sloan Foundation. Excited to support Software Heritage. 2 November 2020.
  23. Web site: Software Heritage licensing. 25 February 2021.
  24. Web site: Software Heritage Ambassadors. 25 February 2021.
  25. Web site: Les Acteurs du Libre - Précédents Lauréats. 8 May 2020. Archived 18 January 2019: https://web.archive.org/web/20190118112843/http://www.lesacteursdulibre.com/.
  26. Web site: Paris Open Source Summit 2016 : Prix Acteurs du Libre : et les gagnants sont... Programmez!. 28 June 2019. fr. 17 November 2016.
  27. Tweet: Pole_Systematic. 1144308178420719616. 27 June 2019. Convention @Pole_Systematic le Trophée Prix Initiative académique est remis @SWHeritage.