Package development process explained

A software package development process is a system for developing software packages. Such packages are used to reuse and share code, e.g., via a software repository. A package development process includes a formal system for package checking that usually exposes bugs, thereby potentially making it easier to produce trustworthy software (Chambers' prime directive).[1] It may also include a standard for documentation, thereby making it easier for new users to learn how to use a package.

Discussion

In this context, a package is a collection of functions written for use in a single language such as Python or R. It may also be bundled with documentation. For many programming languages, there are software repositories where people share such packages.

For example, a Python module combines documentation, code, initial setup, and possibly examples that can be used as unit tests in a single file with a ".py" extension; a Python package is a directory of such modules.
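The idea of documentation examples that double as unit tests can be sketched with Python's standard doctest module. This is a minimal illustration, not a recommended project layout; the names mean and run_examples are hypothetical and chosen only for this example.

```python
"""A tiny module holding documentation, code, and examples in one file."""
import doctest


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    >>> mean([1, 2, 3])
    2.0
    >>> mean([10])
    10.0
    """
    return sum(values) / len(values)


def run_examples():
    """Run the docstring examples of mean(); return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a test object from the docstring; globs supplies the names
    # that the examples are allowed to use.
    test = parser.get_doctest(mean.__doc__, {"mean": mean}, "mean", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    result = runner.run(test)
    return result.failed, result.attempted
```

If an example in the docstring stops matching the behavior of the code, the runner reports a failure, so the documentation and the code are checked against each other every time the examples are run.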

By contrast, an R package has documentation with examples in files separate from the code, possibly bundled with other material such as sample data sets and introductory vignettes. The source code for an R package is contained in a directory with a master "DESCRIPTION" file and separate subdirectories for documentation, code, optional data sets suitable for unit or regression testing, and perhaps others.[2] A formal package compilation process[3][4] checks for errors of various types. This includes checking for syntax errors in both the documentation markup language and the code, as well as comparing the arguments between documentation and code. Examples in the documentation are tested and produce error messages if they fail. This can be used as a primitive form of unit testing; more formal unit tests and regression tests can also be included.

These checks can improve software development productivity by making it easier to find bugs as the code is being developed. In addition, the documentation makes it easier to share code with others. It also makes it easier for a developer to use code written months or even years earlier.

Routine checks are made of packages contributed to the Comprehensive R Archive Network (CRAN) and of packages under development on the companion open-source collaborative development web site, R-Forge. These checks compile the packages repeatedly on different platforms under different versions of the core R language. The results are made available to package maintainers. In this way, package contributors become aware of problems they might never encounter themselves, because they would not otherwise have easy access to those alternative test results.
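The check that compares arguments between documentation and code can be illustrated with a rough Python analogy. This sketch is not how the R tooling is implemented; it assumes a docstring convention in which each parameter is documented with a ":param name:" entry, and the names doc_code_mismatches and scale are hypothetical.

```python
import inspect
import re


def doc_code_mismatches(func):
    """Compare a function's signature with its ':param name:' docstring entries.

    Returns a pair (undocumented, stale): parameters present in the code
    but missing from the documentation, and parameters documented but
    absent from the code.
    """
    in_code = set(inspect.signature(func).parameters)
    in_docs = set(re.findall(r":param\s+(\w+):", func.__doc__ or ""))
    return sorted(in_code - in_docs), sorted(in_docs - in_code)


def scale(x, factor, offset=0.0):
    """Scale a number and add an offset.

    :param x: value to scale
    :param multiplier: stale entry left over from an earlier version
    """
    return x * factor + offset


undocumented, stale = doc_code_mismatches(scale)
print("undocumented:", undocumented)
print("stale:", stale)
```

Here the check would flag factor and offset as undocumented and multiplier as a stale documentation entry, which is the kind of drift between documentation and code that a formal package check is meant to surface.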

An interesting research question would be to compare the quality of contributions to different software repositories and try to relate that to features of the language and accompanying package development process. This could include trying to compare the rate of growth of contributed software to the degree of formality and enforcement of standards for documentation, testing, and coding.

Notes and References

  1. Chambers, John M. (2008). Software for Data Analysis: Programming with R. Springer. ISBN 978-0-387-75935-7.
  2. Writing R Extensions.
  3. Leisch, Friedrich. Creating R Packages: A Tutorial.
  4. Graves, Spencer B.; Dorai-Raj, Sundar. Creating R Packages, Using CRAN, R-Forge, And Local R Archive Networks And Subversion (SVN) Repositories.