Dplyr Explained

dplyr
Author: Hadley Wickham, Romain François, Lionel Henry, Kirill Müller, Davis Vaughan
Latest Release Version: 1.1.0
Programming Language: R
License: MIT License

dplyr is an R package whose functions are designed to make manipulating dataframes (spreadsheet-like data structures) intuitive and user-friendly. It is one of the core packages of the popular tidyverse collection of packages for the R programming language.[1] Data analysts typically use dplyr to transform existing datasets into a format better suited to a particular type of analysis or data visualization.[2][3]

For instance, someone analyzing a large dataset may wish to view only a smaller subset of the data. Alternatively, a user may wish to rearrange the data to see the rows ranked by some numerical value, or by a combination of values from the original dataset. Functions within the dplyr package allow a user to perform such tasks. dplyr was launched in 2014.[4] On the dplyr web page, the package is described as "a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges."[5]
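The two tasks described above can be sketched with the starwars dataset that ships with dplyr. The height cutoff and the mass-to-height ratio are illustrative choices made here, not examples taken from the dplyr documentation.

```r
library(dplyr)

# View only a smaller subset of the data:
# rows for characters taller than 200 cm.
tall <- filter(starwars, height > 200)

# Rank rows by a single numerical value, largest first.
by_mass <- arrange(starwars, desc(mass))

# Rank rows by a combination of values from the original columns
# (here, mass divided by height). Rows with missing values sort last.
by_ratio <- arrange(starwars, desc(mass / height))

# Show the top three rows of the last ranking, keeping a few columns.
head(select(by_ratio, name, height, mass), 3)
```

Each call returns a new dataframe and leaves starwars itself unchanged.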

The five core verbs

Although dplyr includes several dozen functions for various forms of data manipulation, the package is built around five primary verbs, or actions: filter(), arrange(), select(), mutate(), and summarise() (also spelled summarize()).[6]
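A minimal sketch of five verbs in use, assuming they are filter(), select(), arrange(), mutate(), and summarise() as covered in the cited R for Data Science chapter. The sales dataframe and its column names are made up for illustration.

```r
library(dplyr)

# A small, made-up dataframe used only for illustration.
sales <- data.frame(
  region = c("north", "south", "north", "east"),
  units  = c(10, 4, 7, 12),
  price  = c(2.5, 3.0, 2.5, 1.5),
  stringsAsFactors = FALSE
)

big_orders <- filter(sales, units >= 7)               # keep rows that meet a condition
no_region  <- select(sales, units, price)             # keep only the named columns
sorted     <- arrange(sales, desc(units))             # sort rows, largest first
with_total <- mutate(sales, revenue = units * price)  # add a computed column
overall    <- summarise(with_total, total_revenue = sum(revenue))  # collapse to one row

# The verbs compose naturally with the pipe operator, which dplyr re-exports.
result <- sales %>%
  filter(units >= 7) %>%
  mutate(revenue = units * price) %>%
  arrange(desc(revenue)) %>%
  select(region, revenue)
```

Because every verb takes a dataframe as its first argument and returns a dataframe, the piped form reads top to bottom in the order the steps are applied.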

Additional functions

In addition to its five main verbs, dplyr also includes several other functions that enable exploration and manipulation of dataframes.
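A short sketch of a few such functions follows. The selection (distinct(), count(), rename(), and group_by()) is made here for illustration and is not necessarily the set the article originally listed; the pets dataframe is made up.

```r
library(dplyr)

# A small, made-up dataframe used only for illustration.
pets <- data.frame(
  owner   = c("ana", "ana", "ben", "ben", "ben"),
  species = c("cat", "cat", "dog", "cat", "dog"),
  weight  = c(4, 4, 20, 5, 30),
  stringsAsFactors = FALSE
)

unique_rows <- distinct(pets)                    # drop exact duplicate rows
per_species <- count(pets, species)              # number of rows per species, in column n
renamed     <- rename(pets, weight_kg = weight)  # syntax is new_name = old_name

# group_by() makes summarise() run once per group instead of once overall.
mean_weight <- pets %>%
  group_by(species) %>%
  summarise(mean_weight = mean(weight))
```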

Built-in datasets

The dplyr package comes with five datasets: band_instruments, band_instruments2, band_members, starwars, and storms.
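As a brief sketch, the datasets can be inspected as soon as the package is attached. The use of glimpse() and inner_join() below is a choice made here for illustration; both are dplyr functions, but the article itself does not discuss them.

```r
library(dplyr)

# Compact overview of a built-in dataset: one line per column.
glimpse(starwars)

# Row and column counts of one of the smaller datasets.
dim(band_members)

# band_members and band_instruments share a "name" column,
# which makes them convenient for trying out joins.
joined <- inner_join(band_members, band_instruments, by = "name")
joined
```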

Copyright & license

The copyright to dplyr is held by Posit PBC, formerly RStudio PBC. dplyr was originally released under a GPL license, but in 2022 Posit changed the license terms for the package to the "more permissive" MIT License.[7] The main difference between the two is that the MIT License allows code to be reused within proprietary software, whereas the GPL requires distributed derivative works to be released under the GPL as well.

Notes and References

  1. Wickham, Hadley; Averick, Mara; Bryan, Jennifer; Chang, Winston; McGowan, Lucy D'Agostino; François, Romain; Grolemund, Garrett; Hayes, Alex; Henry, Lionel; Hester, Jim; Kuhn, Max; Pedersen, Thomas Lin; Miller, Evan; Bache, Stephan Milton; Müller, Kirill (2019-11-21). "Welcome to the Tidyverse". Journal of Open Source Software. 4 (43): 1686. doi:10.21105/joss.01686. ISSN 2475-9066.
  2. Yadav, Rohit (2019-10-29). "Python's Pandas vs R's Tidyverse: Who Comes Out On Top?". Analytics India Magazine. Retrieved 2021-02-06.
  3. Krill, Paul (2015-06-30). "Why R? The pros and cons of the R language". InfoWorld. Retrieved 2021-02-06.
  4. "Introducing dplyr" (2014-01-17). blog.rstudio.com. Retrieved 2020-09-02.
  5. "Function reference". dplyr.tidyverse.org. Retrieved 2021-02-06.
  6. Grolemund, Garrett; Wickham, Hadley. "5 Data transformation". R for Data Science.
  7. "A Grammar of Data Manipulation". tidyverse.org. Retrieved 2023-01-14.