Repositories

R Package Repositories

Table of Contents


Every new R user quickly discovers that packages are how work gets done in R. To understand how to manage R packages, it is important to start by understanding where R packages come from. The majority of R users install packages from the Comprehenisve R Archive Network (CRAN). CRAN is a network of servers that distribute R along with R packages.

CRAN is not the only place where users can access packages. In addition to CRAN, there are a number of other CRAN-like repositories, such as R-Forge and BioConductor. These CRAN-like repositories have a similar structure to CRAN, and normally work with install.packages. Finally, some users access packages from locations such as GitHub. Unlike CRAN-like repositories, these locations require a different installation client.

To see your current repository setting, run:


getOption("repos")

In RStudio 1.2 and above, users can change their repository by going to Tools -> Global Options -> Packages. RStudio Server administrators have a number of options for setting a repository for users.

This page outlines the structure of a CRAN-like repository, covers specific details about CRAN, and discusses options for creating internal repositories.

Looking for a way to manage your own repository? Try RStudio Package Manager.

Structure of a CRAN-like Repository

CRAN-like repositories organize R packages in a specific structure designed to work with R’s functions for accessing and installing packages.

This structure can be seen in the file system of a CRAN-like repository, parts of which are highlighted below.


/
/src/contrib
  package_1.0.tar.gz
  PACKAGES.rds
  PACKAGES.gz
  PACKAGES
    /PATH
      package_1.2.tar.gz
    /Archive/
      /package/
        package_0.9.tar.gz
/bin
  /windows/contrib
    /3.3
    /3.4
      /PACKAGES
      /package_1.0.zip
  /macosx/
    /contrib
      /3.3
      /3.4
        /PACKAGES
        /package_1.0.tgz
    /mavericks
    /leopard
    /el-capitan

The structure of the repository is built around the different ways users might access packages. The /src/contrib directory contains the package source bundles. The /bin directory contains compiled packages, built for different distributions. More information about binary packages is available below.

Package Metadata

At the heart of a CRAN-like repository is a metadata file named PACKAGES. The metadata file enumerates what packages are available in the repository, as well as information about each package such as the packages name, version, and dependencies. The metadata file is available in three formats, not all three are required.

Archive Packages

CRAN, and some CRAN-like repositories, have an Archive directory inside of /src/contrib. This directory contains older versions of source packages, or packages that have been archived. On CRAN, there is additional metadata in an archive.rds file with information on the prior versions. These copies of prior package versions are critical to many reproducibility strategies.

Binary Packages

The /bin directory contains binary versions of R packages along with appropriate metadata. The binaries are organized by distribution. A binary package is an R package that has been installed onto a specific operating system. The installation process can include compiling code as well as package documentation and metadata. Binaries can be reused on similar operating systems, and are important because they allow users who want access to a package to install it much faster. For example, if you use R on a Windows desktop, and want access to ggplot2, you have two options. You could use install.packages('ggplot2', type = 'source'), in which case R would request the latest ggplot2 source bundle, e.g. /src/contrib/ggplot2_1.0.tar.gz. After downloading the bundle, R would unpack the source code and compile it. This process can take significant time. In contrast, the second and default option, install.packages('ggplot2'), instructs R to first try and download a binary, e.g. /bin/windows/contrib/ggplot2_1.0.zip. If the binary is available, R can download and use the code as-is, with no installation necessary. You can think of CRAN binaries as being a global cache, the work of compiling is done once (per operating system), and all users benefit!

More detailed information on package installation is available as well as information on the different states of an R package.

CRAN provides binaries for Windows and Mac. In addition to being distribution-specific, binaries are also specific to the version of R that created them. A repository serving binary packages indexes them by both attributes. Typically if a binary is not available for the version of R in-use, R will provide the user with an option to install the latest version from source, or install an older, binary version of the package that was built with the appropriate version of R. Additional details are available in the package installation section.

Characteristics Specific to CRAN

CRAN-like repositories share many of the structural attributes described above, but there are specific features of CRAN that make it unique and remarkably successful. Two notable features include a distribution network of CRAN mirrors and testing of submitted packages. CRAN documents these policies and all of its policies here.

CRAN Mirrors

CRAN distributes R and and R packages through a network of “mirror” servers. Currently, the majority of R users install packages from two unique mirrors: httpd://cloud.r-project.org and httpd://cran.rstudio.com. These two mirrors are actually CDNs that use many servers world-wide to distribute packages. The RStudio CRAN mirror includes download logs that can be used to analyze package data.

CRAN checks

A unique feature of CRAN is the package submission process. Unlike many language repositories, CRAN requires R packages to pass a series of tests before the packages are accepted into the repository. These checks test to ensure the package is correctly formatted, and notably, also check to ensure the package does not break any other current packages on CRAN. You can read details and advice on the CRAN checks as well as related advice on how to pick a package and how to think about using packages in validated environments.

Other CRAN-like Repositories

In addition to CRAN, there are a handful of other popular package repositories. The following list is not comprehensive.

BioConductor

BioConductor is a set of repositories containing R packages used in the analysis of genomic data. Three critical ways BioConductor differs from CRAN:

  1. BioConductor includes a number of data packages that are significantly larger than the max size limit for CRAN packages.

  2. BioConductor packages are all versioned and released together, as opposed to CRAN packages which are released individually on a rolling basis. This release mechanism is necessary because the packages are closely coupled.

  3. While BioConductor follows a CRAN-like structure, users are encourage to interact with the repository through a custom installation client instead of using install.packages directly.

BioConductor packages are often used alongside of CRAN packages.

R-Forge

R-Forge is a collection of R projects and packages. Many R-forge packages are available in the R-Forge repository and CRAN. R-Forge projects additionally have mailing lists, message boards, forums, and other options that provide details about the related packages.

rOpenSci

rOpenSci is a collection of packages that adhere to a stricter set of development standards, curated by a community of package maintainers and reviewers. The majority of rOpenSci packages are also available on CRAN, though they can be downloaded directly from a CRAN-like repository built directly on-top of the packages’ Git repositories using drat.

Internal Repositories

Many organizations find value in hosting their own package repository. Hosting an internal repository allows organizations to:

Internal repositories also play a critical role in many reproducibility strategies and can help teams collaborate on code. There are many ways organizations can create internal repositories. An open source option is the miniCRAN package. A professional, supported option is RStudio Package Manager. In addition to hosting an internal repository, RStudio Package Manager includes support for: