Strategy Maps

Strategies to Reproduce Environments Over Time


Reproducing data science work is the main objective of environment management. This site details three strategies for reproducing R environments over time. To select a strategy, you will need to answer two questions:

  1. Who is responsible for managing the environment?
  2. How open is the environment?

At first these two questions might seem similar, but separating the two uncovers common “danger zones” or “anti-strategies”. The map below depicts these danger zones as well as three successful strategies. Use the map and the two questions above to determine where your organization currently operates and identify which strategy to move towards.

Each of the three strategies is outlined in detail elsewhere on this site.

In addition to these three strategies, the strategy map details a set of danger zones: areas where “who” is in control and “what” can be installed are misaligned, creating painful environments that cannot be reliably recreated. Recognizing that you are in a danger zone can help you find a “nearby” strategy to move towards.

Wild West

The wild west scenario occurs when users are given free rein to install packages with no strategy for reproducing package environments.
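As a rough illustration of the minimum a reproduction strategy has to capture, the sketch below records the name and version of every package visible to the current R session, using only base R. The helper name `record_packages` and the file name `package-manifest.csv` are hypothetical, and this is not presented as the tooling this site recommends; it only shows that an unmanaged library can at least be inventoried before anything else changes.

```r
# Sketch: inventory the packages in the current library paths.
# record_packages() is a hypothetical helper, not part of any package.
record_packages <- function(lib_paths = .libPaths()) {
  pkgs <- utils::installed.packages(lib.loc = lib_paths)
  manifest <- data.frame(
    package = unname(pkgs[, "Package"]),
    version = unname(pkgs[, "Version"]),
    library = unname(pkgs[, "LibPath"]),
    stringsAsFactors = FALSE
  )
  # Sort by name so two manifests are easy to compare later.
  manifest[order(manifest$package), , drop = FALSE]
}

manifest <- record_packages()

# Written to a temporary directory here; in practice the file would
# more likely live alongside the project (the file name is an assumption).
manifest_file <- file.path(tempdir(), "package-manifest.csv")
write.csv(manifest, manifest_file, row.names = FALSE)
```

An inventory like this does not reproduce an environment on its own, since it records versions without a way to reinstall them, but it makes later drift visible.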

Recommendations:

Ticket System

The ticket system scenario occurs when administrators are involved in package installation, but they do not have a strategy for ensuring consistent and safe package updates; for example:

  1. A user wants a new package installed, so they submit a ticket to have the package added
  2. An admin receives the ticket, and manually installs the new package into the system library

This scenario is problematic because it encourages partial upgrades (some packages are updated while the rest of the library is left as it was), is often slow, and still results in broken environments!
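To make the idea of a partial upgrade concrete, the sketch below compares two inventories of a system library, one taken before a ticket was handled and one after, and reports only the packages that were added, removed, or changed version. The package names and version numbers are invented for illustration, and `diff_manifests` is a hypothetical helper rather than an existing function.

```r
# Sketch: report which packages differ between two library inventories.
# diff_manifests() is a hypothetical helper; the data below is made up.
diff_manifests <- function(before, after) {
  merged <- merge(before, after, by = "package", all = TRUE,
                  suffixes = c("_before", "_after"))
  # A package counts as changed if it is missing on either side
  # or if its version string differs.
  changed <- is.na(merged$version_before) |
    is.na(merged$version_after) |
    merged$version_before != merged$version_after
  merged[changed, , drop = FALSE]
}

before <- data.frame(
  package = c("pkgA", "pkgB"),
  version = c("1.0.0", "2.1.0"),
  stringsAsFactors = FALSE
)
after <- data.frame(
  package = c("pkgA", "pkgB", "pkgC"),
  version = c("1.2.0", "2.1.0", "0.3.0"),
  stringsAsFactors = FALSE
)

changed <- diff_manifests(before, after)
print(changed)
```

In this made-up case `pkgA` and `pkgC` differ while `pkgB` is untouched, which is the shape of a partial upgrade: part of the library has moved and part has not, with nothing recording whether the new combination works.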

Recommendation

Blocked

The blocked scenario occurs when servers are locked down, but there is no strategy in place for R package access. This scenario often leads R users to “backdoor” approaches to package access, such as manually copying over installed packages.

In this scenario, it is important for R users to level-set with IT on why R packages are essential to successful data science work. You may need to refer to the validation section of the site or the section on picking packages, both of which help explain where packages come from and address issues around trust.

Come to this discussion prepared to advocate for either the shared baseline or validation strategy. It may also help your admin team to know that there are supported products, like RStudio Package Manager, designed to help them help you!
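If that discussion ends with IT hosting or approving a package repository, pointing R at it is a small configuration change. The sketch below uses base R's `repos` option; the URL is a placeholder rather than a real endpoint, and the actual address would come from your admin team.

```r
# Placeholder URL: replace with the repository your admin team provides.
approved_repo <- "https://packages.example.com/cran/latest"

# Tell install.packages() and related functions where to look for packages.
options(repos = c(CRAN = approved_repo))

# Confirm which repository the session will use.
getOption("repos")
```

A setting like this is usually placed in an R startup file, such as a user's `.Rprofile` or a site-wide `Rprofile.site`, so that every session picks it up without manual steps.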