Matthias Bernt authored

Recommendation for the safe use of conda

Problem

Anaconda, Inc. has (rather silently) changed the terms of service (TOS) for the defaults channel in 2020. This means that institutions with more than 200 employees (including academic ones) have to pay license fees if they make use of the defaults channel (which might be in the worst case $50 per month × number of employees since the change of TOS). Anaconda, Inc. also started to actively enforce the new policy, e.g. against Intel.

This raised some uncertainty regarding the use of conda.

This document tries to give the needed background knowledge and a guideline that allows you to easily and safely use the free and open part of the conda ecosystem (i.e. avoid the defaults channel).

Solution

The bottom line is: make sure that the defaults channel is not used, and use other (equivalent) channels like conda-forge instead.

Double check which conda packages are installed

Likely the simplest advice is to double check which packages are installed. This will protect you even from misconfigurations.

  • Before even trying to install the software, conda/mamba update the channel information (conda even explicitly shows the channels used as its first output). Check that the defaults channel is not among these channels.
  • During installation the conda packages that will be installed will be listed. In the second column of this listing you find detailed information in the form CHANNEL/PLATFORM::PACKAGE. It seems to be good practice to double check this listing. Alternatively one could use the --dry-run argument.
  • conda environments can be created from environment.yaml files. These files include a channels list. Even though a properly configured conda installation (or suitable arguments) should ensure that the defaults channel is not used, it seems advisable to remove the defaults channel from this list if it is included.

If the installation still uses the defaults channel, double check your configuration.
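The check of the channels list in an environment.yaml file can be automated. The following is only a minimal sketch: it assumes the common block-style layout (a top-level channels: key followed by "- name" entries) and is not a full YAML parser. The function names and the FORBIDDEN set are made up for this illustration.

```python
# Names treated as the defaults channel. The short names "main" and "r"
# are an assumption, taken from the denylist suggestion in this document.
FORBIDDEN = {"defaults", "main", "r", "pkgs/main", "pkgs/r"}


def channels_in_environment_file(text):
    """Return the entries of the top-level `channels:` block list.

    Only the common block style is handled; this is not a YAML parser.
    """
    channels = []
    in_channels = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip():
            continue
        if not line.startswith((" ", "\t", "-")):
            # A new top-level key opens or closes the channels block.
            in_channels = line.strip() == "channels:"
            continue
        item = line.strip()
        if in_channels and item.startswith("-"):
            channels.append(item[1:].strip().strip("'\""))
    return channels


def forbidden_channels(text):
    """Return the channels from the file that belong to defaults."""
    return [c for c in channels_in_environment_file(text) if c in FORBIDDEN]
```

Calling forbidden_channels() on the content of an environment.yaml file returns an empty list if the file is fine. Any returned entry should be removed from the file before the environment is created.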

Which conda distribution should be used

  • The recommendation is to use miniforge, which uses the conda-forge channel by default. This channel is community maintained, free to use, and contains only open source software.
  • micromamba is a good alternative for users that prefer mamba (but mamba is also included in miniforge).
  • We advise against the use of miniconda. Even if conda installed via miniconda can be configured to ignore the defaults channel, the defaults channel will be used for the construction of the base environment.
  • Using Anaconda is clearly not an option since all software comes from the defaults channel.

Switching conda distribution

In order to remove an old conda distribution (like miniconda/anaconda) you need to:

  • Remove the installation directory (e.g. ~/miniconda3/). Note that this typically includes your environments which you might want to keep (see below).
  • Make sure that your shell's configuration file does not contain any traces of the old conda distribution, i.e. a block starting with # >>> conda initialize >>> and ending with # <<< conda initialize <<< or any export statements or modifications of the PATH variable (only those that add conda paths). Potentially this can be done automatically with conda init --reverse.
  • The official documentation also contains a guide to transitioning from defaults to conda-forge.

Check existing conda environments

  • Activate the conda environment and execute conda list. The last column shows the conda channel for each installed package. If this column does not include the defaults channel or any of its subchannels (see below), the environment is fine.
  • Alternatively, a more programmatic approach is to run grep "\"url\": \"https://repo.anaconda.com" $ENVDIR/conda-meta/*json (where $ENVDIR is the directory where the environment is installed).
  • If the defaults channel has been used the environment should be re-created using only packages from proper channels.
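The grep check above can also be written as a small Python function, which makes it easier to loop over many environments. This is a sketch under the same assumption as the grep command, namely that each record in $ENVDIR/conda-meta/ is a JSON file with a "url" field. The function name is hypothetical.

```python
import json
from pathlib import Path

# URL prefix used by the grep command in this document.
ANACONDA_REPO = "https://repo.anaconda.com"


def packages_from_anaconda_repo(envdir):
    """Return (name, url) pairs of packages fetched from repo.anaconda.com."""
    hits = []
    for meta in sorted(Path(envdir, "conda-meta").glob("*.json")):
        try:
            record = json.loads(meta.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip unreadable or malformed records
        if not isinstance(record, dict):
            continue
        url = record.get("url") or ""
        if url.startswith(ANACONDA_REPO):
            hits.append((record.get("name", meta.stem), url))
    return hits
```

An empty result means that no installed package record points to repo.anaconda.com. Otherwise the listed packages show what has to be replaced when the environment is re-created.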

conda vs mamba

  • Both are free to use (of course only if installed from a free source).
  • mamba does not default to the defaults channel.
  • conda does not default to the defaults channel when installed via miniforge.
  • Independent of the choice, you need to double check the configuration if you have previously used conda / mamba or if you create conda environments from exported yaml files.
  • Nowadays conda and mamba are nearly equivalent, i.e. mamba covers most of conda's functionality and conda uses mamba's solver by default. Some aspects of mamba might be slightly faster. You can use aliases to retain script compatibility.

Check your configuration

  • Make sure that conda config --show-sources / conda config --show channels does not show the defaults channel.
  • If you find the defaults channel, remove it by executing conda config --remove channels defaults. If desired you can add conda-forge like so: conda config --add channels conda-forge
  • Just to be sure, one can explicitly forbid the usage of these channels: conda config --append denylist_channels defaults, conda config --append denylist_channels main, conda config --append denylist_channels r
  • Make sure that override_channels_enabled is enabled: conda config --show override_channels_enabled should show True. If needed set it with conda config --set override_channels_enabled True.
  • It might also be a good idea to add nodefaults to the channel list, even if it is documented as equivalent to override_channels_enabled.
  • If preferred, one can specify --override-channels --channel conda-forge for conda install / conda create to achieve the same effect. Additional channels can be added by listing more --channel XYZ arguments.
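The configuration checks listed above can be combined into one script. The sketch below assumes that conda is on the PATH and that the installed version accepts conda config --show --json and knows the denylist_channels setting; both should be verified for your conda version. The function names are hypothetical.

```python
import json
import subprocess

# Channel names that this document suggests putting on the denylist.
DEFAULTS_NAMES = {"defaults", "main", "r"}


def read_conda_config():
    """Return conda's merged configuration as a dict.

    Assumption: `conda config --show --json` is supported by the
    installed conda version.
    """
    result = subprocess.run(
        ["conda", "config", "--show", "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def config_problems(config):
    """Return human-readable findings for a configuration dict."""
    problems = []
    used = [c for c in config.get("channels") or [] if c in DEFAULTS_NAMES]
    if used:
        problems.append("channels contains: " + ", ".join(used))
    denied = set(config.get("denylist_channels") or [])
    missing = sorted(DEFAULTS_NAMES - denied)
    if missing:
        problems.append("not in denylist_channels: " + ", ".join(missing))
    if not config.get("override_channels_enabled", False):
        problems.append("override_channels_enabled is not True")
    return problems
```

config_problems(read_conda_config()) returns an empty list if all checks pass. Note that the denylist is an optional extra safeguard in this document, so a finding about it is a hint rather than an error.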

Details and background

Software (and in particular scientific software) is often difficult to install. Additionally, reproducible science requires independent installations of multiple versions of a software. anaconda aims to provide easy installation of important scientific software with proper version management. To this end the conda package manager has been created. It provides the possibility to install software independent of the programming language (it covers much more than only python and R) and operating system (it covers Linux, Mac and Windows).

At the source of the conda ecosystem there are conda recipes, which contain the metadata, requirements and installation instructions that are needed to build a package. Packages are pre-built archives containing software, metadata, and information on dependencies, ready to be installed into a conda environment. These packages are stored in channels, i.e. repositories (online or local) from which conda packages are made available for installation. Many channels are free and contain only free and open source software, most importantly conda-forge and bioconda. The defaults channel consists of conda packages that are maintained by Anaconda, Inc. It consists of several subchannels: pkgs/main, pkgs/r, pkgs/pro, pkgs/msys2, pkgs/free, pkgs/archive. Most conda channels are currently hosted by Anaconda, Inc. (if one is looking for risks in using conda, this might be one, since it is a single point of failure). Locally, conda packages are installed into environments, which contain the package and its requirements.
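To make the relation between channels and subchannels concrete, the following sketch extracts the channel from a CHANNEL/PLATFORM::PACKAGE string (the form shown during installation) and checks it against the subchannels listed above. The function names are hypothetical, and the package strings in the usage note are invented examples.

```python
# Subchannels of the defaults channel, as listed in this document.
DEFAULTS_SUBCHANNELS = (
    "pkgs/main", "pkgs/r", "pkgs/pro",
    "pkgs/msys2", "pkgs/free", "pkgs/archive",
)


def channel_of(spec):
    """Extract CHANNEL from a 'CHANNEL/PLATFORM::PACKAGE' string."""
    location, sep, _package = spec.partition("::")
    if not sep:
        raise ValueError(f"not a CHANNEL/PLATFORM::PACKAGE string: {spec!r}")
    # The platform is the last path component; the channel itself may
    # contain a slash (e.g. pkgs/main).
    channel, _, _platform = location.rpartition("/")
    return channel or location


def is_defaults(channel):
    """Return True if the channel is defaults or one of its subchannels."""
    name = channel.strip().rstrip("/")
    return name == "defaults" or name in DEFAULTS_SUBCHANNELS
```

For example, is_defaults(channel_of("pkgs/main/linux-64::openssl-3.0.13-0")) is True, while the same check for "conda-forge/linux-64::numpy-2.0.0-0" is False.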

A conda distribution is installable software that provides conda (and everything needed to run conda), e.g. miniforge.

Further discussion

Other channels

Software can be dangerous. Hence, besides licensing questions, installing software requires trust in the source of the software. Conda's search functionality allows you to find software in various channels. It seems unwise to blindly install software from all these channels.

It seems to be a good idea to rely on community maintained channels that include only free software. Examples of such channels are conda-forge and bioconda.

The larger picture

Beyond the problem with Anaconda's defaults channel, it is worth mentioning that reproducible science currently relies to a significant extent on being able to freely use services offered by companies, see discussion here. It's easy to blame Anaconda Inc. for the TOS change, but a bit of appreciation might be appropriate as well, even though communication certainly could have been better.

Also note that Anaconda Inc. continues to provide substantial resources for free to everyone (and in particular the scientific community).

  • The software conda (which is also maintained by Anaconda, Inc) is still (and will always be) free to use and open source.

  • Hosting of the packages of all conda channels. The open source community currently has no possibilities to host these packages elsewhere (see).

  • Also note that part of the revenues are invested back in open source projects.

Note that Helmholtz (HiFiS) made some important steps toward scientific infrastructure, e.g. by providing container registries. Probably more investments of governmental and scientific bodies in open source and scientific infrastructure are needed.

Discussion of other documents

Recently UFZ and HiFiS published recommendations.

While these documents are certainly important to raise awareness, they contain a few points that are worth discussing:

The company Anaconda provides very convenient software development environments and libraries for Python and R, especially for beginners. Due to their many years of free availability for research and educational institutions, they are also very popular in training and in the discussion on StackOverflow.

Partially correct, since the software is not provided by Anaconda, but only distributed. All (or almost all) software that is distributed in the defaults channel is FOSS software and distributed in many other (usually free) ways, e.g. conda-forge. Furthermore, it's important to highlight that these statements only apply to the software in the defaults channel. The possibilities offered by conda and channels like conda-forge go far beyond this.

  • Proper versioned software environments for all sorts of interpreted and compiled software.
  • An easy solution for reproducible science that can be used in large scale production ready deployments.

These fees are also due if employees of the respective institutions access the Anaconda software directories.

This could have been more precise, i.e. this only applies to the defaults channel (and its subchannels) and not (in general) to the other channels hosted by Anaconda Inc.

mamba instead of conda

As discussed above, with minimal effort, both can be used without using the problematic defaults channel.

Miniforge instead of conda-forge

This mixes two different concepts. miniforge is a conda distribution and conda-forge a conda channel. Both are part of the solution.

All other recommendations only apply to Python (and are therefore not a full alternative) or are unrelated to the actual question.

The Helmholtz Open Science newsletter also recently covered this topic:

Users working in non-commercial or scientific organizations were able to use the software free of charge and were convinced that Anaconda was FOSS – Free and Open Source Software. In fact, this is not the case.

In fact, all software that is distributed in Anaconda's defaults channel is FOSS. The channel is only a non-free means of distributing the software in a way that is much more convenient than anything that was available before conda. Nowadays, with conda-forge, equally convenient and free ways are available.

The Jülich Supercomputing Centre's RSE Team has also uploaded a recommendation. Notably, the JSC typically encourages the use of virtual environments that are implemented in Python rather than a package manager such as conda, as this is more compatible with EasyBuild (including user modules).