Matthias Bernt committed
# Recommendation for the safe use of conda

## Problem

Anaconda, Inc. changed the terms of service ([TOS](https://www.anaconda.com/pricing/terms-of-service-faqs)) for the `defaults` channel in 2020, without much publicity. Since then, institutions with more than 200 employees (including academic ones) have to pay license fees if they use the `defaults` channel. In the worst case this might amount to 50 USD per month and employee, counted from the change of the TOS. Anaconda, Inc. has also started to actively enforce the new policy, e.g. [against Intel](https://www.courtlistener.com/docket/69029637/anaconda-inc-v-intel-corporation/).

This raised some uncertainty regarding the use of conda.

This document tries to give the needed background knowledge and a guideline that allows you to easily and safely use the free and open part of the conda ecosystem (i.e. avoid the `defaults` channel).

## Solution

The bottom line: make sure that the `defaults` channel is not used, and use other (equivalent) channels like `conda-forge` instead.

Ronny Gey committed
### Double check which conda packages are installed

Likely the simplest advice is to double check which packages are installed. This will protect you even from misconfigurations.

- Before even trying to install software, `conda`/`mamba` update the channel information (`conda` even explicitly shows the channels used as its first output). Check that the `defaults` channel is not among them.
- During installation, the conda packages that are about to be installed are listed. The second column of this listing gives detailed information in the form `CHANNEL/PLATFORM::PACKAGE`. It is good practice to double check this listing. Alternatively, one can use the `--dry-run` argument to see the listing without installing anything.
- `conda` environments can be created from `environment.yaml` files. These files include a `channels` list. Even though a properly configured conda installation (or the right arguments) should ensure that the `defaults` channel is not used, it is advisable to remove the `defaults` channel from this list if it is included.
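
As a minimal sketch of the last point, the following Python function drops a `defaults` entry from the `channels` list of an environment file. It works line by line instead of using a YAML parser, so it assumes the common block-style layout (`channels:` followed by `- name` items). The function name and the sample file are made up for illustration.

```python
import re

# Matches a list item such as "  - defaults" (optionally quoted or commented).
DEFAULTS_ITEM = re.compile(r"""^\s*-\s*["']?defaults["']?\s*(#.*)?$""")


def strip_defaults_channel(yaml_text: str) -> str:
    """Return the environment file text without `defaults` in `channels`."""
    kept = []
    in_channels = False
    for line in yaml_text.splitlines():
        if re.match(r"^channels\s*:\s*$", line):
            in_channels = True
        elif in_channels and line.strip() and not line.lstrip().startswith(("-", "#")):
            # The next top-level key (e.g. `dependencies:`) ends the channel list.
            in_channels = False
        if in_channels and DEFAULTS_ITEM.match(line):
            continue  # drop the `defaults` entry
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical environment file used only for demonstration.
sample = (
    "name: demo\n"
    "channels:\n"
    "  - conda-forge\n"
    "  - defaults\n"
    "dependencies:\n"
    "  - python=3.11\n"
)
print(strip_defaults_channel(sample))
```

If `defaults` was the only entry, the `channels` list ends up empty and a replacement such as `conda-forge` has to be added by hand.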

If the installation still uses the `defaults` channel, double check your configuration.
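
One way to spot such a case is to scan a saved package listing (for example the output of a `--dry-run`) for entries in the `CHANNEL/PLATFORM::PACKAGE` form whose channel belongs to `defaults`. The Python sketch below does this under the assumption that the listing uses exactly that form. The function name and the sample listing are hypothetical, and the channel names are the subchannels named in the background section of this document.

```python
import re

# `defaults` and its subchannels as listed in "Details and background".
DEFAULTS_CHANNELS = (
    "defaults", "pkgs/main", "pkgs/r", "pkgs/pro",
    "pkgs/msys2", "pkgs/free", "pkgs/archive",
)

# CHANNEL/PLATFORM::PACKAGE, where CHANNEL itself may contain slashes.
SPEC = re.compile(r"(?P<channel>\S+)/(?P<platform>[^/\s:]+)::(?P<package>\S+)")


def find_defaults_packages(listing: str) -> list[str]:
    """Return all package specs in the listing that come from `defaults`."""
    hits = []
    for match in SPEC.finditer(listing):
        if match.group("channel") in DEFAULTS_CHANNELS:
            hits.append(match.group(0))
    return hits


# Hypothetical listing; real output contains more columns and packages.
listing = """
  numpy    pkgs/main/linux-64::numpy-1.26.4-py312h0000000_0
  python   conda-forge/linux-64::python-3.12.0-h0000000_0
"""
for spec in find_defaults_packages(listing):
    print("from defaults:", spec)
```

An empty result only says that no entry in this particular listing matched, so it complements the manual check rather than replacing it.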

### Which conda distribution should be used

- The recommendation is to use [`miniforge`](https://github.com/conda-forge/miniforge). It uses the `conda-forge` channel by default, which is community maintained, free to use and contains only open source software.
- `micromamba` is a good alternative for users that prefer `mamba` (but `mamba` is also included in `miniforge`).
- We advise against the use of `miniconda`. Even if `conda` installed via `miniconda` can be configured to ignore the `defaults` channel, the `defaults` channel will be used for the construction of the `base` environment.
- Using `Anaconda` is clearly not an option since all software comes from the `defaults` channel.

### Switching conda distribution

In order to remove an old conda distribution (like miniconda/anaconda) you need to:

- Remove the installation directory (e.g. `~/miniconda3/`). Note that this typically includes your environments which you might want to keep (see below).  
- Make sure that your shell's configuration file does not contain any traces of the old conda distribution, i.e. a block starting with `# >>> conda initialize >>>` and ending with `# <<< conda initialize <<<` or any `export` statements or modifications of the `PATH` variable (only those that add conda paths). Potentially this can be done automatically with `conda init --reverse`.
- The conda-forge documentation [also contains a guide to transitioning from `defaults` to `conda-forge`](https://conda-forge.org/docs/user/transitioning_from_defaults/).
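
The clean-up of the shell configuration can be sketched in Python as follows. The function removes the block between the two marker lines quoted above and leaves everything else untouched. It does not look for stray `export` statements or `PATH` modifications, which still need a manual review. The function name and the sample content are hypothetical.

```python
START = "# >>> conda initialize >>>"
END = "# <<< conda initialize <<<"


def strip_conda_init_block(rc_text: str) -> str:
    """Return the shell configuration text without the conda initialize block."""
    kept = []
    inside = False
    for line in rc_text.splitlines(keepends=True):
        if line.strip() == START:
            inside = True  # drop the start marker and everything after it
            continue
        if inside:
            if line.strip() == END:
                inside = False  # drop the end marker, then resume keeping lines
            continue
        kept.append(line)
    return "".join(kept)


# Hypothetical shell configuration used only for demonstration.
rc = (
    "export EDITOR=vim\n"
    "# >>> conda initialize >>>\n"
    "__conda_setup=...\n"
    "# <<< conda initialize <<<\n"
    "alias ll='ls -l'\n"
)
print(strip_conda_init_block(rc))
```

If the end marker is missing, everything after the start marker is dropped, so it is worth comparing the result with the original file before overwriting it.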

### Check existing `conda` environments

- Activate the conda environment and execute `conda list`. The last column shows the `conda` channel for each installed package. If this column does not include the `defaults` channel or any of its subchannels (see below), the environment is not affected.
- Alternatively, a more programmatic approach is to run `grep "\"url\": \"https://repo.anaconda.com" $ENVDIR/conda-meta/*json` (where `$ENVDIR` is the directory where the environment is installed). Any match points to a package that was downloaded from Anaconda's repository.
- If the `defaults` channel has been used, the environment should be re-created using only packages from proper channels.
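
The `grep` approach can also be written as a small Python function, which is a sketch under the same assumption as the command above: each file in `conda-meta` is a JSON record with a `url` field. The function name and the example path are hypothetical.

```python
import json
from pathlib import Path

ANACONDA_REPO = "https://repo.anaconda.com"


def packages_from_anaconda_repo(env_dir: str) -> list[str]:
    """Return the record names in ENVDIR/conda-meta whose url points to repo.anaconda.com."""
    flagged = []
    for record_file in sorted(Path(env_dir, "conda-meta").glob("*.json")):
        record = json.loads(record_file.read_text(encoding="utf-8"))
        if str(record.get("url", "")).startswith(ANACONDA_REPO):
            flagged.append(record_file.stem)
    return flagged


if __name__ == "__main__":
    # Hypothetical environment location; adjust to your own installation.
    env_dir = Path.home() / "miniforge3" / "envs" / "myenv"
    if env_dir.is_dir():
        for name in packages_from_anaconda_repo(str(env_dir)):
            print("installed from repo.anaconda.com:", name)
```

An empty result for an environment means that no record matched, in which case the environment does not need to be re-created for this reason.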

### `conda` vs `mamba`

- Both are free to use (of course only if installed from a free source).
- `mamba` does not default to the `defaults` channel.
- `conda` does not default to the `defaults` channel when installed via [miniforge](https://github.com/conda-forge/miniforge/blob/e733f7bbc41f42551e9f02766d8a0301b72fde26/README.md?plain=1#L7-L8).
- Independent of the choice, you need to double check the configuration if you have previously used `conda` / `mamba` or if you install conda environments from exported yaml files.
- Nowadays `conda` and `mamba` are nearly equivalent, i.e. `mamba` covers most of `conda`'s functionality and `conda` uses `mamba`'s solver by default. Some aspects of `mamba` might be slightly faster. You can use aliases to retain script compatibility.

### Check your configuration

- Make sure that `conda config --show-sources` / `conda config --show channels` does not show the `defaults` channel.
- If you find the `defaults` channel, remove it by executing `conda config --remove channels defaults`. If desired, you can add `conda-forge` like so: `conda config --add channels conda-forge`.
- Just to be sure, one can explicitly forbid the usage of these channels: `conda config --append denylist_channels defaults`, `conda config --append denylist_channels main`, `conda config --append denylist_channels r`.
- Make sure that `override_channels_enabled` is enabled: `conda config --show override_channels_enabled` should show `True`.
If needed set it with `conda config --set override_channels_enabled True`.
- It might also be a good idea to add [`nodefaults`](https://docs.conda.io/projects/conda/en/4.6.1/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually) to the channel list, even if it is documented as equivalent to `override_channels_enabled`.
- If preferred, one can specify `--override-channels --channel conda-forge` for `conda install` / `conda create` to achieve the same effect. Additional channels can be added by listing more `--channel XYZ` arguments.
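
The checks in this list can be collected in a small Python function. This is a sketch that assumes the configuration is available as a dictionary with the keys `channels`, `override_channels_enabled` and `denylist_channels`. How that dictionary is obtained is left open: parsing the output of `conda config --show --json` is one option, but treat that invocation as an assumption and verify it against your conda version. The function name is hypothetical.

```python
def config_problems(config: dict) -> list[str]:
    """Return human-readable findings for a conda configuration mapping."""
    problems = []
    if "defaults" in (config.get("channels") or []):
        problems.append("`defaults` is listed in `channels`")
    # Only an explicit False is flagged; a missing key is left unjudged here.
    if config.get("override_channels_enabled") is False:
        problems.append("`override_channels_enabled` is False")
    denied = config.get("denylist_channels") or []
    missing = [name for name in ("defaults", "main", "r") if name not in denied]
    if missing:
        problems.append("not in `denylist_channels`: " + ", ".join(missing))
    return problems


# Hypothetical configurations used only for demonstration.
bad = {"channels": ["defaults", "conda-forge"],
       "override_channels_enabled": False,
       "denylist_channels": []}
good = {"channels": ["conda-forge"],
        "override_channels_enabled": True,
        "denylist_channels": ["defaults", "main", "r"]}

for finding in config_problems(bad):
    print(finding)
```

The deny list entry is reported like the others even though the text above treats it as an extra safeguard, so a finding of that kind is a hint rather than an error.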

## Details and background

Software (and in particular scientific software) is often difficult to install. Additionally, reproducible science requires independent installations of multiple versions of a software. `anaconda` aims to provide easy installation of important scientific software with proper version management. To this end the `conda` package manager has been created. It provides the possibility to install software independent of the programming language (it covers much more than only python and R) and operating system (it covers Linux, Mac and Windows).

At the source of the conda ecosystem there are conda *recipes*, which contain the metadata, requirements and installation instructions that are needed to build a package. The resulting `packages` are pre-built archives containing software, metadata, and information on dependencies, ready to be installed into a conda environment. These packages are stored in `channels`, i.e. repositories (online or local) where conda packages are stored and made available for installation. Many channels are free and contain only free and open source software, most importantly `conda-forge` and `bioconda`.
The `defaults` channel consists of conda packages that are maintained by Anaconda, Inc. It consists of several [subchannels](https://docs.anaconda.com/working-with-conda/reference/default-repositories/): `pkgs/main`, `pkgs/r`, `pkgs/pro`, `pkgs/msys2`, `pkgs/free`, `pkgs/archive`.
Most conda channels are currently hosted by Anaconda, Inc. (if one is looking for risks in using `conda`, then this might be one, since it is a single point of failure).
Locally, conda `packages` are installed into environments; installing a package also installs its requirements.

A `conda` distribution is installable software that provides `conda` (and all requirements needed to run it), e.g. `miniforge`.


## Further discussion

### Other channels

Software can be dangerous. Hence, besides licensing questions, installing software requires trust in the source of the software. Conda's search functionality allows you to find software in various channels. It does not seem wise to blindly install software from all these channels.

It seems to be a good idea to rely on community maintained channels that include only free software. Examples for such channels are `conda-forge` and `bioconda`.

### The larger picture

Beyond the problem with Anaconda's `defaults` channel, it is worth mentioning that reproducible science currently relies to a significant extent on being able to freely use services offered by companies, see the discussion [here](https://galaxyproject.org/news/2024-08-20-opinion-conda/).
It is easy to blame Anaconda, Inc. for the TOS change, and the communication certainly [might have been better](https://www.theregister.com/2024/08/08/anaconda_puts_the_squeeze_on/), but a bit of appreciation might be appropriate as well.

Also note that Anaconda Inc. continues to [provide substantial resources](https://galaxyproject.org/news/2024-08-20-opinion-conda/) for free to everyone (and in particular the scientific community).

- The software `conda` (which is also maintained by Anaconda, Inc) is still ([and will always be](https://www.anaconda.com/pricing/terms-of-service-faqs)) free to use and [open source](https://github.com/conda/conda/blob/main/LICENSE).
- Hosting of the packages of all `conda` channels. The open source community currently has no possibility to host these packages elsewhere ([see](https://conda-forge.org/blog/2020/11/20/anaconda-tos/)).

- Also note that part of the revenue is [invested back](https://www.anaconda.com/blog/sustaining-our-stewardship-of-the-open-source-data-science-community) in open source projects.

Note that Helmholtz (HiFiS) made some important steps toward scientific infrastructure, e.g. by providing container registries. Probably more investments of governmental and scientific bodies in open source and scientific infrastructure are needed.

### Discussion of other documents

Recently [UFZ](https://www.intranet.ufz.de/index.php?de=31339&nb_item=2803) and [HiFiS](https://www.hifis.net/news/2024/10/15/anaconda-licensing.html) published recommendations.

While these documents are certainly important to raise awareness, they contain a few points that are worth discussing:

> The company Anaconda provides very convenient software development environments and libraries for Python and R, especially for beginners. Due to their many years of free availability for research and educational institutions, they are also very popular in training and in the discussion on StackOverflow.

This is only partially correct, since the software is not provided by Anaconda, but only distributed. All (or almost all) software that is distributed in the `defaults` channel is FOSS and is also distributed in many other (usually free) ways, e.g. via `conda-forge`.
Furthermore, it is important to highlight that these statements only apply to the software in the `defaults` channel. The possibilities offered by `conda` and channels like `conda-forge` go far beyond this:

- Proper versioned software environments for all sorts of interpreted and compiled software.
- An easy solution for reproducible science that can be used in large scale production ready deployments.

> These fees are also due if employees of the respective institutions access the Anaconda software directories.

This could have been more precise, i.e. this only applies to the `defaults` channel (and its subchannels) and not (in general) to the other channels hosted by Anaconda Inc.

> mamba instead of conda

As discussed above, with minimal effort, both can be used without using the problematic `defaults` channel.

> Miniforge instead of conda-forge

This mixes two different concepts. `miniforge` is a conda distribution and `conda-forge` a conda channel. Both are part of the solution.

All other recommendations only apply to Python (and are therefore not a full alternative) or are unrelated to the actual question.

The Helmholtz Open Science newsletter also recently [talked about this topic](https://os.helmholtz.de/en/newsroom/newsletter/106th-newsletter/#c124408):

> Users working in non-commercial or scientific organizations were able to use the software free of charge and were convinced that Anaconda was FOSS – Free and Open Source Software. In fact, this is not the case. 

In fact, all software that is distributed in Anaconda's `defaults` channel is FOSS. The channel is only a non-free means of distributing the software, in a way that is much more convenient than anything available before `conda`.
Nowadays, with `conda-forge`, equally convenient and free ways are available.

The Jülich Supercomputing Centre's RSE Team has also [published a recommendation](https://www.fz-juelich.de/en/rse/the_latest/the-anaconda-is-squeezing-us). Notably, the JSC typically encourages the use of Python virtual environments rather than a package manager such as `conda`, as these are more compatible with EasyBuild (including user modules).