I am just trying to get my head around using conda environments.
I created a conda environment for a project containing plink2, plink, R and bcftools.
When I installed plink, using
mamba install -n autozygosity -c conda-forge plink, I got the output:
    Package       Version    Build               Channel               Size
    ────────────────────────────────────────────────────────────────────────────────
    Install:
    ────────────────────────────────────────────────────────────────────────────────
    + plink       1.90b6.21  hec16e2b_2          bioconda/linux-64     7MB

    Change:
    ────────────────────────────────────────────────────────────────────────────────
    - curl        7.86.0     h7bff187_1          conda-forge
    + curl        7.86.0     h2283fc2_1          conda-forge/linux-64  Cached
    - krb5        1.19.3     h3790be6_0          conda-forge
    + krb5        1.19.3     h08a2579_0          conda-forge/linux-64  Cached
    - libcurl     7.86.0     h7bff187_1          conda-forge
    + libcurl     7.86.0     h2283fc2_1          conda-forge/linux-64  Cached
    - libnghttp2  1.47.0     hdcd2b5c_1          conda-forge
    + libnghttp2  1.47.0     hff17c54_1          conda-forge/linux-64  Cached
    - libssh2     1.10.0     haa6b8db_3          conda-forge
    + libssh2     1.10.0     hf14f497_3          conda-forge/linux-64  Cached
    - python      3.11.0     h582c2e5_0_cpython  conda-forge
    + python      3.11.0     ha86cf86_0_cpython  conda-forge/linux-64  Cached
    - r-openssl   2.0.4      r42hfaab4ff_0       conda-forge
    + r-openssl   2.0.4      r42h1f3e0c5_0       conda-forge/linux-64  Cached

    Upgrade:
    ────────────────────────────────────────────────────────────────────────────────
    - openssl     1.1.1s     h166bdaf_0          conda-forge
    + openssl     3.0.7      h166bdaf_0          conda-forge/linux-64  Cached

    Downgrade:
    ────────────────────────────────────────────────────────────────────────────────
    - bcftools    1.16       hfe4b78e_1          bioconda
    + bcftools    1.8        h4da6232_3          bioconda/linux-64     794kB
    - htslib      1.16       h6bc39ce_0          bioconda
    + htslib      1.9        h4da6232_3          bioconda/linux-64     1MB

    Summary:
      Install: 1 packages
      Change: 7 packages
      Upgrade: 1 packages
      Downgrade: 2 packages
      Total download: 9MB
This is kind of annoying since I would like to use some of the more recent features in htslib/bcftools and 1.8 is a pretty old version.
Are these kinds of downgrades just a necessary part of using conda, or is there something to be done? I guess one option is to create a new environment just for plink, but this seems like it could get messy quite quickly!
Thanks for the answer.
But many people have a separate env for each tool. How would this work? Can you load multiple environments at the same time?
No. You load one env and run the tool, then load the other env and run that tool, and so on.
There is also now "conda run", which I believe will run a command in an env.
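As a rough sketch of the "conda run" approach, a small shell helper could wrap it so each tool runs in its own env without activating anything. The helper name run_in_env and the env names used below are made up for illustration; only "conda run -n" comes from conda itself.

```shell
# Hypothetical helper: run one command inside a named conda env
# without activating that env in the current shell.
run_in_env() {
    env_name="$1"
    shift
    # "conda run -n NAME CMD ARGS..." executes CMD using env NAME.
    conda run -n "$env_name" "$@"
}
```

For example, run_in_env plink-env plink --version followed by run_in_env bcftools-env bcftools --version would call each tool from its own env, assuming envs with those (hypothetical) names exist.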
I think the most common thing is to have an env per commandline statement actually, rather than per tool.
Things like Snakemake and Nextflow will even automate building an env for a particular step and cache it in case it's needed again.
I think of this as a "poor man's" modules system: something you can control without admin privileges.
I agree with this. I create an environment for each project and create additional environments if incompatibilities arise. Specifically, for each project I have a requirements.txt where I list the packages I need and their versions. As the project develops, I add or remove packages in requirements.txt and install them with mamba install --file requirements.txt -n my-project-env. One env per tool seems unmanageable to me, but I'd like to hear other opinions...
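To make that concrete, here is a minimal sketch of such a setup. The file contents, the version constraint and the helper name sync_project_env are illustrative assumptions, not a real project's list. Giving an explicit lower bound for a package should make the solver either respect it or report a conflict instead of quietly picking an older build.

```shell
# Hypothetical requirements.txt: one conda package spec per line.
# Package names and the version constraint are examples only.
cat > requirements.txt <<'EOF'
bcftools>=1.16
plink
r-ggplot2
EOF

# Hypothetical helper: install everything listed into the given env.
# Both channels are passed explicitly so conda-forge and bioconda are searched.
sync_project_env() {
    mamba install -n "$1" -c conda-forge -c bioconda --file requirements.txt
}
```

Calling sync_project_env my-project-env then runs the same install command as above, with the channels spelled out.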
conda activate has a --stack option that could make that work but again, it seems messy to me.
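For reference, stacking might look like the sketch below. The helper name activate_stacked and the env names are hypothetical. As far as I understand it, --stack keeps the previously active env on PATH when the next one is activated, so tools from both can be found, with the most recently activated env taking precedence.

```shell
# Hypothetical helper: activate a first env, then stack further envs on top.
# Assumes conda's shell integration is already set up (e.g. via "conda init"),
# since "conda activate" only works in an initialised shell.
activate_stacked() {
    first_env="$1"
    shift
    conda activate "$first_env"
    for extra_env in "$@"; do
        # --stack adds this env on top instead of replacing the active one.
        conda activate --stack "$extra_env"
    done
}
```

Something like activate_stacked plink-env bcftools-env would then put both envs on PATH in one shell session.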
This is how I do things. But I've now got to the point where the Envs for some of my projects take >90 minutes to build even with mamba, and often fail, so I might have to have a rethink.
It may be interesting to post one such case and see what people think... My projects usually have something like ~30 dependencies listed in requirements.txt. Sometimes envs break and I need to rebuild them, but it's usually a matter of minutes. Besides, I'm still not entirely settled on what should go in requirements.txt. For example, if you use ggplot2, do you also list R? (I don't)