Question

Setting up environment for RNASeq on university HPCC: CondaHTTPError: HTTP 000 CONNECTION FAILED

0

3.9 years ago

mahejabeen.nidhi ▴ 20

$ conda install -c bioconda fastqc
Collecting package metadata (current_repodata.json): failed

CondaHTTPError: HTTP 000 CONNECTION FAILED for url <https://conda.anaconda.org/bioconda/linux-64/current_repodata.json>
Elapsed: -

An HTTP error occurred when trying to retrieve this URL.
HTTP errors are often intermittent, and a simple retry will get you on your way.
ConnectionError(MaxRetryError("HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: /bioconda/linux-64/current_repodata.json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fe37c3ff050>: Failed to establish a new connection: [Errno 101] Network is unreachable'))"))

I am using the university's HPC system (details of it below) for RNASeq. However, as you can see above, I cannot download packages due to CondaHTTPError. How can I resolve this?

$ conda info

     active environment : None
       user config file : /home/mhnidhi2/.condarc
 populated config files : /home/mhnidhi2/.condarc
          conda version : 4.7.12
    conda-build version : 3.18.9
         python version : 3.7.4.final.0
       virtual packages : 
       base environment : /opt/ohpc/pub/anaconda3  (read only)
           channel URLs : https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /opt/ohpc/pub/anaconda3/pkgs
                          /home/mhnidhi2/.conda/pkgs
       envs directories : /home/mhnidhi2/.conda/envs
                          /opt/ohpc/pub/anaconda3/envs
               platform : linux-64
             user-agent : conda/4.7.12 requests/2.22.0 CPython/3.7.4 Linux/3.10.0-957.el7.x86_64 centos/7.6.1810 glibc/2.17
                UID:GID : 1311:1001
             netrc file : None
           offline mode : False

RNA-Seq HPC Conda Connection HTTP • 3.7k views

ADD COMMENT • link 3.9 years ago by mahejabeen.nidhi ▴ 20

1

You should talk to your cluster admin about the connection error.

ADD REPLY • link 3.9 years ago by ATpoint 82k

1

Another way to resolve these conda-related connection issues (if it isn't just an intermittent issue) is to build the environment in a Singularity container on your local machine (laptop/desktop), where you have control over the connection. Then move that container to the HPC.

ADD REPLY • link 3.9 years ago by bruce.moran ▴ 960

0

That is an amazing idea! I think I will try to do that. I am new to working with HPCC. Do you have a git repo on building containers like that or other HPCC functions? Thank you again!

ADD REPLY • link 3.9 years ago by mahejabeen.nidhi ▴ 20

1

Do yourself a favor and solve the underlying problem with your cluster admin. If you are going back and forth between a local machine and an HPC, you have to push a new container to the HPC every time you want to install a new package, which must be laborious.

ADD REPLY • link 3.9 years ago by ATpoint 82k

0

Very fair point. I already emailed the admin. Hopefully he can resolve this issue. Thank you!

ADD REPLY • link 3.9 years ago by mahejabeen.nidhi ▴ 20

2

If your cluster does not have direct external internet access, then many things will not work. Perhaps you need to use a proxy. Again, this is local info you would need to find out.
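
If a proxy does turn out to be the answer, a rough sketch of pointing conda at it could look like the following. The proxy address `http://proxy.example.edu:3128` is purely a placeholder (the real host and port would have to come from the cluster admins), and `persist_proxy_in_condarc` is just an illustrative helper name.

```shell
# Placeholder proxy address: an assumption, replace with the value from your admins.
PROXY_URL="http://proxy.example.edu:3128"

# conda downloads via the Python requests library, which honours these variables,
# so exporting them affects conda commands run in the current shell session.
export http_proxy="$PROXY_URL"
export https_proxy="$PROXY_URL"

# Optional: store the proxy in the user's ~/.condarc so it persists across sessions.
# Defined as a function so nothing is written to ~/.condarc until you call it.
persist_proxy_in_condarc() {
    conda config --set proxy_servers.http "$PROXY_URL"
    conda config --set proxy_servers.https "$PROXY_URL"
}
```

After that, retrying `conda install -c bioconda fastqc` in the same shell would show whether the proxy is enough. If the error is still "Network is unreachable", the node probably has no outbound route at all and it is back to the admins.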

ADD REPLY • link 3.9 years ago by GenoMax 141k

0

Resolving this with the admin is the best course; my suggestion was an alternative in case that didn't work.

Re: laboriousness, I think it depends on your use case. If you have a pipeline you've already developed and know the required packages, then containers are equally time-economical. For development a conda env is ideal, and that can be packaged inside a container once 'complete'. The benefits of the containerised version are in production/certification settings, and also specifically when using Nextflow, which takes advantage of HPC IME.

I haven't got any guides on how to containerise conda envs, but I can post one if you like.
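
In the meantime, a minimal sketch of the idea, not a tested recipe: it assumes Singularity is installed on the local machine with root (or `--fakeroot`) rights, uses the public `continuumio/miniconda3` Docker image as a base, and the file names `fastqc.def` / `fastqc.sif` plus the helper `build_and_push` are only illustrative.

```shell
#!/usr/bin/env bash
set -eu

# Write a small Singularity definition file that installs FastQC from bioconda
# into the base conda environment of the miniconda3 image.
cat > fastqc.def <<'EOF'
Bootstrap: docker
From: continuumio/miniconda3

%post
    /opt/conda/bin/conda install -y -c conda-forge -c bioconda fastqc
    /opt/conda/bin/conda clean -afy

%environment
    export PATH=/opt/conda/bin:$PATH

%runscript
    exec "$@"
EOF

# Build the image locally (where the network works), then copy it to the cluster.
# The argument is the scp destination, e.g. user@hpc.example.edu:containers/ (hypothetical).
build_and_push() {
    local remote="$1"
    sudo singularity build fastqc.sif fastqc.def
    scp fastqc.sif "$remote"
}
```

Running the script only writes `fastqc.def`; calling `build_and_push <destination>` does the build and transfer. On the cluster, something like `singularity exec fastqc.sif fastqc --version` should then work without conda needing any network access there.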

ADD REPLY • link 3.9 years ago by bruce.moran ▴ 960