Hi all,
I'm having trouble understanding why I cannot download some read sets from a batch that I launch using a script that invokes fastq-dump from the SRA toolkit. It seems that it cannot connect to NCBI, even though the code shown below is correct. Does anyone have an idea? The script is quoted here:
> #!/bin/bash
> #
> #SBATCH --nodes=1 --ntasks=2 --cpus-per-task=24
> #SBATCH --time=24:00:00
> #SBATCH --mem=350gb
> #
> #SBATCH --job-name=SAS-EUR_populations
> #SBATCH --output=SAS-EUR_individuals.out
> #
> #SBATCH --partition=g100_usr_smem
> #SBATCH --account=IscrC_PanSV
>
> module load profile/bioinf sra/2.9.6
>
> cd /g100_work/IscrC_PanSV/NA20847
>
> fastq-dump --gzip SRR13606073
> fastq-dump --gzip SRR13606074
>
> cd /g100_work/IscrC_PanSV/NA20509
>
> fastq-dump --gzip SRR13606071
> fastq-dump --gzip SRR13606072
Sorry about the formatting: each fastq-dump command, like each #SBATCH directive, is on its own line. The interface renders the quote in this strange way.
Thanks in advance,
Matteo
Save yourself the interaction with that terrible tool and use https://sra-explorer.info/ to get fastq download links directly. In my experience fastq-dump is rather unstable and experiences connection losses rather frequently.
Thanks a lot, I'll have a look at that! The problem, though, seems to be that I cannot use wget, because the thin and thick nodes of that cluster do not have internet access... So I'm somehow forced to use the SRA toolkit.
Sorry I do not understand. If you have no internet, then how can you access ncbi via the toolkit?
Somehow I can use the toolkit on the thin and thick nodes, but not the wget command, which stopped abruptly without downloading anything... So I resorted to the SRA toolkit, which, as you said, is quite unstable.
I experimented with wget on the login nodes, which, according to user support, are the only ones connected to the web; the problem is that those nodes have a wall-time limit of 4 h, and some of the files take longer than that to download.
I think the answer might lie in what GenoMax said below, namely that the sequences are very "new" and I might have to go through the .bam in order to get the .fastq.
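For the wall-time limit on the login nodes, one possible workaround is to make the download resumable, so that a session cut off after 4 h can be continued by re-running the same command. The following is only a sketch: fastq_urls.txt is a hypothetical file holding one download link per line (for example links copied from sra-explorer), and it assumes the remote server supports resuming partial transfers. By default the script only prints the command it would run.

```shell
#!/bin/bash
# Sketch only: resume partially downloaded files each time the script is re-run.
# Assumption: fastq_urls.txt is a hypothetical file with one download URL per line.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command; set DRY_RUN=0 to download

resume_downloads() {
    local url_list="$1"
    # --continue resumes partial files; --input-file reads the URLs from a file
    local cmd=(wget --continue --input-file="$url_list")
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ ${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

resume_downloads "fastq_urls.txt"
```

Re-running the script with DRY_RUN=0 after a session is killed should skip completed files and continue the unfinished one, provided the server cooperates.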
If the SRA toolkit is really the only way to go, then try downloading the SRA file first with prefetch and then converting it to fastq with fastq-dump locally, rather than downloading with fastq-dump directly. sra-explorer also offers Aspera download links, which you can try; also, maybe see if curl works better. Maybe it is just a server issue with NCBI's FTP servers right now.
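The two-step approach could look roughly like the sketch below, using two of the accessions from the original script. The helper names run and fetch_and_convert are made up for illustration, and it assumes that fastq-dump, when given an accession, picks up the copy that prefetch has already placed in the toolkit's configured cache instead of streaming it again. DRY_RUN defaults to 1, so the script only prints the commands; set DRY_RUN=0 to actually execute them.

```shell
#!/bin/bash
# Sketch only: download the .sra file first, then convert it locally.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands; set DRY_RUN=0 to execute

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: one accession, two steps.
fetch_and_convert() {
    local acc="$1"
    # Step 1: the network transfer (needs a node that can reach NCBI)
    run prefetch "$acc"
    # Step 2: conversion; assumed to read the prefetched copy from the cache
    run fastq-dump --gzip "$acc"
}

for acc in SRR13606073 SRR13606074; do
    fetch_and_convert "$acc"
done
```

Splitting the steps this way also means that a failed transfer only requires repeating prefetch, not the conversion.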