Unable to download fastq files in parallel / SOS
28 days ago
j_eag • 0

Hi!

I'm very new to all this, so bear with me if I use incorrect terminology. Also, English is my second language.

I'm trying to download my fastq files in parallel but it doesn't work and I keep receiving this error:

fastq-dump.2.10.9 err: error unexpected while resolving query within virtual file system module - No accession to process ( 500 )

Does anyone have any suggestions? I can download fastq files individually, but for an upcoming experiment I will have over 50 files, so I need to be able to download them in parallel.

MORE INFO:

I am doing everything on an HPC cluster within the scratch directory. I have a directory named rnaseqtrial, and within it I have the accession list. Once I switch to a compute node, I simply run:

for i in $(cat /scratch/eag88/trial/SRR_Acc_List.txt ); do sbatch download.sh ${i}; done

download.sh :

#!/bin/bash
#SBATCH --job-name=download
#SBATCH --mail-type=ALL
#SBATCH --mail-user=my email
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=4G

module load sra-tools/2.10.9
output_dir= “/scratch/eag88/rnaseqtrial /rawsamples"
mkdir -p $output_dir
fastq-dump —gzip —split-files $1 —outdir $output dir
slurm rna-seq bioinformatics fastq sequencing • 295 views

Looks like you have an extra space in your output dir /scratch/eag88/rnaseqtrial /rawsamples in your script. Remove the space after rnaseqtrial and try again.


In addition, there is a missing underscore in the last line. It should be:

      fastq-dump --gzip --split-files $1 --outdir $output_dir

Further, although it is difficult to spot, the following line contains the wrong type of quotation mark. This may not be a real problem, since it could come from copy-pasting in the browser (or from copying directly out of a Word file or PDF), but it is worth checking:

      output_dir= “/scratch/eag88/rnaseqtrial /rawsamples"

      #should be: 

      output_dir="/scratch/eag88/rnaseqtrial/rawsamples"
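To make the effect of these typos visible, here is a minimal sketch that can be run in plain bash without sra-tools. It assumes nothing beyond the path from the question; `true` is only a stand-in for whatever word follows the stray space, and `first` is a throwaway variable used for printing.

```shell
#!/bin/bash
# 1) A space after '=' does not assign in the current shell.
#    Bash runs the next word as a command, with output_dir set to ""
#    in that command's environment only. `true` stands in for it here.
unset output_dir
output_dir= true
first="${output_dir:-unset}"
echo "with space after '=': $first"

# 2) The corrected assignment: no spaces around '=', straight ASCII
#    quotes, no space inside the path.
output_dir="/scratch/eag88/rnaseqtrial/rawsamples"
echo "corrected: $output_dir"

# 3) `$output dir` expands the unset variable `output` to nothing and
#    leaves the literal word `dir`, so --outdir would receive "dir".
unset output
set -- $output dir
echo "missing underscore gives: $*"
```

In the original script, the same mechanics mean `mkdir -p` and `--outdir` never see the intended path.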

The SRA Toolkit supports multithreading, but only within a single file, not across multiple files. I think you mean batch download here rather than parallel download.

For batch download on an HPC cluster (SLURM job scheduler), consider using the Nextflow SLURM executor, running each job separately, or batch-downloading the SRA files.
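One way to read "batch download" is a single job that walks the accession list one entry at a time. The sketch below assumes that reading and is not the only option. The function name `batch_download` and the `DRY_RUN` switch are hypothetical, added only so the loop can be checked without touching the network; the `fastq-dump` flags are the ones from the question.

```shell
#!/bin/bash
# Sketch: download every accession in a list, one after another, in one job.
# batch_download and DRY_RUN are illustrative names, not part of sra-tools.

batch_download() {
    local list="$1" outdir="$2" acc
    mkdir -p "$outdir"
    # Read line by line; the `|| [ -n "$acc" ]` keeps a last line that
    # lacks a trailing newline.
    while IFS= read -r acc || [ -n "$acc" ]; do
        # Assumption: lists saved on Windows may carry a trailing
        # carriage return, so strip one if present.
        acc=${acc%$'\r'}
        # Skip blank lines.
        [ -z "$acc" ] && continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "fastq-dump --gzip --split-files --outdir $outdir $acc"
        else
            fastq-dump --gzip --split-files --outdir "$outdir" "$acc"
        fi
    done < "$list"
}

# Example call, commented out so that sourcing this file has no side effects:
# batch_download /scratch/eag88/trial/SRR_Acc_List.txt /scratch/eag88/rnaseqtrial/rawsamples
```

Running it first with `DRY_RUN=1` prints the commands that would be executed, which makes an empty or malformed accession easy to spot before any download starts.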


Thanks for the response! I'm following a course so I just followed what they did. Do you have an idea as to why mine wouldn't work?

I tried doing a batch download but kept running into errors, and unfortunately I don't have enough terminal knowledge for the Nextflow SLURM executor.


First, try running the command with a single accession, such as fastq-dump --split-files SRR8296149, on a compute node. If it works, then the issue is in your script.
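The "No accession to process" message suggests that fastq-dump did not receive a usable accession, so it can also help to check what the script was actually given before the real command runs. A minimal sketch, assuming run accessions have the form SRR, ERR, or DRR followed by digits; `is_accession` is a hypothetical helper, not part of sra-tools.

```shell
#!/bin/bash
# is_accession is a hypothetical helper: it succeeds only for strings of the
# assumed form SRR/ERR/DRR followed by one or more digits.
is_accession() {
    [[ "$1" =~ ^[SED]RR[0-9]+$ ]]
}

# Take the first script argument, or an empty string if none was passed.
acc="${1:-}"
if is_accession "$acc"; then
    echo "OK: would download $acc"
else
    echo "Not a run accession: '$acc'" >&2
fi
```

Placed at the top of download.sh, a check like this turns an empty or mangled argument into a readable message in the SLURM log instead of a fastq-dump error.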

