Batching fastq download w/sra-toolkit: Won't recognize IDs
15 months ago
harka14 ▴ 10

Hi all,

I'm new to bash scripting, and I'm trying to create a simple script to download fastq files from SRA using a list of SRR IDs in a txt file. (Eventually I'd like to parallelize this process, but for now I just need it to work.) I've used fastq-dump from sra-toolkit numerous times, but now I'm having issues with the batching that don't make sense to me.

I have successfully used the following command to download one of the datasets in this list:

~/bin/sratoolkit/bin/fastq-dump -O ~/output/location/01_fastq --gzip --skip-technical --readids --read-filter pass --dumpbase --split-spot --clip SRRXXXXXXXX

My script, named 01_SRA, is:

BATCH=${1?Error: No batch ID given.}
LOC=${2?Error: No directory given.}

INPUT=${LOC}00_SRAlists/${BATCH}.txt
OUTLOC=${LOC}${BATCH}/01_fastq/

cat $INPUT | while read line
do
        echo Starting $line
        ~/bin/sratoolkit/bin/fastq-dump -O $OUTLOC --gzip --skip-technical --readids --read-filter pass --dumpbase --split-spot --clip $line
done

When I run

~/location/01_SRA batch002 ~/location/seq_data/

I get:

Starting SRRXXXXXXXX
2020-08-27T14:21:16 fastq-dump.2.10.8 err: error unexpected while resolving query within virtual file system module - No accession to process ( 500 )
Failed to call external services.

This happens for each of the IDs in my txt file, and each call fails almost immediately. Based on the echo line, I know the script is reading the ID, but somehow fastq-dump isn't able to process it.

I thought it might be a problem with my txt file. I created another file called batchtest.txt that just has the test dataset from the wiki three times:

SRR390728
SRR390728
SRR390728

When I run it:

~/location/01_SRA batchtest ~/tmp/

I don't get an error; it pauses on the first ID while it downloads.

As far as I can tell, nothing has changed except for the SRR IDs in the text file.

So, in summary:

  • Command line + test ID = works fine
  • Command line + my IDs = works fine
  • Script + test ID = works fine
  • Script + my IDs = ERROR

Does anybody know what might be causing this and what I can do to fix it?

Software info:

  • Ubuntu v18.04.3 LTS (Bionic Beaver)
  • GNU bash, v4.4.20(1)
  • sra-toolkit v2.10.8

Thanks

RNA-Seq sra-toolkit bash • 581 views
15 months ago
Ram 35k

Can you examine the IDs using

echo "Starting $line" | cat -te

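Maybe there are some invisible characters messing up your code, such as Windows carriage returns, which this command displays as ^M just before the end-of-line marker $.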
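
As a minimal sketch of what that check reveals, the snippet below builds a throwaway ID list with Windows (CRLF) line endings and passes it through cat -te. The file name ids_crlf.txt is hypothetical and stands in for a batch list such as batch002.txt. With cat -te, each carriage return is shown as ^M and each line end as $.

```shell
# Hypothetical sample list: two IDs saved with Windows (CRLF) line endings.
tmpdir=$(mktemp -d)
printf 'SRR390728\r\nSRR390728\r\n' > "$tmpdir/ids_crlf.txt"

# -t and -e make non-printing characters visible:
# a carriage return is printed as ^M, the end of each line as $.
shown=$(cat -te "$tmpdir/ids_crlf.txt")
printf '%s\n' "$shown"
# prints:
# SRR390728^M$
# SRR390728^M$
```

A clean Unix file would show SRR390728$ with no ^M. When the ^M is present, the loop variable holds the ID plus a trailing carriage return, so the string handed to fastq-dump is not a valid accession even though echo prints it looking normal.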


This was it! I've had to run dos2unix on files in another context before, so I feel like I should have caught that 🙄 Re-creating the file in nano fixed it. Thank you!
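
For anyone who would rather not re-create the list by hand, the carriage returns can also be stripped as the file is read. This is only a sketch, assuming trailing carriage returns are the sole problem: the helper name clean_ids is hypothetical, the demo list is a throwaway file, and the fastq-dump call is left as a comment so the snippet runs without sra-toolkit installed. Running dos2unix on the list file should achieve the same result.

```shell
# clean_ids (hypothetical helper): print the IDs in file $1 with
# carriage returns removed and blank lines skipped.
clean_ids() {
    tr -d '\r' < "$1" | grep -v '^[[:space:]]*$'
}

# Throwaway demo list with Windows (CRLF) line endings.
demo=$(mktemp)
printf 'SRR390728\r\nSRR390728\r\n' > "$demo"

# Same loop shape as the original script, reading the cleaned IDs.
clean_ids "$demo" | while IFS= read -r line
do
    echo "Starting $line"
    # The real download would go here, quoted so stray whitespace cannot
    # split the arguments (options as in the script in the question):
    # ~/bin/sratoolkit/bin/fastq-dump -O "$OUTLOC" --gzip "$line"
done
```

In the real script, "$demo" would be replaced by "$INPUT".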


I've moved my comment to an answer. Please go ahead and accept it using the green check mark on the left.
