Question

Downloading all runs using fastq-dump

2.9 years ago

BDK_compbio ▴ 140

Is there a way to download all runs for an SRA experiment ID such as https://www.ncbi.nlm.nih.gov/sra/?term=SRX724870 ? At the moment I search SRA manually on the NCBI site and run fastq-dump for each of the runs. For example, because fastq-dump -I --split-files SRX724870 gives errors, I am running the following three commands:

fastq-dump -I --split-files SRR1602552
fastq-dump -I --split-files SRR1602553
fastq-dump -I --split-files SRR1602554

I have a list of SRA IDs, and for each one I search manually and run fastq-dump. It would be great if I could download all runs using just the SRA ID (e.g. SRX724870).
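Until the SRX lookup itself is automated, the per-run commands above can at least be driven from a list instead of being typed one by one. This is a minimal sketch, assuming a text file with one run accession (SRR...) per line; `dump_runs_from_file` and `runs.txt` are hypothetical names, and the fastq-dump options are the ones used in the question:

```shell
#!/bin/bash
# Run fastq-dump once per accession listed in a file (one accession per line).
# dump_runs_from_file and runs.txt are illustrative names, not part of sra-tools.
dump_runs_from_file() {
    local list_file=$1 acc
    # "|| [ -n "$acc" ]" also processes a last line that has no trailing newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc=${acc//[[:space:]]/}          # drop stray spaces and Windows CR characters
        [ -z "$acc" ] && continue         # skip blank lines
        fastq-dump -I --split-files "$acc"
    done < "$list_file"
}

# Usage: dump_runs_from_file runs.txt
```

With SRR1602552, SRR1602553 and SRR1602554 on separate lines of runs.txt, this issues the same three commands as above.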

fastq-dump SRAtoolkit • 3.0k views

Updated 2.9 years ago by Sukjun Kim ▴ 90 • written 2.9 years ago by BDK_compbio ▴ 140

Just enter the query at sra-explorer ("find SRA and FastQ download URLs in a couple of clicks") and you get download links for the fastq files right away.

Written 2.9 years ago by ATpoint 82k

score 0 · Answer 1 · 2021-05-29

2.9 years ago

Sukjun Kim ▴ 90

As far as I remember, automatic expansion of container accessions (such as SRX experiments) into their runs is not currently available in sratoolkit.

Why don't you try this short bash script?

It automatically retrieves all run accessions (SRR) for an SRX identifier and downloads the corresponding runs.

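#!/bin/bash
# Download every run (SRR...) that belongs to one SRA experiment (SRX...).

srx_id=SRX724870

# Fetch the RunInfo CSV, keep the rows mentioning the experiment, and take column 1 (the run accession).
# mapfile -t (bash 4+) stores one accession per array element, so the loop below runs once per run.
mapfile -t sra_ids < <(wget -qO- "http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?save=efetch&db=sra&rettype=runinfo&term=${srx_id}" | grep "${srx_id}" | cut -f1 -d",")

for sra_id in "${sra_ids[@]}"; do
    fastq-dump "${sra_id}"
done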
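The `grep ${srx_id}` step in the script above keeps any line that contains the experiment ID anywhere. A slightly stricter variant selects rows by column name instead. This is a sketch assuming the RunInfo CSV starts with a header row whose columns include `Run` and `Experiment`, and that no quoted field contains a comma; `runs_for_experiment` is a hypothetical helper name:

```shell
#!/bin/bash
# Print the Run accession of every RunInfo row whose Experiment column equals $1.
# Reads RunInfo CSV on stdin; column positions are taken from the header row by name.
runs_for_experiment() {
    awk -F',' -v srx="$1" '
        { sub(/\r$/, "") }                  # tolerate Windows line endings
        NR == 1 {                           # header row: locate the two columns
            for (i = 1; i <= NF; i++) {
                if ($i == "Run") run_col = i
                if ($i == "Experiment") exp_col = i
            }
            next
        }
        run_col && exp_col && $exp_col == srx { print $run_col }
    '
}
```

To use it, define the function and replace the `grep ... | cut -f1 -d","` pair in the script with `runs_for_experiment "${srx_id}"`. Repeated header lines or blank lines simply do not match and are skipped.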

Written 2.9 years ago by Sukjun Kim ▴ 90

It gives the following error:

2021-06-10T05:07:51 fastq-dump.2.9.1 err: item not found while constructing within virtual database module - the path 'SRR1602552 SRR1602553 SRR1602554' cannot be opened as database or table

Written 2.9 years ago by BDK_compbio ▴ 140

I think the error occurs because all three accessions reach fastq-dump as a single argument, as if the fastq-dump line inside the loop read:

    fastq-dump "${sra_ids}"

That would produce the command line below:

$ fastq-dump "SRR1602552 SRR1602553 SRR1602554"

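So make sure each call uses the loop variable ${sra_id} instead of ${sra_ids}. This only helps if sra_ids is a real bash array: when it holds a plain string from $(...), "${sra_ids[@]}" expands to a single word containing all the IDs, the loop runs only once, and fastq-dump "${sra_id}" gives exactly this error. Filling it with mapfile -t sra_ids < <(...) gives one element per accession.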

    fastq-dump "${sra_id}"

Leaving the variable unquoted also works here, because run accessions contain no spaces:

    fastq-dump ${sra_id}

Either way, the loop runs one command per run:

$ fastq-dump SRR1602552
$ fastq-dump SRR1602553
$ fastq-dump SRR1602554
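The error message above shows all three accessions arriving as one argument. One way that happens is iterating over a plain string rather than an array. A minimal, self-contained demonstration using the accessions from this thread (nothing is downloaded; `mapfile` needs bash 4 or later):

```shell
#!/bin/bash
# Compare looping over a plain string with looping over an array.

# $(...) returns ONE string: the three IDs separated by newlines.
ids_string=$(printf '%s\n' SRR1602552 SRR1602553 SRR1602554)
string_count=0
for id in "${ids_string[@]}"; do      # a scalar behaves like a one-element array
    string_count=$((string_count + 1))
done
echo "plain string: ${string_count} iteration(s)"

# mapfile -t stores one line per array element and strips the newlines.
mapfile -t ids_array < <(printf '%s\n' SRR1602552 SRR1602553 SRR1602554)
array_count=0
for id in "${ids_array[@]}"; do
    array_count=$((array_count + 1))
done
echo "array: ${array_count} iteration(s)"
```

This prints `plain string: 1 iteration(s)` followed by `array: 3 iteration(s)`.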

I hope this solves the problem.

Written 2.9 years ago by Sukjun Kim ▴ 90

score 0 · Answer 2 · 2021-05-29

2.9 years ago

Gregor Rot ▴ 540

This script, which uses the NCBI E-utilities (Entrez Direct), should also work:

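#!/bin/bash
# Download all runs for one accession (e.g. an SRX or GSM ID) with Entrez Direct and fastq-dump.
# Usage: bash <script> ACCESSION, e.g. SRX724870

# esearch, efetch and xtract come from Entrez Direct; stop if they are missing.
if ! command -v efetch > /dev/null 2>&1; then
  echo "Please install E-utilities (Entrez Direct)." >&2
  exit 1
fi
if [ $# -ne 1 ]; then
  echo "Usage: $0 ACCESSION" >&2
  exit 1
fi
GSM=$1
echo "$GSM: retrieving run accessions from NCBI..."
# List the Run acc attributes from the SRA document summaries.
all_data=$(esearch -db sra -query "$GSM" | efetch -format docsum | xtract -pattern DocumentSummary -element Run@acc)
for SRR in ${all_data}; do   # unquoted on purpose: one accession per iteration
  echo "processing $SRR"
  fastq-dump -A "$SRR"
done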

Written 2.9 years ago by Gregor Rot ▴ 540

I am afraid I may not be doing things correctly. I just copied the script into script1.sh, ran sh script1.sh, and it gives the following errors:

script1.sh: line 1: type: efetch: not found
script1.sh: line 2: print: command not found
retrieves from NCBI GEO.....
script1.sh: line 7: esearch: command not found
script1.sh: line 7: efetch: command not found
script1.sh: line 7: xtract: command not found
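The `command not found` messages for esearch, efetch and xtract mean that Entrez Direct is not installed or not on the PATH. The script also expects the accession as its first argument, e.g. `sh script1.sh SRX724870`. Before running either script, it can help to list which required programs are missing. A small sketch; `missing_tools` is a hypothetical helper that relies only on the `command -v` builtin:

```shell
#!/bin/bash
# Print the name of every program in the argument list that cannot be found on the PATH.
# Returns non-zero if at least one of them is missing.
missing_tools() {
    local tool status=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   missing_tools esearch efetch xtract fastq-dump || echo "install the programs listed above first" >&2
```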

Written 2.9 years ago by BDK_compbio ▴ 140