Question: How to parallelize fastq-dump command when reading SRA IDs from a .txt file?
bioinform20 wrote, 2.3 years ago:

How can I parallelize the fastq-dump command when reading SRA IDs from a .txt file?

Here is my working code without parallelization. It downloads a pair of FASTQ files for each ID, one ID at a time:

    # Read the SRA record IDs, one per line, and download each run in turn.
    while read -r i; do
        echo "$i"
        ./fastq-dump --split-files "$i" -v
    done < SRAIdFromPythonInput.txt

How can I rewrite this with GNU parallel so that it downloads the data for all the SRA IDs in the .txt file concurrently, rather than one pair of FASTQ files at a time? And how do I apply the pattern `cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output` to this code?


I'm too lazy to check/test: what files would be generated for a given ID?

written 2.3 years ago by Pierre Lindenbaum

Two FASTQ files, named after the SRA IDs.

written 2.3 years ago by bioinform20

What would the names be? ID.fq.gz? ID.fastq? ID_R1.fq? ID_R1.fastq.gz?

written 2.3 years ago by Pierre Lindenbaum

A pair of .fastq files named after the ID (I rename them in the next step), for example:

SRR5656566_1.fastq and SRR5656566_2.fastq

written 2.3 years ago by bioinform20
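Given that naming, a quick check after the downloads can confirm that every ID left both mates behind. This is a minimal sketch, assuming every run is paired-end (a single-end run would not produce an `ID_2.fastq`); `check_pairs` is a hypothetical helper, not part of sra-tools, and the demo writes tiny placeholder FASTQ records instead of real downloads.

```shell
#!/usr/bin/env bash
# Sketch: confirm that every ID in a list has both ID_1.fastq and ID_2.fastq.
# ASSUMPTION: all runs are paired-end; check_pairs is a hypothetical helper.
set -euo pipefail

check_pairs() {
  local id_file=$1 status=0 id mate
  # read -r trims surrounding blanks; "|| [ -n ]" keeps a last line without newline.
  while read -r id || [ -n "$id" ]; do
    [ -n "$id" ] || continue            # skip blank lines
    for mate in 1 2; do
      # -s: the file exists and is not empty.
      if [ ! -s "${id}_${mate}.fastq" ]; then
        echo "missing or empty: ${id}_${mate}.fastq" >&2
        status=1
      fi
    done
  done < "$id_file"
  return "$status"
}

# Demo in a scratch directory: the second ID is missing its _2 file.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' SRR5656566 SRR5656567 > ids.txt
for f in SRR5656566_1 SRR5656566_2 SRR5656567_1; do
  printf '@read1\nACGT\n+\nIIII\n' > "$f.fastq"
done

if check_pairs ids.txt; then result=complete; else result=incomplete; fi
echo "$result"
```

In the download directory, `check_pairs SRAIdFromPythonInput.txt` would list any mate that is missing or empty on stderr and return non-zero.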

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 2.3 years ago:

Using a Makefile:

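    # Read the SRA run IDs (whitespace-separated, e.g. one per line).
    IDS=$(shell cat SRAIdFromPythonInput.txt)

    .PHONY: all
    # Ask for both mates of every ID; naming _1.fastq explicitly also stops
    # make from treating it as an intermediate file and deleting it.
    all: $(addsuffix _2.fastq,$(IDS)) $(addsuffix _1.fastq,$(IDS))

    # fastq-dump writes _2.fastq together with _1.fastq, so only refresh its timestamp.
    %_2.fastq: %_1.fastq
    	touch -c $@

    # Download one run. Recipe lines must start with a TAB, not spaces.
    %_1.fastq:
    	./fastq-dump --split-files $* -v && touch -c $@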

Invoke make with the number of parallel jobs, e.g.:

    make -j 16
written 2.3 years ago by Pierre Lindenbaum
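To see how the Makefile above behaves without network access or sra-tools, here is a minimal sketch that runs it in a scratch directory against a stand-in for `./fastq-dump`. The stub is hypothetical and only for illustration: it ignores options and creates empty `ID_1.fastq`/`ID_2.fastq` files, which is all make needs to see. The Makefile is written with `printf '\t'` so the recipe lines are guaranteed to start with a real TAB.

```shell
#!/usr/bin/env bash
# Sketch: run the Makefile from this answer against a stub fastq-dump.
# ASSUMPTION: the stub is hypothetical; it only creates ID_1.fastq and
# ID_2.fastq, mimicking what --split-files leaves for paired-end runs.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '%s\n' SRR5656566 SRR5656567 SRR5656518 > SRAIdFromPythonInput.txt

# Hypothetical stub: every argument not starting with '-' is treated as an ID.
cat > fastq-dump <<'EOF'
#!/usr/bin/env bash
for arg in "$@"; do
  case "$arg" in
    -*) ;;
    *) : > "${arg}_1.fastq"; : > "${arg}_2.fastq" ;;
  esac
done
EOF
chmod +x fastq-dump

# Recipe lines need a leading TAB; printf '\t' writes one reliably.
{
  printf '%s\n' 'IDS=$(shell cat SRAIdFromPythonInput.txt)' '' '.PHONY: all'
  printf '%s\n' 'all: $(addsuffix _2.fastq,$(IDS)) $(addsuffix _1.fastq,$(IDS))' ''
  printf '%s\n' '%_2.fastq: %_1.fastq'
  printf '\t%s\n' 'touch -c $@'
  printf '%s\n' '' '%_1.fastq:'
  printf '\t%s\n' './fastq-dump --split-files $* -v && touch -c $@'
} > Makefile

# Up to three stub "downloads" run at once, one per ID.
make -j 3
ls -1 ./*.fastq
```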

Thank you for your efforts. Could you please write this code following the GNU parallel pattern `cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output`? Why do you use a Makefile? And is there any tutorial, article, or chapter on using it in bioinformatics? I have never used a Makefile for NGS data processing. I found one at http://bsmith89.github.io/make-bml/

written 2.3 years ago by bioinform20

> could you please write these codes in a manner of the pattern of the GNU parallel

No.

> why do you use Makefile?

Because it works: it's easy, standard, ubiquitous, universal, etc.

written 2.3 years ago by Pierre Lindenbaum

Thanks, but I still need code examples using GNU parallel.

written 2.3 years ago by bioinform20

ole.tange (Denmark) wrote, 2.3 years ago:

It is unclear to me what SRAIdFromPythonInput.txt contains. Can you give a couple of lines as an example?

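    # Download the paired FASTQ files for one SRA run ID.
    doit() {
      i="$1"
      echo "$i"
      ./fastq-dump --split-files "$i" -v
    }
    # Export the function so the shells GNU parallel starts can call it (export -f is a bash feature).
    export -f doit
    # :::: takes the arguments from a file, one per line.
    parallel doit :::: SRAIdFromPythonInput.txt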
written 2.3 years ago by ole.tange

It contains a column of SRA IDs:

    SRR5656566
    SRR5656567
    SRR5656518
    SRR5656500

Thanks.

written 2.3 years ago by bioinform20
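Since the ID file is written by a separate Python step, it may be worth normalising it before handing it to GNU parallel. This is a sketch under assumptions the thread does not confirm (stray whitespace, Windows line endings or repeated IDs may occur); `clean_ids` is a hypothetical helper, and the `SRR`/`ERR`/`DRR` check reflects the usual SRA run-accession prefixes.

```shell
#!/usr/bin/env bash
# Sketch: normalise an SRA ID list before passing it to GNU parallel.
# ASSUMPTION: the list may contain stray blanks, CR characters or repeats;
# clean_ids is a hypothetical helper, not part of sra-tools or parallel.
set -euo pipefail

clean_ids() {
  awk '{
    sub(/^[ \t\r]+/, "")                  # trim leading blanks
    sub(/[ \t\r]+$/, "")                  # trim trailing blanks and CR
    if ($0 == "" || seen[$0]++) next      # drop empty lines and repeats
    if ($0 !~ /^[SED]RR[0-9]+$/) {        # expect SRR/ERR/DRR run IDs
      print "skipping: " $0 > "/dev/stderr"
      next
    }
    print
  }' "$1"
}

# Example input built from the IDs posted in the thread.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf ' SRR5656566\n SRR5656567\n\n SRR5656518\n SRR5656566\r\n SRR5656500\n' > "$sample"
clean_ids "$sample"
```

With the function from ole.tange's answer exported, `clean_ids SRAIdFromPythonInput.txt | parallel doit` would then take the place of the `::::` input.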
Powered by Biostar version 2.3.0