Question: How to parallelize fastq-dump command when reading SRA IDs from a .txt file?
bioinform wrote:

How can I parallelize the fastq-dump command when reading SRA IDs from a .txt file?

Here is my working, non-parallel code; it downloads a pair of FASTQ files:

    list=$(cat SRAIdFromPythonInput.txt)  # list of SRA record IDs, one per line
    for i in $list
    do
        echo "$i"
        ./fastq-dump --split-files "$i" -v
    done

How can I rewrite it with GNU parallel so that it downloads the data for all SRA IDs listed in the .txt file, not just a single pair of FASTQ files? How can I apply the pattern "cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output" to this code?

shell paralell fastq-dump gnu sra • 1.7k views
modified 21 months ago by ole.tange • written 21 months ago by bioinform

I'm too lazy to check/test: what files would be generated for a given ID?

written 21 months ago by Pierre Lindenbaum

Two FASTQ files named after the SRA ID.

written 21 months ago by bioinform

What would the names be? ID.fq.gz? ID.fastq? ID_R1.fq? ID_R1.fastq.gz?

written 21 months ago by Pierre Lindenbaum

ID.fastq, a pair of them; I rename them in the next step. For example:

SRR5656566_1.fastq and SRR5656566_2.fastq

written 21 months ago by bioinform
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

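Using a Makefile:

# Read the SRA IDs, one per line, from the list file.
IDS=$(shell cat SRAIdFromPythonInput.txt)

# A single fastq-dump run writes both mates, so the _2 file only depends on the _1 file.
# Recipe lines must start with a tab character.
%_2.fastq: %_1.fastq
	touch -c $@

%_1.fastq:
	./fastq-dump --split-files $* -v && touch -c $@

all: $(addsuffix _2.fastq,$(IDS)) $(addsuffix _1.fastq,$(IDS))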

Invoke make with the number of parallel jobs, e.g.:

make -j 16
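
One consequence of the Makefile approach worth spelling out: make treats a target file that already exists as up to date, so rerunning make -j 16 after an interruption skips the IDs whose FASTQ files are already there. A rough shell equivalent is sketched below. It is only an illustration: pending_ids is a hypothetical helper name, not something from this thread, and it assumes the output files are written to the current directory as ID_1.fastq and ID_2.fastq.

```shell
#!/usr/bin/env bash
# pending_ids: print the IDs from the list file ($1) for which the two
# paired FASTQ files are not both present in the current directory.
# (Hypothetical helper; it only checks that the files exist, as make does.)
pending_ids() {
  # "|| [ -n "$id" ]" keeps a last line that lacks a trailing newline.
  while read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue            # skip blank lines
    if [ ! -e "${id}_1.fastq" ] || [ ! -e "${id}_2.fastq" ]; then
      echo "$id"
    fi
  done < "$1"
}

# Possible use: feed only the missing IDs to the downloader, e.g.
#   pending_ids SRAIdFromPythonInput.txt | parallel -j 16 './fastq-dump --split-files {} -v'
```

Like make, this only tests for existence, so a truncated file left behind by an interrupted download would need to be deleted by hand before rerunning.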
written 21 months ago by Pierre Lindenbaum

Thank you for your efforts. Could you please write this code following the GNU parallel pattern: cat list | parallel "do-something1 {} config-{} ; do-something2 < {}" | process-output? Why do you use a Makefile? Is there a tutorial, article, or chapter on using it in bioinformatics? I have never used a Makefile for NGS data processing. I found one at http://bsmith89.github.io/make-bml/

modified 21 months ago • written 21 months ago by bioinform
> could you please write these codes in a manner of the pattern of the GNU parallel

No.

> why do you use Makefile?

Because it works; it's easy, standard, ubiquitous, universal, etc.

written 21 months ago by Pierre Lindenbaum

Thanks. However, I need code examples using GNU parallel.

modified 21 months ago • written 21 months ago by bioinform
ole.tange (Denmark) wrote:

It is unclear to me what SRAIdFromPythonInput.txt contains. Can you give a couple of lines as an example?

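# Download one SRA run; the run ID is passed as the first argument.
doit() {
  i="$1"
  echo "$i"
  ./fastq-dump --split-files "$i" -v
}
# Export the function so the shells started by GNU parallel can see it (bash only).
export -f doit
# :::: reads the arguments, one per line, from the file.
parallel doit :::: SRAIdFromPythonInput.txt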
modified 21 months ago • written 21 months ago by ole.tange

It contains a column of SRA IDs:

 SRR5656566
 SRR5656567
 SRR5656518
 SRR5656500

thx

modified 21 months ago • written 21 months ago by bioinform
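
For the pipeline shape asked about in the question (cat list | parallel "..." | process-output), a minimal sketch might look like the following. Several things here are assumptions rather than anything stated in the thread: clean_ids and download_all are hypothetical helper names, -j 4 is an arbitrary job count, and fastq-dump.log is a made-up log file name. The leading spaces in the ID listing above may just be formatting, but trimming them is cheap insurance, because by default GNU parallel passes each input line to {} as is.

```shell
#!/usr/bin/env bash
# clean_ids: print one SRA ID per line from the file $1, trimming
# surrounding whitespace and dropping blank lines. (Hypothetical helper.)
clean_ids() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# download_all: the "cat list | parallel ... | process-output" pattern
# applied to fastq-dump. Here tee stands in for process-output and saves
# whatever the jobs print to standard output.
download_all() {
  clean_ids "$1" \
    | parallel -j 4 './fastq-dump --split-files {} -v' \
    | tee fastq-dump.log
}

# Preview the generated commands without downloading anything:
#   clean_ids SRAIdFromPythonInput.txt | parallel --dry-run './fastq-dump --split-files {} -v'
# Run the downloads:
#   download_all SRAIdFromPythonInput.txt
```

Each input line replaces {} in the command template, so four IDs in the file give four independent fastq-dump jobs. A sensible value for -j depends on the available network bandwidth and disk speed rather than only on the number of CPU cores.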