Question: How to run spades.py on multiple FASTQ files at once?
subrahmanya0 wrote:

I have 218 FASTQ files (109 × 2), named as follows:

SRR8224532_1.fastq, SRR8224532_2.fastq
SRR8224533_1.fastq, SRR8224533_2.fastq
... (up to)
SRR8224640_1.fastq, SRR8224640_2.fastq

Each SRR accession number has two FASTQ files.

I want to run spades.py on each pair as follows:

spades.py -1 SRR8224532_1.fastq -2 SRR8224532_2.fastq -o SRR8224532-out

Is it possible to run spades.py on all 109 pairs (218 files) in the same way? Thank you in advance.

modified 5 weeks ago • written 5 weeks ago by subrahmanya0

Please pay attention to your post formatting. Also, avoid statements in ALL CAPS, as well as statements like "please help". It might help to read these 10 points on how to discuss topics on scientific communities: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

written 5 weeks ago by RamRS24k

RamRS24k wrote:

You are looking for parallelization. There are multiple levels of parallelization based on the computational resources you have.

You could use something like GNU parallel to run each spades.py process on its own core. You could also use whatever internal parallel processing spades.py offers so that each process runs faster.
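As a rough sketch of the GNU parallel route (not a tested recipe for this dataset): the helper name `list_samples`, the two concurrent jobs, and the four threads per job are all assumptions for illustration. `-t` is the spades.py thread-count option, and `--dry-run` makes GNU parallel print the commands instead of running them.

```shell
#!/usr/bin/env bash
# Print one sample ID (file name minus _1.fastq) per read pair in a directory.
# list_samples is a hypothetical helper, not part of SPAdes or GNU parallel.
list_samples() {
  local dir="${1:-.}" r1
  for r1 in "$dir"/*_1.fastq; do
    [ -e "$r1" ] || continue          # the glob matched nothing
    r1="${r1##*/}"                    # drop the directory part
    printf '%s\n' "${r1%_1.fastq}"    # drop the _1.fastq suffix
  done
}

# Assumed settings: 2 assemblies at a time, 4 threads each.
# --dry-run only prints the commands; remove it to actually run them.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  list_samples . | parallel --jobs 2 --dry-run \
    'spades.py -1 {}_1.fastq -2 {}_2.fastq -o {}-out -t 4'
fi
```

Jobs times threads should not exceed the cores you have, and each concurrent assembly also needs its own share of memory, so on a small machine one job at a time may be the only safe setting.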

If you have access to a cluster (HPC/cloud computing), you could use it to submit each process to separate nodes (computers). The exact code you will use depends on the resources you have access to.

At the least, you should be able to automate, if not parallelize. That would involve a simple loop that runs each process in series.

modified 5 weeks ago • written 5 weeks ago by RamRS24k

Thank you for replying. I'm running it on my laptop and am looking for a script/loop to run spades.py on all the files at once.

written 5 weeks ago by subrahmanya0

A loop would be really simple. This is a good opportunity to learn to code. Translate the following pseudocode to code:

  1. Your loop should pick all _1.fastq files
  2. For each such file, create the command so:
    • -1 gets the original filename
    • -2 gets the original filename with _1 replaced by _2
    • -o gets the original filename trimmed from the end up to the first _ character, then -out appended to the remaining string.

Everything you need for step #2 is here: https://wiki.bash-hackers.org/syntax/pe (under Substring removal and search and replace)
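For readers who want to check their own attempt, here is one possible translation of the outline above, as a sketch rather than the answerer's own code. The function name `spades_cmd` is made up for illustration, and the loop assumes the FASTQ files sit in the current directory. It uses suffix removal plus an appended `_2.fastq` instead of search and replace, so that only the trailing `_1.fastq` is touched. It only prints the commands (a dry run).

```shell
#!/usr/bin/env bash
# Build the spades.py command line for one forward-read file.
# spades_cmd is a hypothetical helper name used only in this sketch.
spades_cmd() {
  local r1="$1"
  local r2="${r1%_1.fastq}_2.fastq"   # swap the trailing _1.fastq for _2.fastq
  local out="${r1%%_*}-out"           # cut from the first "_" to the end, append -out
  printf 'spades.py -1 %s -2 %s -o %s\n' "$r1" "$r2" "$out"
}

# Step 1: pick every *_1.fastq file in the current directory.
for r1 in *_1.fastq; do
  [ -e "$r1" ] || continue            # the glob matched nothing
  spades_cmd "$r1"                    # step 2: print the command for this pair
done
```

Once the printed commands look right, the output can be piped to `bash` to run them one after another. That shortcut is fine for names like SRR8224532_1.fastq, which contain no spaces or shell metacharacters.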

written 5 weeks ago by RamRS24k

Okay, that sounds good. Thank you.

written 5 weeks ago by subrahmanya0

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

written 5 weeks ago by RamRS24k

Could you write a sample script/loop for this?

written 5 weeks ago by subrahmanya0

Here is an example to get you started: Bash Script Loop Help

Try stuff out. Ask when/if you run into issues.

modified 5 weeks ago • written 5 weeks ago by genomax73k

I tried something like this:

ls ${search_path} | while read line; do spades.py -1SRR8224532_1.fastq -2 SRR8224532_2.fastq -o SRR8224532-out done

It's not working.

modified 5 weeks ago • written 5 weeks ago by subrahmanya0

Your filenames need to be different in each iteration of the loop, and in your while loop above they are not: the same three names are hard-coded, and the loop variable is never used. A for loop would be easier for you right now.

Write down three or four individual commands and look for the pattern that repeats from one command to the next; that pattern is what you automate. I have given you a detailed outline to follow; please put in some more effort.

modified 5 weeks ago • written 5 weeks ago by RamRS24k

OK, I will try that. Thank you.

written 5 weeks ago by subrahmanya0

Before jumping into a loop, run one assembly to make sure your laptop has adequate hardware (mainly memory). After that, you will want to serialize the assemblies (run them one after the other, each starting only after the previous one completes) rather than parallelize them. It appears these are all bacterial datasets?
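A minimal sketch of that serial approach, under stated assumptions: the function name `assemble_serially` and the `ASSEMBLER` override (there only so the loop can be rehearsed with a stub) are hypothetical, and the memory and thread values are whatever your single test assembly showed to be safe. `-m` (memory limit in GB) and `-t` (threads) are spades.py options, and `contigs.fasta` is the contig file SPAdes writes into its output directory.

```shell
#!/usr/bin/env bash
# Run the assemblies strictly one after another and stop at the first failure.
# Usage: assemble_serially <fastq_dir> <memory_limit_gb> <threads>
# ASSEMBLER is a hypothetical override; by default spades.py is called.
assemble_serially() {
  local dir="$1" mem_gb="$2" threads="$3" r1 sample
  for r1 in "$dir"/*_1.fastq; do
    [ -e "$r1" ] || continue                      # the glob matched nothing
    sample="${r1%_1.fastq}"
    if [ -s "${sample}-out/contigs.fasta" ]; then # finished in an earlier run
      echo "skip ${sample##*/}: already assembled"
      continue
    fi
    "${ASSEMBLER:-spades.py}" -1 "$r1" -2 "${sample}_2.fastq" \
      -o "${sample}-out" -m "$mem_gb" -t "$threads" || {
        echo "failed on ${sample##*/}" >&2
        return 1                                  # do not start the next sample
      }
  done
}
```

For example, `assemble_serially . 8 4` would cap each run at 8 GB and 4 threads (both numbers are placeholders). Because finished samples are skipped, the same call can be repeated after an interruption.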

modified 5 weeks ago • written 5 weeks ago by genomax73k

Memory might be a problem. And yes, all are bacterial datasets.

written 5 weeks ago by subrahmanya0