How to run spades.py on multiple FASTQ files at once?
4.6 years ago
Nyksubuz ▴ 10

I have 218 FASTQ files (109 pairs), named in the following format:

SRR8224532_1.fastq, SRR8224532_2.fastq
SRR8224533_1.fastq, SRR8224533_2.fastq
...
(up to)
...
SRR8224640_1.fastq, SRR8224640_2.fastq

There are two FASTQ files for each SRR accession number.

The problem is that I want to run spades.py as follows:

spades.py -1 SRR8224532_1.fastq -2 SRR8224532_2.fastq -o SRR8224532-out

Is it possible to run spades.py on all 109 samples (218 files) in the same way as above? Thank you in advance.

Tags: assembly, spades, loop, fastq, sequence • 6.4k views

Please pay attention to your post formatting. Also, avoid statements in ALL CAPS, as well as statements like "please help". It might help to read these 10 points on how to discuss topics on scientific communities: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002202

4.6 years ago
Ram 43k

You are looking for parallelization. There are multiple levels of parallelization, depending on the computational resources you have.

You could use something like GNU parallel to run each spades.py process on its own core. You could also use the multithreading built into spades.py (its -t/--threads option) so that each individual assembly runs faster.
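As a rough sketch of the GNU parallel route, assuming the files follow the <accession>_1.fastq / <accession>_2.fastq naming above and that GNU parallel is installed (moreutils ships a different `parallel` command with incompatible syntax). The JOBS and THREADS values are placeholders, not recommendations: JOBS × THREADS should stay within your core count, and every concurrent assembly needs its own share of RAM.

```bash
#!/usr/bin/env bash
# Sketch: run several SPAdes assemblies side by side with GNU parallel.
# Assumptions: <accession>_1.fastq / <accession>_2.fastq in the current
# directory; GNU parallel installed. JOBS/THREADS are placeholder values.
JOBS=2      # assemblies running at the same time
THREADS=2   # threads handed to each SPAdes run via -t

# Print one accession per line by stripping the "_1.fastq" suffix.
list_accessions() {
    local f
    for f in *_1.fastq; do
        [ -e "$f" ] || continue   # no match: the glob stays literal, so skip it
        printf '%s\n' "${f%_1.fastq}"
    done
}

if command -v parallel >/dev/null 2>&1; then
    # GNU parallel substitutes {} with each accession read from stdin.
    list_accessions | parallel -j "$JOBS" \
        spades.py -t "$THREADS" -1 {}_1.fastq -2 {}_2.fastq -o {}-out
else
    echo "GNU parallel not found; install it or use a serial loop" >&2
fi
```

Feeding accessions (rather than filenames) to parallel keeps the command template readable, since all three arguments are built from the same {} placeholder.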

If you have access to a cluster (HPC/cloud computing), you could use it to submit each process to separate nodes (computers). The exact code you will use depends on the resources you have access to.
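On a cluster, one common pattern is a job array in which each task assembles one sample. The sketch below assumes the scheduler is SLURM (PBS, SGE, or cloud batch systems use different directives), that exactly 109 pairs sit in the submission directory, and that the resource values are placeholders; partition and account lines are left out because they are site-specific.

```bash
#!/usr/bin/env bash
#SBATCH --job-name=spades
#SBATCH --array=0-108        # one task per sample: 109 samples, indices 0..108
#SBATCH --cpus-per-task=8    # placeholder resources; adjust to your cluster
#SBATCH --mem=32G
# Sketch of a SLURM array job (assumption: your cluster runs SLURM).
# Every array task runs this script and assembles one read pair.

shopt -s nullglob

assemble_task() {
    local i=$1 r1 acc
    local r1_files=( *_1.fastq )      # same sorted list in every task
    (( i < ${#r1_files[@]} )) || return 0
    r1=${r1_files[$i]}
    acc=${r1%_1.fastq}
    # -m 32 matches the 32G requested above so SPAdes stays inside the allocation
    spades.py -t "${SLURM_CPUS_PER_TASK:-1}" -m 32 \
        -1 "$r1" -2 "${acc}_2.fastq" -o "${acc}-out"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each task; default to 0 when run by hand.
assemble_task "${SLURM_ARRAY_TASK_ID:-0}"
```

This would be submitted once, e.g. with `sbatch spades_array.sh` (the filename is hypothetical), and the scheduler then spreads the 109 tasks across whatever nodes are free.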

At the very least, you should be able to automate, if not parallelize. That would involve a simple loop that runs each process in series.
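A minimal sketch of such a serial loop, assuming the <accession>_1.fastq / <accession>_2.fastq naming from the question and that spades.py is on your PATH. The two skip checks are additions for convenience: a sample is skipped if its mate file is missing, or if its output directory already holds contigs.fasta (the file SPAdes writes its final contigs to), so re-running the script after an interruption does not redo finished samples.

```bash
#!/usr/bin/env bash
# Sketch: run SPAdes on every read pair, one assembly after another.
# Assumes <accession>_1.fastq / <accession>_2.fastq in the current directory.
shopt -s nullglob   # if nothing matches *_1.fastq, the loop body never runs

run_all_serial() {
    local r1 r2 acc out
    for r1 in *_1.fastq; do
        acc=${r1%_1.fastq}      # SRR8224532_1.fastq -> SRR8224532
        r2=${acc}_2.fastq       # matching reverse-read file
        out=${acc}-out          # output directory for this sample
        if [ ! -e "$r2" ]; then
            echo "skipping $acc: $r2 not found" >&2
            continue
        fi
        if [ -s "$out/contigs.fasta" ]; then
            echo "skipping $acc: $out/contigs.fasta already exists" >&2
            continue
        fi
        echo "assembling $acc" >&2
        spades.py -1 "$r1" -2 "$r2" -o "$out" || echo "SPAdes failed for $acc" >&2
    done
}

run_all_serial
```

A failed sample is reported and the loop moves on, so one bad dataset does not stop the remaining assemblies.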

Thank you for replying. I'm running it on my laptop and am looking for a script/loop to run spades.py on all files at once.

A loop would be really simple, and this is a good opportunity to learn to code. Translate the following pseudocode into code:

  1. Your loop should pick up all _1.fastq files.
  2. For each such file, build the command so that:
    • -1 gets the original filename
    • -2 gets the original filename with _1 replaced by _2
    • -o gets the original filename trimmed from the end back to the first _ character (i.e. with _1.fastq removed), with -out appended to the remaining string.

Everything you need for step 2 is here: https://wiki.bash-hackers.org/syntax/pe (see the "Substring removal" and "Search and replace" sections).
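For illustration, here is step 2 applied to a single filename from the question. `${f/_1.fastq/_2.fastq}` is search and replace, and `${f%_*}` is substring removal: it deletes the shortest match of `_*` from the end of the string, which is exactly "trimmed from the end back to the first _". The variable names are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: derive the -2 and -o arguments from one _1.fastq filename
# using bash parameter expansion.
f=SRR8224532_1.fastq

r2=${f/_1.fastq/_2.fastq}   # search and replace            -> SRR8224532_2.fastq
out=${f%_*}-out             # remove shortest "_*" suffix   -> SRR8224532-out

echo "spades.py -1 $f -2 $r2 -o $out"
# prints: spades.py -1 SRR8224532_1.fastq -2 SRR8224532_2.fastq -o SRR8224532-out
```

Wrapping these three lines in `for f in *_1.fastq; do ... done` and replacing the echo with the real command gives the full loop.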

Okay, that sounds good. Thank you.

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work.

Can you write a sample script/loop for this?

Here is an example to get you started: Bash Script Loop Help

Try stuff out. Ask when/if you run into issues.

I tried something like this:

ls ${search_path} | while read line; do spades.py -1SRR8224532_1.fastq -2 SRR8224532_2.fastq -o SRR8224532-out done

It's not working.

Your filenames would need to change in each iteration of the loop, which they do not in your while loop above. A for loop would be easier for you right now.
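For comparison, here is a sketch of the attempted `ls | while read` loop with the problems fixed: only _1.fastq files are listed, every filename is built from `$line` instead of being hard-coded, and `done` is separated from the command. The `< /dev/null` is an extra precaution so that nothing started inside the loop can read the file list the loop itself is consuming. `search_path` comes from the original attempt; defaulting it to the current directory is an assumption.

```bash
#!/usr/bin/env bash
# Sketch: the "ls | while read" attempt, corrected so the filenames change
# on every iteration. Assumes <accession>_1.fastq / _2.fastq naming.
assemble_listed() {
    local search_path=${1:-.} line acc
    # List only the forward-read files; each one names one sample.
    ls "$search_path" | grep '_1\.fastq$' | while read -r line; do
        acc=${line%_1.fastq}
        # < /dev/null keeps spades.py from reading the list the loop consumes
        spades.py -1 "$search_path/${acc}_1.fastq" \
                  -2 "$search_path/${acc}_2.fastq" \
                  -o "${acc}-out" < /dev/null
    done
}

assemble_listed "${search_path:-.}"
```

Parsing `ls` output breaks on filenames containing newlines, which is one reason the plain `for r1 in *_1.fastq` form is usually preferred.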

Write down three or four individual commands and see how you can automate them by spotting the patterns they share. I have given you a detailed outline to follow; please put in some more effort.
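One way to check the pattern before running anything is a dry run that only prints the commands. This sketch assumes the accessions are the consecutive numbers SRR8224532 to SRR8224640, which the 109-sample count in the question suggests; if any number in that range is missing, loop over the files instead.

```bash
#!/usr/bin/env bash
# Sketch: print the commands (a dry run) and compare them with the ones
# you would type by hand. Assumes consecutive accessions SRR8224532..SRR8224640.
print_commands() {
    local n acc
    for (( n = 8224532; n <= 8224640; n++ )); do
        acc=SRR$n
        echo "spades.py -1 ${acc}_1.fastq -2 ${acc}_2.fastq -o ${acc}-out"
    done
}

# Inspect the first three commands.
print_commands | head -n 3
```

Once the printed lines look right, `print_commands | bash` runs them one after another.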

OK, I will try that. Thank you.

Before jumping into a loop, run one assembly to make sure your laptop has adequate hardware (mainly memory). Then you will want to serialize these assemblies (run them one after the other, starting each only after the previous one completes) rather than parallelize them. It appears these are all bacterial datasets?
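A sketch of such a trial run on the first sample, with explicit resource limits: -t sets the number of threads and -m the memory limit in GB (SPAdes terminates if it reaches that limit). The values 4 and 8 are placeholders for a hypothetical laptop, not recommendations.

```bash
#!/usr/bin/env bash
# Sketch: assemble a single sample with explicit resource limits before
# launching the whole batch. THREADS and MEM_GB are placeholder values.
THREADS=4
MEM_GB=8

trial_run() {
    local acc=$1
    # -t: number of threads; -m: memory limit in GB
    spades.py -t "$THREADS" -m "$MEM_GB" \
        -1 "${acc}_1.fastq" -2 "${acc}_2.fastq" -o "${acc}-out"
}

# Only start if the first sample's reads are present.
if [ -e SRR8224532_1.fastq ] && [ -e SRR8224532_2.fastq ]; then
    trial_run SRR8224532
fi
```

On Linux, prefixing the call with GNU time (`/usr/bin/time -v`) reports the peak memory as "Maximum resident set size", which is a useful number to compare against the laptop's RAM before starting the other 108 samples.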

Memory might cause problems... And yes, they are all bacterial datasets.
