Question: how to de novo assemble a large number of bacterial genomes with SPAdes in Linux
haomingju wrote, 8 months ago:

Hi, I am a beginner in sequencing data analysis. When I have the fastq files for just one bacterium, I know how to assemble them with SPAdes, for example: "spades.py --pe1-1 name.fq.gz --pe1-2 name.fq.gz -o spades_test". But I don't know how to handle a large number of samples with one Linux command. For example, when I have 10 fastq datasets (name1~name10), I don't want to assemble them one by one by hand. Can you tell me how to do this? Thanks!


Type "bash loop" into Google.

written 8 months ago by Rob

Take a look at bash for loops.

Just putting these commands in a loop is not going to make them run any faster, though. If you have access to a cluster you could use a for loop to submit 10 parallel SPAdes jobs; otherwise they will run one after the other.
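
For example, a minimal serial loop could look like this (a sketch that assumes the files are named name1_R1.fq.gz / name1_R2.fq.gz up to name10_*; adjust the pattern to your data):

#!/bin/bash
# Assemble each sample in turn; each run gets its own output directory.
for n in $(seq 1 10); do
    spades.py -1 "name${n}_R1.fq.gz" -2 "name${n}_R2.fq.gz" -o "spades_name${n}"
done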

written 8 months ago by genomax

Do you have access to an HPC or computing cluster? You should build up your skills and use submission scripts or pipelines to manage this.
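
If the cluster runs SLURM, a job-array submission script along these lines is one option (a minimal sketch; the scheduler, the resource numbers, and the name1..name10 file pattern are assumptions):

#!/bin/bash
#SBATCH --job-name=spades
#SBATCH --array=1-10          # one array task per sample
#SBATCH --cpus-per-task=8
#SBATCH --mem=32G

# Each array task assembles one sample, assuming files name1_R1.fq.gz ... name10_R2.fq.gz
sample="name${SLURM_ARRAY_TASK_ID}"
spades.py -1 "${sample}_R1.fq.gz" -2 "${sample}_R2.fq.gz" \
          -o "spades_${sample}" -t "$SLURM_CPUS_PER_TASK" -m 32

Submit it once with sbatch and the scheduler will fan the ten samples out as separate jobs.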

written 8 months ago by Asaf


"spades.py --pe1-1 name.fq.gz --pe1-2 name.fq.gz -o spades_test": I guess this says that you have paired-end reads split across multiple files per library, but you have only one file per end, so you can use -1 and -2 directly. There is also a problem with the naming in the OP's command (the same file, name.fq.gz, is given for both --pe1-1 and --pe1-2).
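
With a single paired-end library (one file per end), a call along these lines should work; the R1/R2 file names here are placeholders:

spades.py -1 name_R1.fq.gz -2 name_R2.fq.gz -o spades_test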

written 8 months ago by cpad0112
the_cowa wrote, 8 months ago:

You can do it with the help of a shell script:

#!/bin/bash

# Loop over the directory holding the fastq files.
# The paths below are concatenated directly, so keep the trailing slash.
for fol in /path/to/your/fastq/directory/ ; do
    echo "$fol"

    # Take the 4th underscore-separated field of each filename as the sample tag
    # (adjust the field number to your own naming scheme).
    for i in $(ls "$fol" | cut -d '_' -f4 | sort | uniq); do
        # Recover the filename prefix, i.e. everything before the first "L" in the name.
        fitag=$(ls "$fol" | grep "$i" | head -n1 | sed -e 's/L/\t/g' | cut -f1)
        spades.py --pe1-1 "${fol}${fitag}${i}_R1.fastq.gz" --pe1-2 "${fol}${fitag}${i}_R2.fastq.gz" -o "${fol}${fitag}${i}.out"
    done
done

Although this solution works, it should be avoided. As someone who has spent a lot of time doing such things, I can assure you that you will have to run this command more than once (a lot more, actually): with different parameters, with different datasets, maybe combining two samples (have I removed adapters?), you get the idea. You'll end up hacking at this bash script in some unknown location, unsure which version of it you used to generate the results, and when you write your manuscript you'll avoid sharing the code because it's, well, I'll say it: ugly.

What should you do? Make your results disposable. Save the input in a well-documented, backed-up location and use pipelines to run the analysis; you can either use flowcraft for metagenomics assembly or craft your own. I can't stress this enough: learn how to use a pipeline management system such as WDL, Nextflow, or Snakemake. Choose one; it doesn't really matter which.

written 8 months ago by Asaf