Question: how to de novo assemble a large number of bacterial genomes with SPAdes in Linux
haomingju wrote, 4 weeks ago:

Hi, I am new to sequencing data analysis. When I have fastq files for a single bacterium, I know how to assemble it with SPAdes, for example: "spades.py --pe1-1 name.fq.gz --pe1-2 name.fq.gz -o spades_test". But I don't know how to handle a large number of samples with one Linux command. For example, when I have 10 fastq datasets (name1~name10), I would rather not assemble them one by one by hand. Can you tell me how I can do this? Thanks!


Type "bash loop" into Google.

written 4 weeks ago by Rob

Take a look at bash for loops.

Just putting these commands in a loop is not going to make them go any faster. If you have access to a cluster, you could potentially use a for loop to submit 10 parallel SPAdes jobs; otherwise they will run one after the other.
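
For example, on a SLURM cluster such a submission loop could look like the sketch below (the resource options, sample names and read-file suffixes are placeholders to adapt to your own data):

for i in $(seq 1 10); do
    # each iteration submits one independent SPAdes job to the scheduler
    sbatch --cpus-per-task=8 --mem=16G \
        --wrap="spades.py -1 name${i}_R1.fq.gz -2 name${i}_R2.fq.gz -o spades_name${i}"
done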

written 4 weeks ago by genomax

Do you have access to an HPC or computing cluster? You should build up your skills and use submission scripts or pipelines to manage this.

written 4 weeks ago by Asaf


"spades.py --pe1-1 name.fq.gz --pe1-2 name.fq.gz -o spades_test": I guess this says that you have paired-end reads split across several files, but you have only one file per end, so I guess you can use -1 and -2 directly. There is also a problem with the naming in the OP: the same file, name.fq.gz, is given for both the forward and reverse reads.
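
For a single sample the simplified call would then look something like this (the _R1/_R2 file names are just placeholders for your actual read files):

spades.py -1 name_R1.fq.gz -2 name_R2.fq.gz -o spades_test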

written 4 weeks ago by cpad0112
meowz wrote, 4 weeks ago:

You can do it with the help of the shell:

#!/bin/bash
# Run SPAdes on every R1/R2 pair found in the fastq directory,
# writing one output directory per sample.

fastq_dir="/path/to/your/fastq/directory"

for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # matching reverse-read file
    sample=$(basename "$r1" _R1.fastq.gz)     # sample name without the suffix
    echo "Assembling $sample"
    spades.py --pe1-1 "$r1" --pe1-2 "$r2" -o "$fastq_dir/${sample}.out"
done
written 4 weeks ago by meowz

Although this solution works, it should be avoided. As someone who spent a lot of time doing such things, I can assure you that you will have to run this command more than once (a lot more, actually), with different parameters, different datasets, maybe combining two samples (did I remove adapters?), you get the idea. You'll end up hacking this bash script in some unknown location, unsure which version of it you used to generate the results, and when you write your manuscript you'll avoid sharing this code because it's, well, I'll say it: ugly.

What should you do? Make your results disposable. Save the input in a well-documented, backed-up location and use pipelines to run the analysis; you can either use flowcraft for metagenomics assembly or craft your own. I can't stress this enough: learn how to use a pipeline management system such as WDL, Nextflow or Snakemake. Choose one; it doesn't really matter which.

written 4 weeks ago by Asaf