Question

Genome sequence from SPAdes output

4.8 years ago

BCArg ▴ 90

We sequenced two plasmids on a NovaSeq (100 bp paired-end reads).

For one of them, we aligned the reads against the reference genome using BWA-MEM. Because the reads mapped across the full length of the reference, we extracted the plasmid sequence in IGV with the "extract consensus sequence" feature:
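
For anyone who prefers the command line to IGV, a minimal sketch of a comparable workflow follows. It swaps IGV's consensus feature for `samtools consensus` and assumes bwa and samtools 1.16 or newer are on PATH; the function name and its arguments are hypothetical, and this is not necessarily what was run here.

```shell
#!/usr/bin/env bash
# Sketch only: map paired-end reads with BWA-MEM and build a consensus with
# samtools instead of IGV. Assumes bwa and samtools >= 1.16 are installed.
set -euo pipefail

# Hypothetical helper: align_and_consensus REF R1 R2 OUT_PREFIX
align_and_consensus() {
    local ref=$1 r1=$2 r2=$3 prefix=$4
    # Index the reference (writes the index files next to REF)
    bwa index "$ref"
    # Align read pairs and coordinate-sort the output into a BAM
    bwa mem "$ref" "$r1" "$r2" | samtools sort -o "${prefix}.bam" -
    samtools index "${prefix}.bam"
    # Consensus as FASTA; -a reports the reference from start to end,
    # so uncovered stretches show up as N instead of being dropped
    samtools consensus -a -f fasta -o "${prefix}.consensus.fa" "${prefix}.bam"
}
```

Usage would be something like `align_and_consensus reference.fasta R1.fastq.gz R2.fastq.gz plasmid1` (file names are placeholders).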

[image: reads mapped across the whole reference]

For the other plasmid, however, IGV showed that the reads mapped to only a portion of the reference genome.
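
To put numbers on "only a portion", one option is to summarise per-position depth. The sketch below assumes a sorted BAM from a BWA-MEM alignment and reads the three-column output of `samtools depth -a` (contig, 1-based position, depth) on stdin; the function names and the 1x default threshold are illustrative assumptions.

```shell
# Fraction of reference positions with depth >= MIN (default 1).
# Input: `samtools depth -a` output (contig, position, depth) on stdin.
covered_fraction() {
    awk -v min="${1:-1}" '
        { total++; if ($3 >= min + 0) covered++ }
        END { if (total) printf "%.4f\n", covered / total; else print "NA" }
    '
}

# Merge consecutive zero-depth positions into 1-based inclusive intervals
# (contig, start, end), i.e. the parts of the reference no read covers.
uncovered_regions() {
    awk '
        $3 == 0 {
            # Extend the current gap if this position directly follows it
            if (in_gap && $1 == chrom && $2 == end + 1) { end = $2; next }
            if (in_gap) print chrom "\t" start "\t" end
            chrom = $1; start = $2; end = $2; in_gap = 1; next
        }
        { if (in_gap) print chrom "\t" start "\t" end; in_gap = 0 }
        END { if (in_gap) print chrom "\t" start "\t" end }
    '
}
```

For example, `samtools depth -a aln.bam | uncovered_regions` would list the regions of the reference that the second plasmid's reads do not reach, which is where deletions relative to the reference would show up.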

[image: reads mapping only partially to the reference]

Extracting the consensus sequence from IGV would therefore be meaningless, so we performed a de novo assembly with SPAdes using the following command:

path/to/spades.py --careful -1 /path/Read1_001.fastq.gz -2 /path/to/Read2_001.fastq.gz -o /path/to/outdir

The thing is that SPAdes outputs, among other files, scaffolds.fasta and contigs.fasta. I am now wondering: how (if) is it possible to get the continuous plasmid sequence from the SPAdes output?

Tags: assembly, alignment, genome • 3.0k views

Written 4.8 years ago by BCArg ▴ 90 • updated 4.8 years ago by h.mon 35k

Please see How to add images to a Biostars post to add your images properly. You need the direct link to the image, not the link to the webpage that has the image embedded (which is what you used here).

Reply by Ram 43k, 4.8 years ago

score 1 · Answer 1 · 2019-06-19

4.8 years ago

h.mon 35k

The sequences in contigs.fasta and scaffolds.fasta are nearly identical; the difference is that scaffolds.fasta may contain some contigs joined using paired-read information. Such scaffolds contain runs of Ns marking the gaps between the joined contigs.
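
To see which scaffolds were actually joined across a gap, you could count the runs of N in each record. A minimal awk sketch follows; the function name is hypothetical, and it only counts gaps, it does not tell you how long the true gap is.

```shell
# Print "<record name>\t<number of N runs>" for each record of a FASTA file,
# e.g. scaffold_gaps scaffolds.fasta
scaffold_gaps() {
    awk '
        # Count maximal runs of N; gsub works on a local copy of the sequence
        function gaps(s,   n) { n = gsub(/N+/, "", s); return n }
        /^>/ {
            if (name != "") print name "\t" gaps(seq)
            name = substr($1, 2); seq = ""; next
        }
        { seq = seq toupper($0) }
        END { if (name != "") print name "\t" gaps(seq) }
    ' "$1"
}
```

A count of 0 means no N gap was introduced in that record.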

how (if) is it possible to get the continuous plasmid sequence from the Spades output?

The "if" depends on whether SPAdes assembled the plasmid into one continuous contig. As for the "how", there are several options, one being:

make a BLAST database from contigs.fasta (and/or scaffolds.fasta),
BLAST the reference plasmid against the database prepared in the previous step,
filter and examine the BLAST hits.
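
The three steps above could look roughly like this with BLAST+ (`makeblastdb`, then `blastn` with tabular `-outfmt 6` output). The database name `contigs_db`, the function names and the 95% identity / 200 bp cut-offs are assumptions for illustration only. Sorting the surviving hits by their start on the reference (qstart) gives a tentative order of the contigs along the plasmid, and sstart > send flags a contig aligned in reverse orientation; where the plasmid differs from the reference, that order is only a hint.

```shell
#!/usr/bin/env bash
# Sketch only; assumes BLAST+ (makeblastdb, blastn) is on PATH.

# Build a nucleotide database from the assembly and query it with the
# reference plasmid. Usage: blast_reference CONTIGS REFERENCE OUT_TSV
blast_reference() {
    local contigs=$1 reference=$2 out=$3
    makeblastdb -in "$contigs" -dbtype nucl -out contigs_db
    blastn -query "$reference" -db contigs_db -outfmt 6 -out "$out"
}

# Keep hits with pident >= MIN_PIDENT and alignment length >= MIN_LEN, then
# sort by qstart (position on the reference).
# outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
#                   qstart qend sstart send evalue bitscore
# Output columns: contig, qstart, qend, sstart, send, pident
filter_hits() {
    local hits=$1 min_pident=${2:-95} min_len=${3:-200}
    awk -F '\t' -v OFS='\t' -v p="$min_pident" -v l="$min_len" \
        '$3 >= p + 0 && $4 >= l + 0 { print $2, $7, $8, $9, $10, $3 }' "$hits" \
        | sort -t "$(printf '\t')" -k2,2n
}
```

For example: `blast_reference contigs.fasta reference.fasta hits.tsv && filter_hits hits.tsv` (file names are placeholders).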

That's useful, but since each scaffold is (obviously) shorter than the full plasmid, I can only check whether the individual scaffold sequences are correct. The contiguous sequence, i.e. how the scaffolds connect to one another, is still missing.

Reply by BCArg ▴ 90, 4.8 years ago

I didn't state it clearly, but I think the second plasmid is very different from both the first plasmid and the reference: the mapping suggests the second plasmid carries several deletions relative to the first.

Assembling Illumina data generally yields many more contigs than expected, partly because of sequencing errors and contaminants. These can be filtered out by coverage, as the "proper" contigs will have a higher coverage than the garbage contigs.
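
SPAdes writes each contig's k-mer coverage into its FASTA header (e.g. `>NODE_1_length_5321_cov_87.2`), so a coverage filter can be a short awk script. A sketch, assuming that header layout; the function name is hypothetical and the cut-off is yours to choose after looking at the spread of values.

```shell
# Print only the records whose "_cov_" value in the header is >= MIN_COV.
# Usage: filter_by_coverage contigs.fasta MIN_COV
filter_by_coverage() {
    local fasta=$1 min_cov=$2
    awk -v min="$min_cov" '
        /^>/ {
            # Split the header on "_" and take the field after "cov"
            n = split($1, f, "_")
            cov = 0
            for (i = 1; i < n; i++) if (f[i] == "cov") cov = f[i + 1] + 0
            keep = (cov >= min + 0)
        }
        # Print the header and its sequence lines while the record is kept
        keep
    ' "$fasta"
}
```

To pick a cut-off, something like `grep '>' contigs.fasta | sort -t _ -k6,6gr | head` lists the highest-coverage contigs first.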

Reply by h.mon 35k, 4.8 years ago

score 0 · Answer 2 · 2019-06-19

4.8 years ago

Buffo ★ 2.4k

It looks like you have not sequenced the entire plasmid, so you can't recover its full sequence.

We have checked the sequencing parameters and they all look OK, so I am more inclined to believe that the sequence of the sequenced plasmid is not the same as the reference.

Reply by BCArg ▴ 90, 4.8 years ago