Question: genome sequence from Spades output
5 months ago by
BCArg60
UAntwerpen
BCArg60 wrote:

We have sequenced two plasmids on a NovaSeq machine (read length of 100 bp, paired-end).

For one of them, we aligned the reads against the reference genome using BWA-MEM. As the reads mapped across the full length of the reference genome, we extracted the sequence of the plasmid in IGV (using the feature to extract the consensus sequence):

[Image: reads mapped across the whole reference]

However, for the other plasmid, the alignment in IGV showed that the reads mapped to only a portion of the reference genome.

[Image: reads mapping partially to the reference]

Therefore, it would be meaningless to extract the consensus sequence from IGV. We then sought to perform de novo assembly using SPAdes. We ran the following command:

path/to/spades.py --careful -1 /path/Read1_001.fastq.gz -2 /path/toRead2_001.fastq.gz -o /path/to/outdir

The thing is that SPAdes outputs, among others, a scaffolds.fasta and a contigs.fasta file. I was now wondering how (if) is it possible to get the continuous plasmid sequence from the Spades output?

alignment assembly genome • 235 views
modified 5 months ago by h.mon28k • written 5 months ago by BCArg60

Please see "How to add images to a Biostars post" to add your images properly. You need the direct link to the image, not the link to the webpage in which the image is embedded (which is what you have used here).

written 5 months ago by RamRS24k
5 months ago by
h.mon28k
Brazil
h.mon28k wrote:

The sequences in contigs.fasta and scaffolds.fasta are nearly identical; the difference is that scaffolds.fasta may contain some contigs that were joined using paired-read mapping information. Such scaffolds contain runs of Ns marking where the gaps in the sequence are.
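To see where such gaps sit in the scaffolds, something along these lines could work. This is only a minimal sketch in plain Python: the helper names `parse_fasta` and `find_gaps` and the two example records are made up for illustration, not real SPAdes output.

```python
import re


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def find_gaps(sequence):
    """Return (start, end) pairs (0-based, end-exclusive) for each run of N."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


# Hypothetical scaffolds: the first one contains a 10 bp gap, the second none.
example = """>NODE_1_length_30_cov_50.0
ACGTACGTACNNNNNNNNNNACGTACGTAC
>NODE_2_length_12_cov_48.5
ACGTACGTACGT
"""

scaffolds = parse_fasta(example)
for name, seq in scaffolds.items():
    print(name, find_gaps(seq))
```

With a real assembly you would read scaffolds.fasta from the SPAdes output directory instead of the inline string.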

how (if) is it possible to get the continuous plasmid sequence from the Spades output?

The "if" depends on SPAdes assembling the plasmid into one continuous contig. There are several options as to "how", one being:

  1. make a blast database with the contigs.fasta (and / or scaffolds.fasta),

  2. blast the reference plasmid against the database prepared in the previous step,

  3. filter and examine the blast hits.
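For step 3, a rough sketch of the filtering in Python, assuming blastn was run with tabular output (`-outfmt 6`, default 12 columns) and the reference plasmid as the query. The thresholds, the example rows, and the reference length of 10000 bp are all hypothetical values picked for illustration.

```python
from collections import namedtuple

Hit = namedtuple(
    "Hit", "query subject pident length qstart qend sstart send evalue bitscore"
)


def parse_hits(text):
    """Parse BLAST tabular output (-outfmt 6, default columns) into Hit tuples."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                        float(f[10]), float(f[11])))
    return hits


def filter_hits(hits, min_identity=95.0, min_length=500):
    """Keep long, high-identity hits, ordered by position on the reference."""
    kept = [h for h in hits if h.pident >= min_identity and h.length >= min_length]
    return sorted(kept, key=lambda h: min(h.qstart, h.qend))


def covered_fraction(hits, reference_length):
    """Fraction of the reference covered by at least one hit (1-based coords)."""
    intervals = sorted((min(h.qstart, h.qend), max(h.qstart, h.qend)) for h in hits)
    covered = 0
    current_end = 0
    for start, end in intervals:
        start = max(start, current_end + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / reference_length


# Hypothetical hits of a 10000 bp reference plasmid against three contigs.
example_rows = [
    ["ref_plasmid", "NODE_1_length_4000_cov_50.0", "99.8", "4000", "8", "0",
     "1", "4000", "1", "4000", "0.0", "7300"],
    ["ref_plasmid", "NODE_2_length_2500_cov_48.5", "99.5", "2500", "12", "0",
     "3501", "6000", "2500", "1", "0.0", "4500"],
    ["ref_plasmid", "NODE_7_length_300_cov_1.2", "88.0", "120", "14", "1",
     "9000", "9119", "1", "120", "1e-30", "130"],
]
example = "\n".join("\t".join(row) for row in example_rows)

good = filter_hits(parse_hits(example))
for h in good:
    print(h.subject, h.qstart, h.qend, h.pident)
print("reference covered:", covered_fraction(good, 10000))
```

Sorting the hits by their position on the reference also hints at the order of the contigs along the plasmid, although it cannot show how contigs are joined in regions that differ from the reference.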

written 5 months ago by h.mon28k

That's useful, though since the scaffolds are (obviously) shorter than the full sequence of the plasmid, I can only check whether the scaffold sequences are correct. However, the contiguous sequence, i.e. how the scaffolds are connected to one another, is still missing.

written 5 months ago by BCArg60

I didn't state it clearly, but I think the second plasmid is very different from the first and from the reference. The mapping suggests there are several deletions in the second plasmid compared to the first.

Assembling Illumina data will generally result in a lot more contigs than originally expected; sequencing errors and contaminants are among the reasons. These can be filtered out by looking at the coverage, as the "proper" contigs will have a higher coverage than the garbage contigs.
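A minimal sketch of such a coverage filter in Python, relying on the fact that SPAdes writes length and coverage into the contig names (NODE_<n>_length_<len>_cov_<cov>). The cutoff rule used here (a fraction of the coverage of the longest contig) and the example headers are assumptions for illustration, so adjust them to your data.

```python
import re

# SPAdes contig names look like NODE_1_length_4000_cov_50.0
HEADER = re.compile(r"^NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def contig_stats(fasta_text):
    """Return (name, length, coverage) for each SPAdes-style FASTA header."""
    stats = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            m = HEADER.match(line[1:])
            if m:
                stats.append((m.group(0), int(m.group(2)), float(m.group(3))))
    return stats


def high_coverage(stats, fraction=0.2):
    """Keep contigs whose coverage is at least `fraction` of the coverage of
    the longest contig (a hypothetical heuristic, not a SPAdes feature)."""
    if not stats:
        return []
    longest = max(stats, key=lambda s: s[1])
    cutoff = fraction * longest[2]
    return [s for s in stats if s[2] >= cutoff]


# Hypothetical contigs.fasta content (sequences shortened for the example).
example = """>NODE_1_length_4000_cov_50.0
ACGT
>NODE_2_length_2500_cov_48.5
ACGT
>NODE_7_length_300_cov_1.2
ACGT
"""

kept = high_coverage(contig_stats(example))
for name, length, cov in kept:
    print(name, length, cov)
```

If the longest contig happens to be a contaminant, this rule would pick the wrong baseline, so it is worth eyeballing the coverage values before trusting the cutoff.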

written 5 months ago by h.mon28k
5 months ago by
Buffo1.7k
Buffo1.7k wrote:

It looks like you have not sequenced the entire plasmid, so you can't get its full sequence.

written 5 months ago by Buffo1.7k

We have checked the sequencing parameters and they all look OK. So I am more tempted to believe that the sequence of the sequenced plasmid is not the same as the reference.

written 5 months ago by BCArg60