Small viral contigs in de novo assembly using SPAdes
14 months ago
mail2steff ▴ 70

Dear All,

I am using SPAdes to perform de novo assembly of my viral genomes. Unfortunately, these viral genomes have no references to compare with.

For some of the samples, I got 40 kb to 90 kb contigs. But for others, I get only 3,000 bp to 5,000 bp contigs. What could be the reason (technical or biological) behind this? If this is due to technical issues, how do I improve the assembly to get longer contigs?

I also checked the log file but couldn't find anything alarming.

Thank you in advance

Regards
Monica

denovo-assembly spades viral-genomes • 659 views

Unless you only have completely novel viruses (possible, but there must be some known ones), there are bound to be similar genomes in the databases. You can compare your contigs to the NCBI Viral Genomes database to see what your contigs map to and how complete they are. That would give you an idea of the quality of your assemblies.
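If it helps, here is a minimal Python sketch of how the results of such a comparison could be summarised. It assumes you have already searched your contigs against a viral database with BLAST and saved the standard 12-column tabular output (outfmt 6). The function name `best_hits` and the contig and accession names in the usage example are hypothetical, just for illustration.

```python
# Standard column order of BLAST tabular output (outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(blast6_text):
    """Return the highest-bitscore hit for each query contig.

    blast6_text: contents of a BLAST outfmt 6 file as one string.
    Returns a dict mapping contig name -> dict of hit fields.
    """
    best = {}
    for line in blast6_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        parts = line.split("\t")
        if len(parts) < len(BLAST6_FIELDS):
            continue  # skip malformed rows
        hit = dict(zip(BLAST6_FIELDS, parts))
        hit["pident"] = float(hit["pident"])
        hit["length"] = int(hit["length"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Hypothetical example rows; real input would be read from your BLAST output file.
    example = "\n".join([
        "\t".join(["contig_1", "refA", "92.5", "38000", "2850", "10",
                   "1", "38000", "500", "38500", "0.0", "52000"]),
        "\t".join(["contig_1", "refB", "85.0", "12000", "1800", "20",
                   "1", "12000", "100", "12100", "0.0", "15000"]),
    ])
    for contig, hit in best_hits(example).items():
        print(contig, hit["sseqid"], hit["pident"], hit["length"])
```

Comparing the alignment length of the best hit with the length of the contig and of the matched reference gives a rough idea of how complete each contig is. Contigs with no hit at all are either novel or not viral.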

14 months ago
Mensur Dlakic ★ 27k

There are a number of reasons for small contigs, and having some short contigs is normal. There is a good chance that you didn't do anything wrong, and that your assembly can't be improved. Your 40-90 kb contigs indicate that you probably captured at least a couple of good viral genomes.

Depending on your DNA isolation and prep, it could be that some short contigs are plasmids or other small DNA fragments. It could be that you have several viruses with high similarity (but not identity) in a given region such that the assembler gets conflicting read information and can't extend the contigs. Some DNA pieces are not amenable to sequencing for reasons we do not understand. There are other reasons but hopefully you get the idea without listing them all. All of these would be in a non-fixable category.

We don't know the average sequencing depth of your sample, but extremely deep sequencing runs (>1000x) will often give fragmented assemblies because of accumulated sequencing errors. This is one case where you could improve your assembly, by error correction and by subsampling to a lower depth, say 100-500x.
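As a rough illustration of the subsampling idea, here is a minimal Python sketch. It assumes you can estimate depth as (number of reads x mean read length) / expected genome size. The function names and the numbers in the usage example are made up for illustration, and in practice a dedicated read-subsampling tool would do this step on the FASTQ files directly.

```python
import random


def estimate_depth(n_reads, mean_read_len, genome_size):
    """Approximate average depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_len / genome_size


def keep_fraction(current_depth, target_depth):
    """Fraction of reads to keep to bring current_depth down to target_depth."""
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    return min(1.0, target_depth / current_depth)


def subsample(records, fraction, seed=0):
    """Keep each record with probability `fraction`.

    Using the same seed on the R1 and R2 record lists of a paired-end
    run keeps the same positions in both, provided both lists have the
    same number of records in the same order.
    """
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < fraction]


if __name__ == "__main__":
    # Made-up numbers: 2 million 150 bp reads, ~50 kb genome.
    depth = estimate_depth(2_000_000, 150, 50_000)
    frac = keep_fraction(depth, target_depth=300)
    print(f"estimated depth: {depth:.0f}x, keep fraction: {frac:.3f}")

    reads = [f"read_{i}" for i in range(1000)]
    kept = subsample(reads, frac, seed=42)
    print(f"kept {len(kept)} of {len(reads)} reads")
```

Since the genome size is unknown here, the length of your longest contigs (40-90 kb) could serve as a first guess for `genome_size`. It is worth trying a few target depths and comparing the resulting assemblies.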
