Question: SOAP DENOVO outfile (.scafSeq)
2.5 years ago by
preranakoti930 wrote:


I have used SOAPdenovo to assemble a Phytophthora genome. In the .scafSeq file I am getting approximately 500,000 scaffolds.

In the headers of the .scafSeq file I am getting:

C2943672 7.0

scaffold1 4.0

I want to know what "7.0" and "4.0" indicate. These values range from 0 to 158. Should I filter my scaffolds based on these values, or should I consider all the scaffolds in the .scafSeq file?

Please let me know.

Thank you in advance.

assembly genome
2.5 years ago by
h.mon32k wrote:

Sequences named C followed by a number (e.g. C2943672) are singleton contigs, which could not be scaffolded. Sequences named scaffold followed by a number are scaffolded contigs. The number after the name is the k-mer coverage depth of the contig or scaffold, which is related to base coverage.
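To make the naming convention concrete, here is a minimal Python sketch that splits a header into its name, type, and coverage value. The function name `parse_scafseq_header` is made up for illustration, and the sketch assumes each header has exactly the two whitespace-separated fields shown in the question, optionally preceded by the FASTA `>` character.

```python
def parse_scafseq_header(header):
    """Split a header such as 'C2943672 7.0' into (name, kind, coverage).

    Hypothetical helper: assumes the name comes first and the k-mer
    coverage second, as in the headers quoted in the question.
    """
    fields = header.lstrip(">").split()
    name = fields[0]
    coverage = float(fields[1])
    if name.startswith("scaffold"):
        kind = "scaffold"
    elif name.startswith("C"):
        kind = "singleton_contig"
    else:
        kind = "unknown"
    return name, kind, coverage


# The two headers from the question
parsed = [parse_scafseq_header(h) for h in ("C2943672 7.0", ">scaffold1 4.0")]
for name, kind, coverage in parsed:
    print(name, kind, coverage)
```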

You could indeed filter by k-mer coverage: plot a histogram and examine its distribution. I have usually found that contigs / scaffolds with much lower coverage than the average are contaminants (not the species being sequenced), and those with much higher coverage are repetitive regions. But filtering by coverage alone is not good practice; I would combine it with other explorations of the data, as is done in blobtools.
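As a rough sketch of that first look at the coverage distribution, the Python below bins the coverage values and flags sequences far from the median. It assumes the .scafSeq file is FASTA with header lines starting with `>`. The function names, the bin width, and the 0.25x / 4x cut-offs are arbitrary assumptions for illustration, not recommended thresholds. Apart from the two headers from the question, the demo values are made up.

```python
from collections import Counter
from statistics import median


def read_coverages(lines):
    """Yield (name, coverage) for every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0], float(fields[1])


def coverage_histogram(coverages, bin_width=5.0):
    """Count sequences per coverage bin; keys are the lower bin edges."""
    return Counter(int(c // bin_width) * bin_width for c in coverages)


def flag_outliers(records, low_factor=0.25, high_factor=4.0):
    """Label each sequence relative to the median coverage.

    The two factors are hypothetical cut-offs; choose real ones only
    after looking at the histogram.
    """
    records = list(records)
    med = median(cov for _, cov in records)
    flags = {}
    for name, cov in records:
        if cov < low_factor * med:
            flags[name] = "low"
        elif cov > high_factor * med:
            flags[name] = "high"
        else:
            flags[name] = "typical"
    return med, flags


# Made-up demo input; with a real file, pass open("assembly.scafSeq") instead.
demo_lines = [
    ">scaffold1 4.0", "ACGT",
    ">C2943672 7.0", "ACGT",
    ">C100 1.0", "ACGT",
    ">C200 158.0", "ACGT",
    ">scaffold2 6.0", "ACGT",
    ">scaffold3 8.0", "ACGT",
    ">scaffold4 5.0", "ACGT",
]
records = list(read_coverages(demo_lines))
hist = coverage_histogram(cov for _, cov in records)
med, flags = flag_outliers(records)

for edge in sorted(hist):
    print(f"{edge:6.1f} {'#' * hist[edge]}")
print("median coverage:", med)
print({name: flag for name, flag in flags.items() if flag != "typical"})
```

Flagged sequences are only candidates for a closer look (GC content, taxonomic hits), not an automatic removal list.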


Thank you for your time, @h.mon :) It was useful.

written 22 months ago by preranakoti930

Could singleton contigs represent not only contaminants but also plasmid sequences? If so, would it be fine for the coverage of some singleton contigs to be very high?

written 11 days ago by hansuol120
