Question: SOAPdenovo output file (.scafSeq)
2.5 years ago by
preranakoti930 wrote:


I used SOAPdenovo to assemble a Phytophthora genome. The .scafSeq file contains approximately 500,000 scaffolds.

The headers in the .scafSeq file look like this:

C2943672 7.0

scaffold1 4.0

What do the values 7.0 and 4.0 indicate? These values range from 0 to 158. Should I filter my scaffolds based on them, or should I keep all the scaffolds in the .scafSeq file?

Please let me know.

Thank you in advance.

assembly genome • 1.1k views
modified 10 weeks ago by Biostar ♦♦ 20 • written 2.5 years ago by preranakoti930
2.5 years ago by
h.mon32k wrote:

Entries named C followed by a number (e.g. C2943672) are singleton contigs, which could not be scaffolded. Entries named scaffold followed by a number are scaffolded contigs. The number after the name is the k-mer coverage depth of the contig or scaffold, which is related to base coverage.
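To make the header layout concrete, here is a minimal Python sketch that splits a header into name, type, and k-mer coverage. It assumes headers look exactly like the two examples in the question (optionally with a leading ">"), which may not cover every SOAPdenovo version. The helper `base_coverage_from_kmer` uses the commonly cited approximation Cb = Ck * L / (L - k + 1) for reads of uniform length L; the function names and the example values of L and k are hypothetical, not taken from this thread.

```python
import re

# Assumed header layout, based only on the examples in the question:
#   >C2943672 7.0    or    >scaffold1 4.0
HEADER_RE = re.compile(
    r"^>?\s*(?P<name>(?:C|scaffold)\d+)\s+(?P<cov>\d+(?:\.\d+)?)\s*$"
)


def parse_header(line):
    """Return (name, kind, kmer_coverage), or None if the line does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    name = m.group("name")
    kind = "scaffold" if name.startswith("scaffold") else "singleton_contig"
    return name, kind, float(m.group("cov"))


def base_coverage_from_kmer(kmer_cov, read_len, k):
    """Approximate base coverage: Cb = Ck * L / (L - k + 1).

    Assumes error-free reads of uniform length; treat the result as a rough guide.
    """
    return kmer_cov * read_len / (read_len - k + 1)


for header in (">C2943672 7.0", ">scaffold1 4.0"):
    print(parse_header(header))

# Hypothetical values: 100 bp reads assembled with k = 31.
print(base_coverage_from_kmer(7.0, read_len=100, k=31))
```

With 100 bp reads and k = 31, a k-mer coverage of 7.0 corresponds to a base coverage of roughly 10x under this approximation.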

You could indeed filter on k-mer coverage, but first plot a histogram and examine its distribution. I have usually found that contigs / scaffolds with much lower coverage than the average are contaminants (not the species being sequenced), and those with much higher coverage are repetitive regions. But filtering by coverage alone is not good practice; I would combine it with other explorations of the data, as is done in blobtools.
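The coverage check described above could be sketched roughly as follows, using only the Python standard library. This is an illustration, not a recommended filter: the median stands in for "the average" because it is less affected by extreme values, and the cut-offs `low_factor` and `high_factor`, the bin width, and the example records are all hypothetical and should be chosen after looking at the real distribution.

```python
from collections import Counter
from statistics import median


def coverage_histogram(coverages, bin_width=5.0):
    """Count sequences per coverage bin; keys are the lower edge of each bin."""
    counts = Counter(int(c // bin_width) * bin_width for c in coverages)
    return dict(sorted(counts.items()))


def flag_by_coverage(records, low_factor=0.25, high_factor=4.0):
    """Label each (name, coverage) pair relative to the median coverage.

    The factors are arbitrary placeholders, not values from any tool.
    """
    med = median(cov for _, cov in records)
    labels = {}
    for name, cov in records:
        if cov < low_factor * med:
            labels[name] = "low"      # candidate contaminant
        elif cov > high_factor * med:
            labels[name] = "high"     # candidate repeat
        else:
            labels[name] = "typical"
    return labels


# Made-up example records: (sequence name, k-mer coverage).
records = [
    ("scaffold1", 20.0),
    ("scaffold2", 22.0),
    ("scaffold3", 18.0),
    ("C10", 2.0),
    ("C11", 158.0),
]

for lower_edge, count in coverage_histogram(c for _, c in records).items():
    print(f"{lower_edge:6.1f}  {'#' * count}")
print(flag_by_coverage(records))
```

The labels only mark sequences worth a closer look; GC content and taxonomic hits would still be needed before discarding anything.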


Thank you for your time @h.mon :) It was useful.

written 22 months ago by preranakoti930

Could singleton contigs represent not only contaminants but also plasmid sequences? If so, it would be fine for the coverage of some singleton contigs to be very high.

written 11 days ago by hansuol120