Question: SOAPdenovo output file (.scafSeq)
18 months ago, preranakoti93 wrote:


I used SOAPdenovo to assemble a Phytophthora genome. The .scafSeq file contains approximately 500,000 scaffolds.

The headers in the .scafSeq file look like this:

C2943672 7.0

scaffold1 4.0

What do the values 7.0 and 4.0 indicate? Across the file, these values range from 0 to 158. Should I filter my scaffolds based on these values, or should I keep all the scaffolds in the .scafSeq file?

Please let me know.

Thank you in advance.

18 months ago, h.mon wrote:

Headers of the form C<number> (e.g. C2943672) are singleton contigs that could not be placed into any scaffold. Headers of the form scaffold<number> are, as the name suggests, scaffolds built from several contigs. The number after the name is the k-mer coverage depth of that contig or scaffold, which is related to (but not the same as) base coverage.
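To make "related to base coverage" concrete: a commonly used approximation links k-mer coverage C_k to base coverage C through the read length L and the k-mer size k, as C_k = C * (L - k + 1) / L. The sketch below inverts that relation. It assumes all reads have the same length, and the read length and k used in the example are hypothetical values, not taken from this assembly; substitute your own read length and the `-K` value you gave SOAPdenovo.

```python
def kmer_to_base_coverage(kmer_cov: float, read_len: int, k: int) -> float:
    """Approximate base coverage from k-mer coverage.

    Inverts C_k = C * (L - k + 1) / L, where L is the read length and k the
    k-mer size. Assumes every read has the same length L.
    """
    if not 0 < k <= read_len:
        raise ValueError("k must be between 1 and the read length")
    return kmer_cov * read_len / (read_len - k + 1)


# Example for the header "C2943672 7.0", assuming (hypothetically)
# 100 bp reads and k = 63.
print(round(kmer_to_base_coverage(7.0, read_len=100, k=63), 2))  # → 18.42
```

Because L - k + 1 is smaller than L, base coverage is always at least as high as k-mer coverage, and the gap widens as k approaches the read length.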

You could indeed filter by k-mer coverage: plot a histogram of the values and examine the distribution. In my experience, contigs and scaffolds with much lower coverage than average are often contaminants (sequences from organisms other than the one being sequenced), while those with much higher coverage are often repetitive regions. However, filtering by coverage alone is not good practice; I would combine it with other explorations of the data, as is done in blobtools.
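As a first pass, the coverage values can be pulled out of the headers, binned into a histogram, and compared with the median. The sketch below assumes standard FASTA headers of the form `>name coverage` (as in the examples above, with the leading `>`). The bin width and the 0.25x / 3x median cut-offs are hypothetical illustration values, not recommendations; flagged sequences are only candidates to inspect further, for example with blobtools.

```python
import statistics
from collections import Counter


def read_coverages(lines):
    """Yield (name, kmer_coverage) from header lines such as '>scaffold1 4.0'."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the second whitespace-separated field is the coverage.
            yield fields[0], float(fields[1])


def coverage_histogram(covs, bin_width=5.0):
    """Count sequences per coverage bin; keys are the bins' lower edges."""
    return Counter(int(c // bin_width) * bin_width for c in covs)


def flag_by_coverage(records, low=0.25, high=3.0):
    """Label each sequence relative to the median coverage (hypothetical cut-offs)."""
    median = statistics.median(cov for _, cov in records)
    labels = {}
    for name, cov in records:
        if cov < low * median:
            labels[name] = "low"       # candidate contaminant: check further
        elif cov > high * median:
            labels[name] = "high"      # candidate repeat: check further
        else:
            labels[name] = "typical"
    return labels


# Toy input; sequence lines are skipped because they lack a leading '>'.
lines = [">C2943672 7.0", "ACGT", ">scaffold1 4.0", "ACGT",
         ">scaffold2 5.0", ">scaffold3 30.0", ">C12 0.5"]
records = list(read_coverages(lines))

for edge, n in sorted(coverage_histogram(c for _, c in records).items()):
    print(f"{edge:>6.1f} {'#' * n}")
print(flag_by_coverage(records))
```

On a real assembly you would pass an open file handle, e.g. `list(read_coverages(open("assembly.scafSeq")))` (the file name is hypothetical). With hundreds of thousands of short singletons, an unweighted median can be dominated by tiny contigs, so weighting by sequence length is worth considering.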


Thank you for your time @h.mon :) It was useful.

Reply written 10 months ago by preranakoti93