SOAPdenovo output file (.scafSeq)
5.8 years ago

Hi,

I have used SOAPdenovo to assemble a Phytophthora genome. The .scafSeq file contains approximately 500,000 scaffolds.

In the headers of the .scafSeq file I am getting:

C2943672 7.0

scaffold1 4.0

I want to know what the values "7.0" and "4.0" indicate. These values range from 0 to 158. Should I filter my scaffolds based on them, or should I consider all the scaffolds in the .scafSeq file?

Please let me know.

Thank you in advance.

assembly genome • 2.1k views
5.7 years ago
h.mon 35k

Entries named C followed by a number (e.g. C2943672) are singleton contigs, which could not be scaffolded. Entries named scaffold followed by a number are, obviously, scaffolded contigs. The number after each name is the k-mer coverage depth of the contig or scaffold, which is related to base coverage.
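
To make the header layout concrete, here is a minimal Python sketch that reads headers and separates singleton contigs from scaffolds. It assumes each header is a FASTA line of the form `>name coverage` (the `>` prefix is an assumption, as it is not shown in the question); `Record` and `parse_headers` are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    name: str
    kind: str        # "singleton", "scaffold", or "other"
    coverage: float  # k-mer coverage depth taken from the header


def parse_headers(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per FASTA header line, skipping sequence lines."""
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        if len(fields) < 2:
            continue  # header without a coverage value
        name = fields[0]
        try:
            coverage = float(fields[1])
        except ValueError:
            continue  # second field is not a number
        if name.startswith("scaffold"):
            kind = "scaffold"
        elif name.startswith("C"):
            kind = "singleton"
        else:
            kind = "other"
        yield Record(name, kind, coverage)


# Small in-memory example using the two headers from the question.
example = [">C2943672 7.0", "ACGT", ">scaffold1 4.0", "ACGTNNNACGT"]
records = list(parse_headers(example))
for rec in records:
    print(rec.name, rec.kind, rec.coverage)
```

For a real assembly, the same function could be fed an open file handle instead of the example list.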

You could indeed filter using the k-mer coverage: plot a histogram and examine its distribution. I have usually found that contigs / scaffolds with much lower coverage than the average are contaminants (not the species being sequenced), and those with much higher coverage are repetitive regions. But filtering by coverage alone is not good practice; I would combine it with other explorations of the data, as is done in blobtools.
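
As a rough illustration of examining the coverage distribution, the sketch below bins coverage values into a text histogram and flags sequences far from the median. The function names, the bin width, and the cutoff factors (`low_factor`, `high_factor`) are all assumptions chosen for the example, not recommended values, and the median is used here in place of the average only because it is less sensitive to extreme values. The flags are meant as a starting point for inspection, not as a filter on their own.

```python
from collections import Counter
from statistics import median


def coverage_histogram(coverages, bin_width=5.0):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = Counter(int(c // bin_width) for c in coverages)
    return {b * bin_width: counts[b] for b in sorted(counts)}


def flag_by_coverage(named_coverages, low_factor=0.25, high_factor=4.0):
    """Label each sequence relative to the median coverage.

    low_factor and high_factor are arbitrary illustrative cutoffs.
    """
    med = median(cov for _, cov in named_coverages)
    flags = {}
    for name, cov in named_coverages:
        if cov < low_factor * med:
            flags[name] = "low"      # candidate contaminant
        elif cov > high_factor * med:
            flags[name] = "high"     # candidate repeat
        else:
            flags[name] = "typical"
    return flags


# Made-up (name, coverage) pairs for illustration only.
example = [
    ("C1", 7.0), ("scaffold1", 4.0), ("scaffold2", 6.0),
    ("C2", 0.5), ("C3", 158.0), ("scaffold3", 5.0),
]
hist = coverage_histogram([cov for _, cov in example])
flags = flag_by_coverage(example)

for lower_edge, count in hist.items():
    print(f"{lower_edge:6.1f}  {'#' * count}")
print(flags)
```

Sequences flagged "low" or "high" would then be checked against other evidence, such as GC content or taxonomic hits, before anything is discarded.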


Thank you for your time @h.mon :) It was useful.


Could singleton contigs represent not only contaminants but also plasmid sequences? If so, it would be fine for the coverage of singleton contigs to be very high.
