I used Velvet to assemble genomic data from a plant and plotted a coverage histogram and a length-weighted coverage histogram, as suggested in the manual. The reads were 150 bp paired-end Illumina. I tried various k-mer values and picked 115. What would be a good coverage cutoff to use, considering that I have a small peak at 7? Please find 3 attachments. The expected coverage calculated by Velvet is 23. With the default coverage cutoff (half of the expected coverage), I get the following assembly:
Max length = 185793
Total = 362 Mb
No. of contigs = 48,614
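For reference, the length-weighted coverage histogram mentioned above can be tabulated roughly as in the sketch below. It assumes the tab-delimited stats.txt written by Velvet, with a `lgth` column (node length in k-mers) and a `short1_cov` column; the function name, the `max_cov` limit and the toy rows are only illustrative and are not my real data.

```python
import csv
import io
import math
from collections import Counter


def weighted_coverage_histogram(stats_text, cov_column="short1_cov", max_cov=50):
    """Sum node lengths (in k-mers) per integer coverage bin."""
    hist = Counter()
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    for row in reader:
        cov = float(row[cov_column])
        length = int(row["lgth"])
        # Skip undefined coverage values and anything beyond the plotted range.
        if not math.isfinite(cov) or cov >= max_cov:
            continue
        hist[int(cov)] += length
    return dict(sorted(hist.items()))


# Toy rows for illustration only (hypothetical values).
example = (
    "ID\tlgth\tout\tin\tlong_cov\tshort1_cov\n"
    "1\t100\t1\t1\t0\t7.2\n"
    "2\t500\t1\t1\t0\t23.4\n"
    "3\t50\t1\t1\t0\t7.9\n"
)
print(weighted_coverage_histogram(example))
```

Each bin holds the total node length at that coverage, so a small peak at 7 shows up in proportion to how much sequence it represents rather than how many nodes it contains.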
I wanted to use a lower cutoff to include the k-mers in the smaller peak, so I tried a coverage cutoff of 3 and got the following:
Nodes = 513117
Max length = 185793
Total = 384 Mb
No. of contigs = 56,475
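To make the difference between the two runs concrete, here is a small sketch of the arithmetic using only the totals and contig counts listed above. The totals are rounded to the nearest Mb, so the derived numbers are approximate, and the function and key names are just for illustration.

```python
def compare_assemblies(total_a, contigs_a, total_b, contigs_b):
    """Compare two assemblies from their total length (bp) and contig count."""
    extra_bases = total_b - total_a
    extra_contigs = contigs_b - contigs_a
    return {
        "mean_contig_a": total_a / contigs_a,
        "mean_contig_b": total_b / contigs_b,
        "extra_bases": extra_bases,
        "extra_contigs": extra_contigs,
        # Average size of the contigs gained by lowering the cutoff.
        "mean_extra_contig": extra_bases / extra_contigs,
    }


# Default cutoff run vs. cutoff = 3 run; totals rounded to the nearest Mb.
summary = compare_assemblies(362_000_000, 48_614, 384_000_000, 56_475)
for name, value in summary.items():
    print(f"{name}: {value:,.0f}")
```

Lowering the cutoff adds about 22 Mb in 7,861 contigs, and those extra contigs average under 3 kb each, while the longest contig is unchanged.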
The expected genome size is 370-390 Mb. Since about 50-60% of it is expected to be repeats, I do not expect the reads to cover the entire genome. This is also evident from the SAM/BAM files obtained by aligning the reads to a closely related genome, where I see that 10 Mb is not covered.
Which of the two assemblies looks better?