Hi people,

I am trying to assemble a BAC clone sequence from PacBio reads. Celera Assembler and Canu each produce a single-contig assembly, but the two contigs differ in length by 15 kb.

So, in order to estimate the target size, I followed the link below:

K-mer analysis and genome size estimate

The graph I obtained is as follows: it has two peaks and does not follow a Poisson distribution. I am confused about which peak to choose for calculating the estimated target size, since the two peaks give completely different estimates.

Can you link to the image of the distribution? If you're dealing with a diploid, you should probably use the second peak, but seeing the distribution would clarify things a lot. Also, what organism is it for?

It is the sequence of

B.moriBAC160

That does not really look like 2 peaks to me, but rather one jagged peak. Normally, for one peak, the genome size is the area under the curve excluding error k-mers. But in this case there is no clear distinction between error k-mers and real ones. I agree with the other comments that this is not really a good scenario to try k-mer-based genome-size estimation.
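For what it's worth, here is a rough Python sketch of the "area under the curve excluding error k-mers" idea, in case it helps to see the arithmetic. It assumes you already have a k-mer histogram as a mapping from multiplicity to number of distinct k-mers. The function name, the `error_cutoff` parameter, and the toy numbers are all made up for illustration; the cutoff in particular has to be picked by eye from the plot, which is exactly what is hard with a jagged peak like this one.

```python
def estimate_genome_size(hist, error_cutoff):
    """Rough size estimates from a k-mer histogram.

    hist: dict mapping multiplicity -> number of distinct k-mers seen
          that many times (hypothetical input format).
    error_cutoff: multiplicities below this are treated as error k-mers
                  (an assumption, chosen by eye from the plot).
    """
    kept = {m: n for m, n in hist.items() if m >= error_cutoff}
    if not kept:
        raise ValueError("no k-mers left after applying the error cutoff")

    # Area under the curve: number of distinct non-error k-mers.
    distinct = sum(kept.values())

    # Alternative estimate: total k-mer occurrences divided by peak depth.
    total = sum(m * n for m, n in kept.items())
    peak = max(kept, key=kept.get)

    return {
        "distinct_kmers": distinct,
        "peak_depth": peak,
        "total_over_peak": total / peak,
    }


# Toy histogram with made-up numbers, not real data.
toy_hist = {1: 5000, 2: 800, 18: 100, 19: 300, 20: 600, 21: 300, 22: 100}
result = estimate_genome_size(toy_hist, error_cutoff=5)
print(result)
```

With a clean single peak the two estimates agree closely. When the peak is jagged, or when the error k-mers run into the real ones, both numbers shift a lot depending on the cutoff and on which peak you pick.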

