Question: Which peak is homozygous and which is heterozygous in a k-mer plot for genome size estimation?
Prakki Rama (Singapore) wrote, 4.0 years ago:

Hi all,

How do we know which peak is homozygous and which is heterozygous when we generate a k-mer plot for estimating genome size? I would be thankful for your directions.

thackl (MIT) wrote, 4.0 years ago:

Assuming a diploid organism (and two peaks), the heterozygous peak is the first peak, ideally at 1/2 the coverage of the second, hopefully larger, homozygous peak. This is simply because k-mers from homozygous sites occur in both alleles, while k-mers spanning a heterozygous site occur in only one allele, and are therefore sequenced at half the expected genome coverage.
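To see the 1/2 ratio in miniature, here is a minimal sketch (not from the original answer) that counts k-mers in a hypothetical toy diploid locus whose two haplotypes differ by a single SNP. It assumes each haplotype is "sequenced" exactly once, so a multiplicity of 1 versus 2 stands in for coverage C versus 2C, and it counts forward-strand k-mers only, ignoring the canonical (reverse-complement) counting that real k-mer counters usually apply.

```python
from collections import Counter


def kmer_counts(haplotypes, k):
    """Count each k-mer across all haplotypes (forward strand only)."""
    counts = Counter()
    for hap in haplotypes:
        for i in range(len(hap) - k + 1):
            counts[hap[i:i + k]] += 1
    return counts


# Hypothetical toy diploid locus: the two haplotypes differ by one SNP (A/G at index 10).
hap1 = "ACGTTGCATGACCTAGGTCA"
hap2 = "ACGTTGCATGGCCTAGGTCA"

counts = kmer_counts([hap1, hap2], k=5)

# Histogram: how many distinct k-mers occur once (allele-specific, spanning the SNP)
# versus twice (shared by both haplotypes, i.e. homozygous).
hist = Counter(counts.values())
print(sorted(hist.items()))  # [(1, 10), (2, 11)]
```

With a real per-haplotype sequencing depth C, the 10 allele-specific k-mers would pile up near C (the heterozygous peak) and the 11 shared k-mers near 2C (the homozygous peak).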


Thank you. But what about the other small peaks that appear in the plot after the homozygous peak? Are they repetitive regions with higher coverage?

(Reply by Prakki Rama, 4.0 years ago)

Yes, additional peaks after the C2 peak (the diploid genome peak, at twice the haploid coverage C) represent regions with higher copy number, such as repeats. However, to form a peak you need a large region, or many sequences, with very similar copy numbers.

Repeats usually don't form a peak, as each repeat is small and different repeats have different copy numbers.

For example, I have a plot from a small, gene-dense genome with a small, distinct peak at C4; this peak comprises duplicated gene families. Mitochondria and chloroplasts also produce their own peaks at their respective coverages (often 100 to 10,000 times the genome coverage). Partial genome duplications or chromosomal aberrations can produce additional distinct peaks as well, and so can bacterial contamination, symbionts, and parasites.
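The "Cn" labels used here are simply peak coverage expressed as a multiple of the haploid coverage C. A minimal sketch of that conversion follows; the peak positions are made-up values, and taking C from the heterozygous peak (or as half the homozygous peak) is an assumption for illustration, not a rule from the thread.

```python
def copy_multiple(peak_coverage, haploid_coverage):
    """Express a peak's coverage as the nearest multiple n of the haploid coverage C (a 'Cn' peak)."""
    return round(peak_coverage / haploid_coverage)


# Hypothetical haploid coverage C, e.g. read off the heterozygous peak.
haploid_cov = 15

# Hypothetical peak positions read off a k-mer histogram.
for peak in [15, 31, 59, 3000]:
    n = copy_multiple(peak, haploid_cov)
    print(f"peak at {peak}x -> C{n}")
# peak at 15x -> C1
# peak at 31x -> C2
# peak at 59x -> C4
# peak at 3000x -> C200
```

The multiple alone does not say what a peak is; a very high multiple may be an organelle, but a contaminant or symbiont can sit at almost any coverage.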

You can estimate the "size" of a peak to get an idea of what it represents: simply sum up count × coverage over the k-mer histogram bins in the peak region.
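A minimal sketch of that peak-size sum, on a hypothetical histogram mapping coverage to the number of distinct k-mers at that coverage (all numbers are made up). The peak boundaries `lo`/`hi` are chosen by eye here. Dividing the sum by the homozygous peak coverage is an added assumption, following the same logic as genome size ≈ total k-mers / peak depth; it converts the sum into approximate base pairs of the haploid genome.

```python
def peak_kmer_total(hist, lo, hi):
    """Sum count * coverage over the histogram bins with lo <= coverage <= hi."""
    return sum(cov * n for cov, n in hist.items() if lo <= cov <= hi)


# Hypothetical k-mer histogram {coverage: number of distinct k-mers}:
# a homozygous (C2) peak around 30x and a small C4 peak around 60x.
hist = {28: 100, 29: 200, 30: 400, 31: 200, 32: 100,
        58: 10, 59: 20, 60: 40, 61: 20, 62: 10}

homozygous_cov = 30  # position of the C2 peak in this toy histogram

c4_total = peak_kmer_total(hist, 55, 65)
print(c4_total)                   # 6000
print(c4_total / homozygous_cov)  # 200.0
```

In this toy case the C4 peak holds 100 distinct k-mers, each present twice per haploid genome, so the estimate of about 200 bp is consistent with a duplicated region.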

(Reply by thackl, 4.0 years ago)