Question: Fitting a model to a read coverage spectrum
3.7 years ago by
Damian Kao15k

K-mer spectra of genomic reads are usually modeled as a Poisson mixture, where each component represents one of the ploidy levels present in the genome. So a k-mer spectrum with two peaks can be interpreted as showing one Poisson component representing heterozygous regions (1N) and a second Poisson component representing homozygous regions.
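To make the mixture model above concrete, here is a minimal sketch of fitting a Poisson mixture with expectation-maximization in plain Python. The function names (`poisson_logpmf`, `fit_poisson_mixture`) and the starting values are my own illustrative choices, not taken from any particular k-mer tool, and the sketch assumes integer depths and positive initial means.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at integer k (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts, init_lambdas, n_iter=100):
    """Fit a Poisson mixture by EM.

    counts       : list of non-negative integer depths
    init_lambdas : positive starting guesses for the component means
                   (hypothetical values chosen by the user)
    Returns (weights, lambdas) after n_iter EM iterations.
    """
    lambdas = list(init_lambdas)
    n_comp = len(lambdas)
    weights = [1.0 / n_comp] * n_comp
    n = len(counts)
    for _ in range(n_iter):
        resp_sum = [0.0] * n_comp    # sum of responsibilities per component
        resp_x_sum = [0.0] * n_comp  # responsibility-weighted sum of counts
        # E-step: posterior probability of each component for each count
        for x in counts:
            logp = [math.log(w) + poisson_logpmf(x, lam)
                    for w, lam in zip(weights, lambdas)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]  # subtract max for stability
            total = sum(p)
            for j in range(n_comp):
                r = p[j] / total
                resp_sum[j] += r
                resp_x_sum[j] += r * x
        # M-step: update weights and means, with small floors to avoid log(0)
        weights = [max(s / n, 1e-12) for s in resp_sum]
        lambdas = [max(sx / s, 1e-12) if s > 0 else lam
                   for sx, s, lam in zip(resp_x_sum, resp_sum, lambdas)]
    return weights, lambdas
```

For a diploid sample one would typically start the two means near the visible peaks, for example `fit_poisson_mixture(depths, [20, 40])`, where `depths` is a hypothetical list of per-k-mer counts.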


I've recently been looking at the "read coverage" spectra of assembled contigs. I computed them as follows:

  1. Aligned my reads (all treated as single-end) to the assembled contigs with bwa mem. The contigs were assembled with SGA, which is not a de Bruijn graph based assembler.
  2. Calculated the read coverage of each contig as the summed length of the reads aligning to it divided by the length of the contig.
  3. Used only contigs larger than 500 bp for fitting mixture distributions, although all contigs were used in mapping.
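For reference, the coverage calculation in steps 2 and 3 can be sketched like this. It assumes the alignments have already been reduced to `(contig_name, aligned_read_length)` pairs; the function name `contig_coverage` and that input format are illustrative assumptions, not the output of any specific tool.

```python
from collections import defaultdict


def contig_coverage(alignments, contig_lengths, min_length=500):
    """Read coverage per contig: summed aligned read length / contig length.

    alignments     : iterable of (contig_name, aligned_read_length) pairs
                     (hypothetical pre-parsed input)
    contig_lengths : dict mapping contig_name -> contig length in bp
    min_length     : only contigs strictly longer than this are reported
    """
    aligned_bases = defaultdict(int)
    # All contigs take part in the tally, mirroring "all contigs used in mapping"
    for contig, read_len in alignments:
        aligned_bases[contig] += read_len
    # The length filter is applied only when reporting coverage for fitting
    return {
        contig: aligned_bases[contig] / length
        for contig, length in contig_lengths.items()
        if length > min_length
    }
```

Note that this gives a non-integer coverage value per contig, so the values would need rounding or binning before being compared against a Poisson distribution.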

I noticed that a Poisson mixture doesn't fit the coverage data all that well. Each component has a sloping shoulder to the left of its peak.


There are obviously big differences between k-mer and read coverage spectra. My guess is that the Poisson mixture fits poorly here because assembled contigs can have overlapping ends that were not assembled together due to ambiguity in the OLC (overlap-layout-consensus) graph. Reads falling in these overlapping ends map to multiple contigs and get discarded, which lowers the read coverage.
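One rough way to check the multi-mapping hypothesis would be to look at the fraction of primary alignments with MAPQ 0 on each contig, before any filtering. The sketch below parses plain SAM text and assumes that the aligner reports ambiguous placements with MAPQ 0, which is my understanding of how bwa mem usually behaves but should be verified for the version in use. The function name `mapq0_fraction` is hypothetical.

```python
from collections import defaultdict


def mapq0_fraction(sam_lines):
    """Fraction of primary mapped alignments with MAPQ 0, per contig.

    sam_lines : iterable of SAM-format text lines (header lines allowed)
    Assumption: MAPQ 0 marks reads that could not be placed uniquely.
    """
    total = defaultdict(int)
    zero = defaultdict(int)
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        contig = fields[2]
        mapq = int(fields[4])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        total[contig] += 1
        if mapq == 0:
            zero[contig] += 1
    return {contig: zero[contig] / n for contig, n in total.items()}
```

If contigs in the left shoulder show a clearly higher MAPQ 0 fraction than contigs near the peak, that would be consistent with the overlapping-ends explanation, though it would not rule out other causes.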

Has anyone encountered this before, and what other reasons do you think could be causing it? Is this a technical issue with the way I mapped or assembled the reads, or with the way I calculated coverage?


