Question: Can discoSnp distinguish between homeologous loci?
17 months ago, Yahan wrote:

In polyploid genomes with limited variation between subgenomes, read mapping is generally a challenge: reads are frequently mismapped, which leads to calling homeologous variants, i.e. variations that are actually differences between the subgenomes.

With discoSnp, you can imagine that these loci could also collapse, so that, if the locus is non-branching, the variant would appear heterozygous in all samples. One could then use allele read frequencies to decide whether the variant is homeologous. In a tetraploid, one would expect a 50/50 distribution of the two alleles for a homeologous variant, and a 25/75 distribution for a true variant, because the non-variant locus contributes extra reads of one allele to the stack.
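To make the read-frequency idea concrete, here is a minimal Python sketch that compares the two expected allele fractions with a binomial log-likelihood. It is not part of discoSnp; the function names are hypothetical, and it assumes equal coverage of both subgenomes, no allele bias, and that the allele read counts have already been extracted from the output.

```python
from math import comb, log

def binom_loglik(k, n, p):
    """Log-probability of observing k successes in n draws with rate p."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

def classify_allele_balance(allele1_reads, allele2_reads):
    """Pick the expected allele fraction that best explains the read counts.

    Assumed hypotheses (taken from the question, not from discoSnp):
      - "homeologous":  alleles split 50/50 between the two subgenomes
      - "true_variant": alleles split 25/75 (heterozygous in one subgenome)
    """
    total = allele1_reads + allele2_reads
    if total == 0:
        raise ValueError("no reads cover this variant")
    # Fold to the minor allele so that 25/75 and 75/25 are treated alike.
    minor = min(allele1_reads, allele2_reads)
    scores = {
        "homeologous": binom_loglik(minor, total, 0.50),
        "true_variant": binom_loglik(minor, total, 0.25),
    }
    return max(scores, key=scores.get)

print(classify_allele_balance(48, 52))
print(classify_allele_balance(27, 73))
```

At low coverage the two hypotheses are hard to tell apart, so in practice one would probably also want a minimum read depth before trusting the call.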

However, when branching becomes more complex, such an approach may become difficult, and variants might still end up in the same locus while actually originating from different loci. Are there any strategies that could be applied to recognize these cases and discern between the different loci?

Related to this: when does discoSnp decide that a graph has become too complex and split it into separate graphs?

Thanks for the reply.

16 months ago, pierre.peterlongo wrote:

Hello Yahan,

Thanks for your question here.

Generally speaking, discoSnp on its own does not distinguish true variants from inter-repetition or inter-genome variations.

There are, however, several ways of sorting the true positive calls from the false positive ones.

  • One expects true variants to discriminate individuals and thus to discriminate read sets. This is why we proposed the rank value, associated with each variant. In our experiments, we have shown that variants with rank < 0.2 are likely false positives, while most (>95%) of the other ones are true positives (see Fig. 7 of the paper).
  • If variants are mapped on a reference genome (for instance using the VCFcreator tool integrated into discoSnp), uniquely mapped variants are likely to be true positives (marked as PASS), while the other ones (marked as MULTIPLE) are more likely due to inter-repetition or inter-genome variations.
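The two criteria above can be combined in a small filter over the VCF records. The following Python sketch is only an illustration: it assumes the rank is stored in the INFO column under the key `Rk` (check your own VCF header and adjust `rank_key` if it differs), and the example records are made up.

```python
def keep_variant(vcf_line, min_rank=0.2, rank_key="Rk"):
    """Return True for a uniquely mapped variant with a sufficiently high rank.

    Assumptions (not confirmed by the thread): the rank lives in the INFO
    column under `rank_key`, and the FILTER column holds PASS or MULTIPLE.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    filter_status = fields[6]  # VCF column 7: FILTER
    info = dict(
        item.split("=", 1) for item in fields[7].split(";") if "=" in item
    )  # VCF column 8: INFO, as key=value pairs
    try:
        rank = float(info[rank_key])
    except (KeyError, ValueError):
        return False  # no usable rank: be conservative and drop the variant
    return filter_status == "PASS" and rank >= min_rank

# Made-up records, for illustration only.
records = [
    "\t".join(["chr1", "100", "v1", "A", "G", ".", "PASS", "Ty=SNP;Rk=0.85"]),
    "\t".join(["chr1", "200", "v2", "C", "T", ".", "MULTIPLE", "Ty=SNP;Rk=0.90"]),
    "\t".join(["chr2", "300", "v3", "G", "A", ".", "PASS", "Ty=SNP;Rk=0.05"]),
]
kept = [r.split("\t")[2] for r in records if keep_variant(r)]
print(kept)
```

Header lines starting with `#` would need to be skipped before calling the function on a real file.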

Hoping this helps,



Thanks for the clear explanation Pierre.

written 16 months ago by Yahan