Question

Identifying reliable regions of reference genomes

0

20 months ago

pixie@bioinfo ★ 1.5k

Hello, we do a lot of work identifying SNPs between varieties of the same plant species. We then design primers around these SNPs for variety detection. Some of the plant genomes can be very complex or poorly assembled. By complexity, I mean that the ploidy can change, etc. Ultimately, some of the SNP sets turn out to be inadequate for variety detection. Are there ways we can identify confident regions of the genome?

Thanks

snp genomics • 582 views

score 1 · Answer 1 · 2022-08-19

1

20 months ago

Dave Carlson ★ 1.7k

You might be interested in trying this tool:

https://github.com/RILAB/mop

From the README:

Simple tool for capturing alignment regions with sufficient quality for genotyping.

score 0 · Answer 2 · 2022-08-19

Are the inadequacies of your SNP sets due to the SNPs themselves being false calls (i.e., potential mapping issues), or to problems with SNPs that sit underneath the primers? Do you do any sequencing of these plants yourselves?

If you have access to multiple sequences, you could adapt the Genome-In-A-Bottle methodology to define 'confident' callable regions: https://www.nature.com/articles/s41587-019-0074-6#Sec10 (the Illumina blog had a reasonable short alternative based solely on alignment: https://www.illumina.com/science/genomics-research/articles/identifying-genomic-regions-with-high-quality-single-nucleotide-.html).
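To make the alignment-only idea concrete, here is a minimal sketch, not the Genome-In-A-Bottle pipeline and not the method from the Illumina post. It assumes you have already computed per-base read depth and mean mapping quality for one contig (by whatever means you prefer) and simply keeps runs of positions where both look reasonable. The function name `callable_regions` and every threshold are hypothetical placeholders that you would need to tune for your species and coverage.

```python
from typing import List, Sequence, Tuple


def callable_regions(
    depth: Sequence[int],
    mapq: Sequence[float],
    min_depth: int = 10,   # assumed threshold: too little coverage to genotype
    max_depth: int = 60,   # assumed threshold: excess coverage hints at collapsed repeats
    min_mapq: float = 30,  # assumed threshold: low MAPQ hints at ambiguous mapping
    min_len: int = 3,      # assumed minimum run length; far larger in practice
) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) runs that pass all filters."""
    n = min(len(depth), len(mapq))
    regions: List[Tuple[int, int]] = []
    start = None
    for i in range(n):
        ok = min_depth <= depth[i] <= max_depth and mapq[i] >= min_mapq
        if ok and start is None:
            start = i                      # open a new candidate run
        elif not ok and start is not None:
            if i - start >= min_len:       # close the run, keep it if long enough
                regions.append((start, i))
            start = None
    if start is not None and n - start >= min_len:
        regions.append((start, n))         # run extends to the end of the contig
    return regions


# Toy data: position 0 is under-covered, position 4 is over-covered,
# and position 5 has poor mapping quality.
toy_depth = [5, 20, 25, 30, 100, 22, 21, 20, 19]
toy_mapq = [60, 60, 60, 60, 60, 10, 60, 60, 60]
print(callable_regions(toy_depth, toy_mapq))  # [(1, 4), (6, 9)]
```

The upper depth bound matters as much as the lower one in polyploid or repeat-rich genomes, since collapsed paralogous copies tend to show up as excess coverage. SNPs and primer sites could then be restricted to the intervals that survive in most or all of your varieties.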

If you do not have access to substantial sequencing, you could go back to the original assembly graph, visualize it in something like Bandage, and regard long, contiguous, un-looped, "simple" subgraphs as being "high confidence".
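If the graph is too large to inspect by eye, a rough first pass can be scripted. The sketch below assumes the assembler wrote a GFA version 1 file and treats a segment as "simple" when it is at least `min_len` bases long, has no link to itself, and has at most one link at each end. That definition, the function name `simple_segments` and the length cut-off are my own assumptions for illustration, not something Bandage or any assembler defines.

```python
from collections import Counter
from typing import List


def simple_segments(gfa_text: str, min_len: int = 10000) -> List[str]:
    """Names of long, unbranched, non-self-looping segments in GFA1 text."""
    lengths = {}
    end_degree = Counter()  # number of links touching (segment, "L" or "R")
    looped = set()          # segments that link to themselves

    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            name, seq = fields[1], fields[2]
            length = 0 if seq == "*" else len(seq)
            for tag in fields[3:]:
                if tag.startswith("LN:i:"):  # optional explicit length tag
                    length = int(tag[5:])
            lengths[name] = length
        elif fields[0] == "L" and len(fields) >= 5:
            src, src_orient, dst, dst_orient = fields[1:5]
            if src == dst:
                looped.add(src)
            # A link leaves the right end of a forward segment (left end if
            # reversed) and enters the left end of a forward segment.
            end_degree[(src, "R" if src_orient == "+" else "L")] += 1
            end_degree[(dst, "L" if dst_orient == "+" else "R")] += 1

    return sorted(
        name
        for name, length in lengths.items()
        if length >= min_len
        and name not in looped
        and end_degree[(name, "L")] <= 1
        and end_degree[(name, "R")] <= 1
    )
```

Usage would be something like `simple_segments(open("assembly.gfa").read())`, where `assembly.gfa` is a placeholder path. Segments that pass are only candidates; it is still worth looking at their neighbourhoods in Bandage before trusting them.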