How to account for multiple mapping reads while calling variants?
8.2 years ago
kirannbishwa01 ★ 1.6k

I am using BWA for mapping. My main objective is to call polymorphisms from genome resequencing data for two populations of a plant model species that has a sequenced reference genome.

If reads align to multiple regions, mainly paralogs (duplicated genes or genomic regions, not other repetitive sequence), is it possible to make each read align only to its best match rather than to two different places? I want the polymorphism information in coding regions to be unambiguous, although sequencing errors may contribute some mismatches, which I expect to be a minor effect.

If I want to call variants on the alignments (SAM/BAM), which caller best accounts for multi-mapped reads when scoring variants?

Thanks in advance!

SNP BWA alignment • 2.8k views
8.2 years ago
Chris Fields ★ 2.2k

In many cases there isn't a 'best match'; some reads will simply align with equal edit distance to separate regions. This is a reality of short-read data, and there is not much you can do about it.
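
To illustrate the tie, here is a minimal Python sketch. It uses Hamming distance (mismatches only) as a stand-in for the edit distance an aligner would compute, and the read and the sequences named `paralog_1`, `paralog_2` and `unrelated` are made up for the example, not taken from any real genome.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def best_hits(read, candidates):
    """Return the names of all candidates tied for the lowest distance."""
    distances = {name: hamming(read, seq) for name, seq in candidates.items()}
    best = min(distances.values())
    tied = sorted(name for name, d in distances.items() if d == best)
    return tied, best


# Hypothetical toy data: two paralogous sites, each one mismatch from the read.
read = "ACGTTGCA"
candidates = {
    "paralog_1": "ACGTTGCT",
    "paralog_2": "ACGATGCA",
    "unrelated": "TTTTTTTT",
}

tied, distance = best_hits(read, candidates)
print(tied, distance)
```

Both paralogs come back tied at one mismatch, so nothing in the read itself says which locus it came from.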

Variant calls are generally derived from high-quality alignments, i.e. reads whose mapping quality is above a certain threshold. I think the default is above 0, and 0 is generally the mapping quality that multi-mapping reads are given. You can lower this threshold with most tools (e.g. see GATK MappingQualityFilter) or allow lower-quality variant calls, but that will very likely give you a lot of false positives that you will have to sift through.
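
As a rough sketch of what a mapping-quality threshold does, the Python below splits SAM records by the MAPQ column (column 5) and also checks for the `XA:Z:` tag that BWA uses to report alternative hits. The function names, the `min_mapq=1` cutoff and the three example records are assumptions for illustration; this is not how GATK or any particular caller implements its filter.

```python
def parse_sam_line(line):
    """Return (read_name, mapq, has_alt_hits) for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    read_name = fields[0]
    mapq = int(fields[4])  # column 5 of a SAM record is MAPQ
    # Optional tags start at column 12; BWA lists alternative hits in XA:Z:
    has_alt_hits = any(tag.startswith("XA:Z:") for tag in fields[11:])
    return read_name, mapq, has_alt_hits


def filter_by_mapq(sam_lines, min_mapq=1):
    """Split read names into kept (MAPQ >= min_mapq) and dropped."""
    kept, dropped = [], []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        name, mapq, _ = parse_sam_line(line)
        (kept if mapq >= min_mapq else dropped).append(name)
    return kept, dropped


# Hypothetical records: readA maps uniquely, readB has an equally good second hit.
example_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "readA\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:0",
    "readB\t0\tchr1\t500\t0\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tNM:i:1\tXA:Z:chr1,+800,10M,1;",
]

kept, dropped = filter_by_mapq(example_sam, min_mapq=1)
print("kept:", kept)
print("dropped:", dropped)
```

Setting `min_mapq=0` would keep `readB` as well, which is the "lower the threshold" option and the source of the extra false positives.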
