Variant Call Format - Explanation Of Quality Scores?
12.9 years ago
Ian 6.0k

In variant call format (VCF) files produced at the end of the samtools mpileup variant detection pipeline there are two quality scores:

1) QUAL (col 6) = Phred-scaled probability that the variant shown in the ALT col is wrong.

2) INFO (col 8) MQ tag = RMS (root mean square) mapping quality of the reads covering the site

The two scores do not have a linear relationship: some variants have a high mapping quality but a low QUAL.

Can anyone describe how these two scores are created and how they are related?

I want to be able to filter variants based on these two scores, but I do not fully understand what they mean. Should I filter on both scores or just on QUAL?

Thanks.

samtools vcf • 7.9k views
12.9 years ago

Mapping quality is assigned by the aligner, and the values differ wildly from aligner to aligner. It is like an E-value but less useful. Every aligner assigns mapping quality from its own internal estimate of the probability that a read is accurately placed.
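To make the numbers concrete, here is a minimal sketch of the two conversions involved. It assumes the usual Phred convention (Q = -10 * log10 of the error probability), which applies to both QUAL and per-read MAPQ, and it assumes MQ is the root mean square of the MAPQ values of the reads covering the site. The function names and the MAPQ values are made up for illustration.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred-scaled score to the probability that the call is wrong."""
    return 10 ** (-q / 10)

def rms_mapping_quality(mapqs):
    """Root mean square of per-read MAPQ values (what the MQ tag summarises)."""
    return math.sqrt(sum(q * q for q in mapqs) / len(mapqs))

# Hypothetical MAPQ values for four reads covering one site:
# three confidently placed reads and one ambiguously placed read.
example_mapqs = [60, 60, 60, 0]

qual_20_error = phred_to_error_prob(20)        # error probability for QUAL = 20
site_mq = rms_mapping_quality(example_mapqs)   # MQ for the hypothetical site

print(qual_20_error)
print(site_mq)
```

Because MQ only summarises where the reads were placed, while QUAL also depends on the bases observed at the site, there is no reason to expect one to track the other linearly.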

If you start filtering based on mapping quality, you run the risk of biasing against SNPs that are in repeats or gene families, because reads from those regions map ambiguously and get low mapping quality.
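One way to act on this is to filter hard on QUAL but only flag low-MQ sites for a second look instead of discarding them. The sketch below is just that, a sketch: the thresholds (20 and 30), the function names and the example record are arbitrary placeholders, not recommended values, and it assumes plain tab-separated VCF data lines with QUAL in column 6 and INFO in column 8.

```python
def parse_info(info):
    """Turn 'DP=10;MQ=50;INDEL' into a dict; flags without '=' map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed

def classify(vcf_line, min_qual=20.0, min_mq=30.0):
    """Label one VCF data line as 'fail_qual', 'low_mq' or 'pass'.

    min_qual and min_mq are placeholder thresholds for illustration only.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]                 # column 6: QUAL
    info = parse_info(fields[7])     # column 8: INFO

    # A missing QUAL ('.') is treated as failing the QUAL filter.
    if qual == "." or float(qual) < min_qual:
        return "fail_qual"

    # Low MQ is flagged, not dropped, so variants in repeats stay visible.
    mq = info.get("MQ")
    if isinstance(mq, str) and float(mq) < min_mq:
        return "low_mq"
    return "pass"

# Hypothetical record: good QUAL, but reads are poorly placed.
example = "chr1\t100\t.\tA\tG\t50\t.\tDP=10;MQ=20"
print(classify(example))
```

Sites labelled low_mq can then be reviewed separately, which keeps variants in repeats and gene families from silently disappearing.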


Yes, this is called MAPQ in SAM, and it is arguably the least popular column.


Just to clarify: you are talking about MQ, right? Thanks for replying. I couldn't find anything as useful on the VCF site (unless I missed it).
