Question: Compute mapping quality for invariant sites
elcortegano wrote, 12 weeks ago:

Hi, I am having some issues with the VCF files generated by the GATK caller: they do not report a mapping quality (MQ) value for many positions, especially invariant sites.

Since every read in the BAM files has a mapping quality score, I assume there is a way to get that value for every position without using GATK. What are some alternatives? When multiple samples are used, should the MQ at each position simply be the average across samples?

In case you wonder how I am using GATK, the relevant commands are below:

java -jar gatk HaplotypeCaller -I file.bam -O file.g.vcf -R reference.fa -ploidy 1 -ERC BP_RESOLUTION    
# The above is done for different input files
java -jar gatk CombineGVCFs -R reference.fa -O combined.g.vcf --variant file1.g.vcf --variant file2.g.vcf ...
java -jar gatk GenotypeGVCFs -R reference.fa -V combined.g.vcf -O variants.vcf -ploidy 1 -all-sites

For some reason, many MQ values are missing from the final VCF file, and many QUAL values are reported as Infinity.
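
On the question of combining samples, a minimal sketch may help. GATK documents its MQ annotation as the root-mean-square (RMS) mapping quality of the reads covering a site, so one way to get a comparable value for every position is to keep running totals of squared MAPQ and depth per position and pool those totals across samples. The input format here, a list of `(start, end, mapq)` tuples, and the function names are assumptions made for illustration; real reads would come from a BAM parser, and this sketch ignores CIGAR details such as deletions and clipping.

```python
import math
from collections import defaultdict


def accumulate_mapq(reads, sum_sq=None, depth=None):
    """Add reads to per-position running totals.

    reads: iterable of (start, end, mapq), 0-based half-open coordinates.
    This tuple format is a hypothetical stand-in for parsed BAM records.
    """
    sum_sq = defaultdict(int) if sum_sq is None else sum_sq
    depth = defaultdict(int) if depth is None else depth
    for start, end, mapq in reads:
        for pos in range(start, end):
            sum_sq[pos] += mapq * mapq  # sum of squared MAPQ at this position
            depth[pos] += 1             # number of reads covering it
    return sum_sq, depth


def rms_mapq(sum_sq, depth):
    """Root-mean-square MAPQ for every covered position."""
    return {pos: math.sqrt(sum_sq[pos] / depth[pos]) for pos in depth}


# Two toy samples; pooling happens by reusing the same running totals.
sample1 = [(0, 3, 60), (1, 4, 20)]
sample2 = [(2, 5, 60)]

sum_sq, depth = accumulate_mapq(sample1)
sum_sq, depth = accumulate_mapq(sample2, sum_sq, depth)
pooled = rms_mapq(sum_sq, depth)

for pos in sorted(pooled):
    print(pos, depth[pos], round(pooled[pos], 2))
```

Pooling the totals weights each sample by its depth, which is generally not the same as taking a plain average of per-sample MQ values.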

elcortegano wrote, 11 weeks ago:

In the end, what worked for me was switching software. I am now using freebayes, which does report mapping qualities for both variant and invariant sites (e.g. with the --report-monomorphic option).

swbarnes2 (United States) wrote, 12 weeks ago:

A read's mapping quality answers "how likely is it that this read's origin has been correctly determined?". A base's quality within a read answers "how likely is it that this base is what we say it is?". The quality of a variant answers "how likely is it that this site is not homozygous reference?", which takes into account both mapping quality and individual base quality.
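
All three of these scores are normally stored on the Phred scale, Q = -10 * log10(P), where P is the probability of being wrong. The sketch below converts in both directions. The function names are made up for illustration. Returning infinity for P = 0 is one plausible way an "Infinity" QUAL could arise (an error probability that has underflowed to zero), but that is an assumption about the cause, not something verified against GATK.

```python
import math


def phred_to_error_prob(q):
    """Phred score -> probability that the call is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Probability of being wrong -> Phred score.

    A probability of exactly 0 has no finite Phred score, so it is
    returned as infinity here (an illustrative choice, see the note above).
    """
    if p == 0:
        return math.inf
    return -10 * math.log10(p)


for q in (10, 20, 30, 60):
    print(q, phred_to_error_prob(q))

print(error_prob_to_phred(0.001))
print(error_prob_to_phred(0))
```

Read this way, a MAPQ of 20 corresponds to roughly a 1 in 100 chance that the read is placed in the wrong location, and a MAPQ of 60 to roughly 1 in a million.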

I don't think it makes sense to worry about the mapping quality alone at a variant locus. And I don't think it makes sense for there to be a quality score for a locus that is homozygous reference.


It makes sense if you need to consider all sites, variant and invariant, but want to restrict the analysis to well-aligned ones. We are working with mutation accumulation data, so as you can imagine most of the genome is invariant, but we still need to know what fraction of it is of sufficient quality to be included when computing mutation rates.

Reply written 12 weeks ago by elcortegano
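
The filtering step described in this reply can be sketched as follows, assuming a per-site MQ value is already available as a dictionary keyed by position. The function name, the dictionary input, and the default threshold of 30 are all hypothetical choices for illustration and do not come from the thread. Positions absent from the dictionary (no coverage) are counted as failing.

```python
def callable_fraction(site_mq, genome_length, min_mq=30.0):
    """Fraction of the genome whose per-site MQ meets the threshold.

    site_mq: dict mapping position -> mapping quality for covered sites.
    Uncovered positions are simply missing and therefore do not pass.
    """
    passing = sum(1 for mq in site_mq.values() if mq >= min_mq)
    return passing / genome_length


# Toy data: a 5 bp "genome" where position 4 has no coverage at all.
site_mq = {0: 60.0, 1: 44.7, 2: 50.3, 3: 12.0}

print(callable_fraction(site_mq, genome_length=5))
print(callable_fraction(site_mq, genome_length=5, min_mq=50))
```

The resulting fraction, multiplied by the genome length, gives the number of sites that could be counted as callable when computing a per-site rate.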