GATK BAQ and dbSNP Options in CountCovariates
12.0 years ago
Arun 2.4k

Hello,

I am trying to call SNPs/SNVs using the GATK pipeline. I found this post from Biostars particularly useful, as it shows almost all the steps involved. However, it is somewhat outdated, since new parameters and versions have been released since it was written. For example, BAQ is now implemented in GATK, and base quality recalibration has been added as well. These are the two things I would like to get clarified.

1) Where and when does one normally apply the built-in BAQ option? The GATK manual page for BAQ mentions many places where the BAQ option can be enabled.
2) When recalibrating base quality scores, there is an option for dbSNP (the -D parameter, I believe). I am working on Arabidopsis thaliana, and although quite a lot of SNPs are available across different accessions, few of them are in the NCBI dbSNP database. So, my questions are:
2a) How important is it to include this parameter to perform quality score recalibration?
2b) If it's very important, how can I obtain SNPs for Arabidopsis (or any species other than human) in this format?

Thank you very much! Arun.


Your best bet for answers on GATK is here: https://getsatisfaction.com/gsa/topics

12.0 years ago
Arun 2.4k

I found the reason for the error. Unfortunately, even though the question was also asked on the getsatisfaction GSA website, they did not answer it because it's not a software problem.

The problem is that GATK requires actual mapping quality values for recalibration (which makes sense). However, TopHat and Bowtie don't provide these values in the SAM/BAM file; the mapping qualities are all set to 255 instead. So GATK skips all the reads, hence the error. It also seems that earlier versions of GATK accepted this 255 value, but recent versions require the parameters -rf ReassignMappingQuality -DMQ 60 to reassign the mapping quality to 60, for example (I have not tested this yet). However, simply using an awk script to change the mapping quality to 60 (since all my reads are already unique) worked. So you can do it either way. I am posting this in case anyone else runs into the same trouble.
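For anyone who prefers Python to awk, the same rewrite can be sketched roughly as follows. This is not the original awk script, only a minimal illustration: it assumes uncompressed, tab-separated SAM text with MAPQ in the fifth column, and the function name `reassign_mapq` is made up for this example. It changes every record with MAPQ 255 to 60 and leaves header lines and all other records alone, which is only reasonable if, as above, the reads are already known to be uniquely mapped.

```python
def reassign_mapq(sam_line, old_mapq=255, new_mapq=60):
    """Return one SAM line with MAPQ (column 5) changed from old_mapq to new_mapq.

    Hypothetical helper for illustration; header lines (starting with '@')
    and records with any other MAPQ are returned unchanged.
    """
    if sam_line.startswith("@"):
        return sam_line
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM alignment record has at least 11 mandatory fields; MAPQ is the 5th.
    if len(fields) >= 11 and fields[4] == str(old_mapq):
        fields[4] = str(new_mapq)
    return "\t".join(fields) + newline


# Small in-memory demonstration with made-up records.
example = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "read1\t0\tChr1\t100\t255\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t0\tChr1\t200\t30\t4M\t*\t0\t0\tACGT\tIIII\n",
]
for line in example:
    print(reassign_mapq(line), end="")
```

To use it as a filter, the function could be applied to each line read from `sys.stdin` and the result written to `sys.stdout`, with the BAM file converted to SAM text (header included) beforehand and back to BAM afterwards.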

I still don't understand why the software doesn't report on the console that it skipped all reads because of a mapping quality of 255, which would be good programming practice. Instead, they take the trouble to say "BAM file not correct" and "don't ask this question on getsatisfaction.com/gsa"!
