Hello,
I am trying to obtain SNPs/SNVs using the GATK pipeline. I found this post from Biostars particularly useful, as it shows almost all the steps involved. However, it is a bit outdated, since new parameters and versions have been released. For example, BAQ is now implemented in GATK, and base quality recalibration is now included as well. These are the two things I would like to get clarified.
1) Where and when does one normally apply the built-in BAQ option? The GATK manual for BAQ here mentions many places where the BAQ option can be enabled.
2) While recalibrating base quality scores, there is an option for dbSNP (with the -D parameter, I believe?). I am working on Arabidopsis thaliana, and although quite a lot of SNPs across different accessions are available, there aren't many in the NCBI dbSNP database. So, my questions are:
2a) How important is it to include this parameter when performing quality score recalibration?
2b) If it's very important, how can I obtain SNPs for Arabidopsis (or any species other than human) in this format?
Thank you very much! Arun.
Your best bet for answers on GATK is here: https://getsatisfaction.com/gsa/topics