Why is the "-knownSites" option required in gkno's "gatk-count-covariates" tool?
10.9 years ago
Carlos Borroto ★ 2.1k

Hi,

I recently found out about gkno and am getting familiar with it. I have limited experience with GATK, and I was wondering why gkno marks the "-knownSites" option as required in the "gatk-count-covariates" tool. As far as I can tell, this option is strongly advised but marked as optional upstream [1]. I'm working with bacterial genomes, for which there is no known-SNP database to use. I guess I could skip quality recalibration altogether, but I feel that would be far from ideal.

[1] http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_bqsr_BaseRecalibrator.html

Thanks,
Carlos

gatk • 4.3k views
10.9 years ago
alistairnward ▴ 210

Unfortunately, the '-knownSites' option is required, not optional. As previously mentioned, when GATK marches through the BAM file, it assumes that any mismatch with the reference is an error. If the mismatch is at a known variant site (i.e. it is in dbSNP), GATK ignores the site and doesn't use it in generating covariates. Removing the recalibration step is definitely an option; alternatively, you could generate a VCF file that contains a single SNP, just to fulfill the requirement. If you need assistance modifying a pipeline, please let us know. Unfortunately, we are not the authors of GATK and, as such, cannot modify the requirements for that tool.
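For the single-SNP workaround, a minimal sketch of such a placeholder file could look like the following. The function name, contig name, position, and alleles are all hypothetical; the contig must match a sequence name in your own reference, and depending on the GATK version the header may also need `##contig` lines or the file may need indexing, so treat this as a starting point rather than a guaranteed-valid input.

```python
def minimal_vcf(contig: str, pos: int, ref: str, alt: str) -> str:
    """Return the text of a VCF file holding exactly one SNP record.

    All arguments are placeholders chosen by the caller; nothing here is
    taken from GATK or gkno themselves.
    """
    header = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # ID, QUAL, FILTER and INFO are left as "." (missing), which the VCF
    # format allows for sites-only files.
    record = "\t".join([contig, str(pos), ".", ref, alt, ".", ".", "."])
    return "\n".join(header + [record]) + "\n"


if __name__ == "__main__":
    # Hypothetical values: substitute a contig and position from your reference.
    print(minimal_vcf("contig_1", 1000, "A", "G"), end="")
```

Redirecting the output to a `.vcf` file gives something to pass as the known-sites argument. Note that a single masked site does essentially nothing analytically: real variants elsewhere will still be counted as errors.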


Thanks for your answer. I was thinking about the workaround of using an empty VCF file (although from what you say, it needs at least one entry).

One thing does confuse me, and it could be because of my limited experience with GATK. On the page I linked above, the table "BaseRecalibrator specific arguments" marks '-knownSites' as optional. However, you say it is required. Are you saying the GATK documentation mislabels this option?


I kept thinking about my options. What if I do a first pass without recalibration and generate a VCF file with a very stringent set of variants, then use this set as my "knownSites"? Would I be introducing bias? If I understand covariates correctly, I should be fine as long as the set is a good representation of the sample, right? Do you have recommendations on stringent filtering criteria for building the initial set of variants? Thanks.


This is actually what we recommend doing in case you do not have known sites available for your organism. You can repeat this "loop" (generate high confidence variants, use them to recalibrate the original bam file, generate a new set of high confidence variants) several times to refine the set of high confidence variants, for best results.
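The "generate high confidence variants" step of that loop can be sketched as a simple filter over VCF lines. This is only an illustration: the function name and the `min_qual` threshold of 100 are assumptions, not values recommended anywhere in this thread, and in practice the calling and recalibration steps around it would be run with GATK itself.

```python
def filter_high_confidence(vcf_lines, min_qual=100.0):
    """Keep header lines plus records whose QUAL is at least min_qual.

    min_qual is a hypothetical cutoff; choose one suited to your data.
    Records with a FILTER value other than PASS or "." are dropped.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        qual, filt = fields[5], fields[6]
        if qual == ".":
            continue  # no quality recorded, so not high confidence
        if float(qual) >= min_qual and filt in ("PASS", "."):
            kept.append(line)
    return kept


if __name__ == "__main__":
    calls = [
        "##fileformat=VCFv4.1",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "contig_1\t100\t.\tA\tG\t250.0\tPASS\t.",
        "contig_1\t200\t.\tC\tT\t12.5\tPASS\t.",
    ]
    for kept_line in filter_high_confidence(calls):
        print(kept_line)
```

Each round of the loop would feed the filtered file back in as the known sites, recalibrate the original BAM, call variants again, and filter again.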


Great to get confirmation for this approach. Is there a good link where I could read how you recommend doing this?


Hi Carlos,

This is an issue of interpretation of the documentation. What we indicate as required in the documentation is what is required for the program to run from a technical standpoint. It is technically possible to run BaseRecalibrator without known sites. However, it is extremely inadvisable to do so from an analytical standpoint, because of the assumptions that the algorithm relies on. We try to make this clear in the documentation that describes how to use the tools.


Thanks for the help; it is quite useful.

10.9 years ago

From http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_bqsr_BaseRecalibrator.html:

It does a by-locus traversal operating only at sites that are not in dbSNP. We assume that all reference mismatches we see are therefore errors and indicative of poor base quality.
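To see why that assumption makes known sites matter, here is a toy sketch of the idea, not GATK's actual algorithm. It counts mismatches only at positions outside the known-sites set and converts the smoothed error rate to a Phred-scaled quality. The function name and the add-one smoothing are assumptions made for illustration; the real tool additionally bins observations by covariates.

```python
import math


def empirical_quality(observations, known_sites):
    """Phred-scaled quality estimated from reference mismatches.

    observations: iterable of (position, is_mismatch) pairs.
    known_sites: set of positions to skip (the role dbSNP plays above).
    Uses add-one smoothing, an illustrative choice, so the rate is never 0.
    """
    total = 0
    mismatches = 0
    for pos, is_mismatch in observations:
        if pos in known_sites:
            continue  # known variant: not evidence of a sequencing error
        total += 1
        mismatches += int(is_mismatch)
    error_rate = (mismatches + 1) / (total + 2)
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    # 98 observed bases, one of which (position 50) is a true variant.
    obs = [(pos, pos == 50) for pos in range(1, 99)]
    print(empirical_quality(obs, known_sites={50}))
    print(empirical_quality(obs, known_sites=set()))
```

Without the known site, the true variant is counted as an error and the estimated quality drops, which is the bias the answers in this thread warn about.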


Sorry, but this doesn't answer my question. Thanks.
