Question: importance of known sites/resources in GATK pipeline
Floydian_slip wrote (5.7 years ago, United States):

Hi,
I have a general question about GATK regarding the importance of known-sites VCFs (for BQSR and HC) and resource files (for VQSR). I am working on rice, for which the only known sites are dbSNP VCF files built on an older genome version than the reference FASTA I am using.

How does this affect the quality/accuracy of the variant calls? How important is it to have exactly the same genome build as the one the known VCF is based on? Is it better to leave out the known sites for some steps than to use a version built on a different assembly of the same species? In other words, which steps (BQSR, HC, VQSR, etc.) can be performed without the known sites/resource files?

If the answers to these questions are too involved, could you point me to any documentation that addresses this issue?

Thanks,
Neil

Ashutosh Pandey wrote (5.6 years ago, Philadelphia):

How does it affect the quality/accuracy of variants?

You should read these posts to understand how BQSR and VQSR work:
http://gatkforums.broadinstitute.org/discussion/44/base-quality-score-recalibration-bqsr
http://gatkforums.broadinstitute.org/discussion/39/variant-quality-score-recalibration-vqsr
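To see why known sites matter for BQSR, here is a deliberately simplified sketch. It is not GATK's actual model, which bins bases by read group, reported quality, cycle and sequence context and uses its own estimator. Within one bin, every mismatch to the reference at a position not in the known-sites set is counted as a sequencing error. The +1/+2 pseudocount and the function names are assumptions for illustration. If true variants are not masked, for example because the known-sites VCF is on a different build and its positions no longer line up, they are counted as errors and the recalibrated quality comes out too low.

```python
import math


def empirical_phred(mismatches: int, observations: int) -> float:
    """Phred-scaled error rate with a simple +1/+2 pseudocount
    (an illustrative assumption, not GATK's exact estimator)."""
    error_rate = (mismatches + 1) / (observations + 2)
    return -10 * math.log10(error_rate)


def recalibrate(bases, known_sites):
    """bases: iterable of (position, matches_reference) pairs for one bin.
    known_sites: positions to skip, because a mismatch there is probably
    a real variant rather than a sequencing error."""
    mismatches = observations = 0
    for pos, matches_ref in bases:
        if pos in known_sites:
            continue  # masked: counted neither as an error nor as an observation
        observations += 1
        if not matches_ref:
            mismatches += 1
    return empirical_phred(mismatches, observations)


# Toy bin of 1,000 bases: positions 0-39 carry real SNPs, 40-49 are
# sequencing errors, and the rest match the reference.
bases = [(pos, pos >= 50) for pos in range(1000)]
print(round(recalibrate(bases, known_sites=set()), 1))           # about 12.9
print(round(recalibrate(bases, known_sites=set(range(40))), 1))  # about 19.4
```

With the real SNPs masked, only the 10 genuine errors count against the bin, so its empirical quality rises from roughly Q13 to roughly Q19.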

How important is it to have exactly the same genome build as the one the known VCF is based on?

It is important to have variant data (or dbSNP) from the same genome build, unless the revision between the builds was minor and did not change coordinates. If the coordinates of the same variant/gene differ between the two builds, you shouldn't use the old file as is, but you can lift it over to get the new coordinates.
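Lifting over works through an alignment between the two assemblies. Below is a minimal sketch of the idea. It assumes the alignment is given as hypothetical ungapped blocks `(old_start, new_start, length)` on a single chromosome and the forward strand. Real chain files, as consumed by UCSC liftOver or Picard's LiftoverVcf, carry the same kind of information plus gaps and strand. LiftoverVcf also checks each REF allele against the new reference. Positions that fall between blocks cannot be mapped and are dropped, which is why some known sites are usually lost in a liftover.

```python
from bisect import bisect_right


def lift_position(pos, blocks):
    """Map a 0-based position on the old assembly to the new one.

    blocks: hypothetical ungapped alignment blocks
    (old_start, new_start, length), sorted by old_start.
    Returns None when the position is not covered by any block."""
    starts = [block[0] for block in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None  # before the first aligned block
    old_start, new_start, length = blocks[i]
    if pos >= old_start + length:
        return None  # falls in a gap between blocks: unmappable
    return new_start + (pos - old_start)


# Toy alignment: old 1000-1099 was removed, and the sequence after it
# sits 400 bp further along in the new assembly.
blocks = [(0, 0, 1000), (1100, 1500, 2000)]
for old_pos in (10, 1050, 1200):
    print(old_pos, "->", lift_position(old_pos, blocks))
```

Here position 10 stays at 10, 1200 moves to 1600, and 1050 is reported as unmappable.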

Is it better to leave out the known sites for some of the steps than to use the version which is built on a different version of the genome for the same species?

It is better to leave out these steps if you don't have dbSNP data for the same build. If you really want to run them, you can either a) lift over the known sites to get the new positions, or b) call variants without these steps, manually select strong variants (high MAPQ, a decent number of supporting reads, etc.), and repeat BQSR/VQSR using this set of variants as the known sites.
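Option b) can be sketched as a filter over the first-pass VCF. This is a minimal example under several assumptions. The thresholds (QUAL at least 30, INFO MQ at least 50, INFO DP at least 10) are hypothetical starting points, not recommended values. It also assumes an uncompressed VCF in which the caller wrote DP and MQ into the INFO column, as HaplotypeCaller normally does. GATK's own SelectVariants or VariantFiltration can apply the same kind of expression; the Python only makes the logic explicit. The surviving records would then serve as the known sites for another round of BQSR.

```python
def parse_info(info_field):
    """Split an INFO column such as 'DP=25;MQ=60.00;DB' into a dict;
    flags without a value (like DB) map to True."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def select_strong_variants(vcf_lines, min_qual=30.0, min_mq=50.0, min_dp=10):
    """Yield header lines unchanged, plus only the records whose QUAL,
    INFO/MQ and INFO/DP all reach the (hypothetical) thresholds."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual, info = fields[5], parse_info(fields[7])
        if qual == "." or "MQ" not in info or "DP" not in info:
            continue  # missing evidence: do not trust this site
        if (float(qual) >= min_qual
                and float(info["MQ"]) >= min_mq
                and int(info["DP"]) >= min_dp):
            yield line


# Hypothetical first-pass calls on a rice chromosome.
first_pass = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr01\t100\t.\tA\tG\t812.77\t.\tDP=25;MQ=60.00",  # strong: kept
    "chr01\t200\t.\tC\tT\t15.20\t.\tDP=4;MQ=60.00",    # low QUAL and depth
    "chr01\t300\t.\tG\tA\t120.00\t.\tDP=30;MQ=22.50",  # poorly mapped reads
]
for line in select_strong_variants(first_pass):
    print(line)
```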

In other words, which steps (BQSR, HC, VQSR, etc.) can be performed without the known sites/resource files?

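HaplotypeCaller (HC) can be run without known variants; its optional dbSNP input is only used to fill in the ID column of the output, not in the calling itself.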

PS: I have never seen a dramatic effect of BQSR on variant calling. BQSR is helpful, but it doesn't add much if your NGS data are already of good quality.
