Question: Comparison of Two Versions of GATK Filtering for Exome Sequencing Data
michealsmith wrote (7.4 years ago):

I recently read several papers that identify disease-causing mutations in exome sequencing data using GATK. Most of their pipelines look something like this:

Call SNPs/indels using GATK UnifiedGenotyper. All calls with a read coverage ≤2× and a Phred-scaled SNP quality of ≤20 were filtered out.

So I'm curious: which GATK filtering step corresponds to read depth ≤ 2 and quality ≤ 20? (GATK simply has too many steps!)

I compared the recommended filters for GATK2 and GATK3:

GATK2:

For exomes with deep coverage per sample
DATA_TYPE_SPECIFIC_FILTERS should be "QUAL < 30.0 || QD < 5.0 || HRun > 5 || SB > -0.10"

GATK3:

For SNPs
DATA_TYPE_SPECIFIC_FILTERS should be "QD < 2.0", "MQ < 40.0", "FS > 60.0", "HaplotypeScore > 13.0", "MQRankSum < -12.5", "ReadPosRankSum < -8.0". 

For Indels
DATA_TYPE_SPECIFIC_FILTERS should be "QD < 2.0", "ReadPosRankSum < -20.0", "InbreedingCoeff < -0.8", "FS > 200.0".
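
To make the comparison concrete, here is a minimal Python sketch that encodes the thresholds quoted above as data and reports which ones a variant would fail. It assumes the INFO annotations have already been parsed into a plain dict; the helper name `failed_filters` and the example values are hypothetical. As with GATK's filter expressions, a variant that matches an expression is treated as failing it; annotations missing from a record are simply skipped, which is an assumption of this sketch.

```python
import operator

# Thresholds as quoted in the question. Each rule is
# (annotation, comparison, threshold); a variant that satisfies
# the comparison FAILS that filter.
GATK2_EXOME = [
    ("QUAL", operator.lt, 30.0),
    ("QD", operator.lt, 5.0),
    ("HRun", operator.gt, 5),
    ("SB", operator.gt, -0.10),
]

GATK3_SNP = [
    ("QD", operator.lt, 2.0),
    ("MQ", operator.lt, 40.0),
    ("FS", operator.gt, 60.0),
    ("HaplotypeScore", operator.gt, 13.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]

GATK3_INDEL = [
    ("QD", operator.lt, 2.0),
    ("ReadPosRankSum", operator.lt, -20.0),
    ("InbreedingCoeff", operator.lt, -0.8),
    ("FS", operator.gt, 200.0),
]


def failed_filters(annotations, rules):
    """Return the filters (as readable strings) that a variant fails.

    Annotations absent from the dict are skipped, not counted as failures.
    """
    symbols = {operator.lt: "<", operator.gt: ">"}
    failed = []
    for name, compare, threshold in rules:
        value = annotations.get(name)
        if value is not None and compare(value, threshold):
            failed.append(f"{name} {symbols[compare]} {threshold}")
    return failed


# Hypothetical SNP annotations, for illustration only.
snp = {"QUAL": 45.0, "QD": 1.5, "MQ": 55.0, "FS": 70.2, "SB": -0.5}
print(failed_filters(snp, GATK2_EXOME))
print(failed_filters(snp, GATK3_SNP))
```

Laying the two rule sets side by side shows that the GATK3 SNP set has no QUAL term at all and adds MQ, so the rules are not one-to-one replacements for each other.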

I would guess most published papers used GATK2, so users who read papers based on GATK2 but run GATK3 for their own research may get confused. Let me check my understanding:

MQ < 40.0 in GATK3 is equivalent to QUAL < 30.0 in GATK2, and this is the mapping quality filter, right?

QD < 2.0 in GATK3 is equivalent to QD < 5.0 in GATK2, and this is the read-depth filter, right?

FS > 60.0 in GATK3 is equivalent to SB > -0.10 in GATK2, and this is the strand bias filter, right?

thx

Sean Davis (National Institutes of Health, Bethesda, MD) wrote (7.4 years ago):

I believe this filtering was done directly on the VCF files: variants with DP <= 2 or QUAL <= 20 were removed. I'm not sure what the GATK2 and GATK3 recommendations have to do with what you describe from the paper.
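
A minimal sketch of that kind of post-hoc VCF filtering in plain Python, assuming an uncompressed VCF whose INFO column carries a site-level DP value (the paper's actual tooling is not stated; the function names and records below are hypothetical). A record is dropped if DP ≤ 2 or QUAL ≤ 20; records with a missing QUAL or no DP are also dropped, which is a choice of this sketch. In practice a dedicated tool such as GATK VariantFiltration would normally be used; this only makes the logic explicit.

```python
def parse_info(info_field):
    """Turn an INFO string like 'DP=10;QD=3.1;DB' into a dict.

    Flags without a value (e.g. DB) are stored as True.
    """
    info = {}
    if info_field == ".":
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def passes_depth_and_quality(vcf_line, dp_cutoff=2, qual_cutoff=20.0):
    """Drop a record if DP <= dp_cutoff or QUAL <= qual_cutoff."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual_field, info_field = fields[5], fields[7]
    dp = parse_info(info_field).get("DP")
    # Assumption: records with a missing QUAL ('.') or no DP value are dropped.
    if qual_field == "." or dp is None or dp is True:
        return False
    return int(dp) > dp_cutoff and float(qual_field) > qual_cutoff


def filter_vcf(lines):
    """Yield header lines unchanged, plus only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_depth_and_quality(line):
            yield line


# Hypothetical records, for illustration only.
records = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50.0\t.\tDP=12;QD=4.2\n",  # kept
    "1\t200\t.\tC\tT\t20.0\t.\tDP=30\n",         # dropped: QUAL <= 20
    "1\t300\t.\tG\tA\t99.0\t.\tDP=2\n",          # dropped: DP <= 2
]
for line in filter_vcf(records):
    print(line, end="")
```

If the paper meant per-sample coverage, the DP value would come from the FORMAT/sample columns instead of INFO.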

As for your questions, QUAL and MQ are two different concepts. QUAL is the Phred-scaled confidence in the variant call itself, while MQ is the root mean square mapping quality of the reads covering the site. QD is "quality by depth", i.e. the variant QUAL normalized by read depth. I do not think its definition has changed between versions. As for FS and SB, I do not have a good answer as to whether these are equivalent or not.
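
A tiny sketch of why QD is not a read-depth filter in its own right. It is simplified: GATK divides QUAL by the depth of the samples that carry the variant, whereas this hypothetical helper just takes a single depth value, and the numbers are made up for illustration.

```python
def quality_by_depth(qual, depth):
    """Simplified QD: variant QUAL divided by a single read-depth value."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return qual / depth


# Same QUAL at two different depths (hypothetical numbers).
print(quality_by_depth(300.0, 20))   # 15.0
print(quality_by_depth(300.0, 200))  # 1.5, which would fail "QD < 2.0"
```

A site with plenty of reads can still fail the QD filter if its QUAL is low relative to its depth, so "QD < 2.0" and a "DP ≤ 2" cutoff test different things.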

rbagnall (Australia) wrote (7.4 years ago):

FS (Fisher strand) and SB (strand bias) are almost the same, I think.

SB simply captures the case where the variant is found only on the forward strand or only on the reverse strand.

FS is "Phred-scaled p-value using Fisher's Exact Test to detect strand bias in the reads. More bias is indicative of false positive calls." (From the GATK website)
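
To show how FS relates to strand bias, here is a minimal sketch: build the 2x2 table of reference/alternate read counts on the forward/reverse strands, compute a two-sided Fisher's exact test p-value, and Phred-scale it as -10 * log10(p). The function names and read counts are hypothetical, and GATK's own implementation has extra details (for example how the two-sided p-value is summed, and capping or table normalization in later versions) that this sketch does not reproduce. It needs Python 3.8+ for `math.comb`; `scipy.stats.fisher_exact` could be used instead for the p-value if SciPy is available.

```python
import math


def hypergeom_prob(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] given fixed margins."""
    n = a + b + c + d
    return math.comb(a + b, a) * math.comb(c + d, c) / math.comb(n, a + c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test: sum the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    observed = hypergeom_prob(a, b, c, d)
    p = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        prob = hypergeom_prob(x, row1 - x, col1 - x, row2 - col1 + x)
        if prob <= observed * (1 + 1e-7):  # small tolerance for float ties
            p += prob
    return min(p, 1.0)


def fisher_strand(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Phred-scaled strand-bias p-value, in the spirit of GATK's FS."""
    p = fisher_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return max(0.0, -10 * math.log10(p))


# Hypothetical read counts: balanced strands vs. alt reads only on forward.
print(fisher_strand(10, 10, 10, 10))
print(fisher_strand(20, 20, 15, 0))
```

The balanced table gives an FS near 0, while putting every alternate read on one strand drives the p-value down and FS up, which is the "variant seen on only one strand" situation described above expressed as a single number.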

michealsmith wrote (7.3 years ago):

Thanks, guys. GATK is really complicated and confusing for newbies like me.

I think this is an important question. We cannot blindly use GATK without knowing what those parameters stand for. I hope someone else can give a clear answer.

Powered by Biostar version 2.3.0