Question

SNP Filtering Based on Read Depth in GATK

11.3 years ago

michealsmith

A long-standing question about SNP calling and filtering with GATK: does GATK use read depth as a metric for SNP filtering? For example, keeping only SNPs covered by at least 10 reads. GATK reports a DP metric; here is an example record:

1    53139    53140    AA    -    1    53138    rs199543075    TAA    T    238.33    PASS    AC=1;AF=0.250;AN=4;BaseQRankSum=-1.954;DB;DP=30;FS=0.000;HaplotypeScore=0.5834;MLEAC=1;MLEAF=0.250;MQ=13.35;MQ0=0;MQRankSum=-0.312;QD=14.02;RPA=3,1;RU=A;ReadPosRankSum=1.093;STR;VQSLOD=2.04;culprit=QD;set=variant    GT:AD:DP:GQ:PL    0/0:5,0:5:15:0,15,255    0/1:2,6:8:78:277,0,78

The VCF header says:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

So does this DP metric represent the read depth? Here DP=30, while the total read depth across my two samples is 5+8=13. Why the difference?

Also, my filtered SNP list keeps containing calls with very low coverage, such as 2 or 3 reads, so I doubt that GATK ever applies an actual read-depth threshold when filtering.
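If an explicit depth threshold is wanted, one option is to apply it after calling. GATK's own VariantFiltration tool can apply hard filters, but the Python below is not its implementation; it is a minimal sketch under stated assumptions: it reads plain tab-separated VCF lines, checks the per-sample FORMAT DP (not the unfiltered INFO DP), skips no-call samples, and treats a missing DP as failing. The function names are hypothetical.

```python
def is_no_call(gt: str) -> bool:
    """True for '.', './.', '.|.' and similar fully missing genotypes."""
    return all(allele == "." for allele in gt.replace("|", "/").split("/"))


def passes_depth_filter(line: str, min_dp: int = 10) -> bool:
    """Keep a record only if every called sample has FORMAT DP >= min_dp."""
    fields = line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. GT:AD:DP:GQ:PL
    for sample in fields[9:]:
        values = dict(zip(keys, sample.split(":")))
        if is_no_call(values.get("GT", ".")):
            continue  # no-call: nothing to check for this sample
        dp = values.get("DP", ".")
        if dp in ("", ".") or int(dp) < min_dp:
            return False  # depth missing or below the threshold
    return True


def filter_vcf_lines(lines, min_dp=10):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_depth_filter(line, min_dp):
            yield line


# Usage with the record from the question (INFO abbreviated to DP=30).
record = "\t".join([
    "1", "53138", "rs199543075", "TAA", "T", "238.33", "PASS", "DP=30",
    "GT:AD:DP:GQ:PL", "0/0:5,0:5:15:0,15,255", "0/1:2,6:8:78:277,0,78",
])
print(passes_depth_filter(record, min_dp=10))  # False: sample depths are 5 and 8
print(passes_depth_filter(record, min_dp=5))   # True
```

Requiring every called sample to pass is only one possible policy; a site-level rule (for example, the sum of FORMAT DP across samples) would be an equally simple variation.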

Thanks

Tags: gatk, snp

Written 11.3 years ago by michealsmith; updated 11.3 years ago by Jorjial.

I think 5 and 8 are the numbers of reads that were actually used for genotyping (those passing the quality thresholds, etc.).

Comment by Pierre Lindenbaum, 11.3 years ago.

score 3 · Answer 1 · 2013-01-08

The GATK guide explains this:

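"While the sample-level (FORMAT) DP field describes the total depth of reads that passed the Unified Genotyper's internal quality control metrics (like MAPQ > 17, for example), the INFO field DP represents the unfiltered depth over all samples..." I think this description answers your first question: in your record, INFO DP=30 counts all reads at the site across both samples, whereas the per-sample FORMAT DP values (5 and 8) count only the reads that passed those filters.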
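A small Python sketch puts the two depths side by side for the record in the question. It assumes the first five columns of that line (`1 53139 53140 AA -`) were added by an annotation tool and keeps only the standard VCF columns; the helper names are hypothetical and not part of GATK.

```python
# Compare INFO-level DP with per-sample FORMAT DP and AD for the record
# quoted in the question (VCF columns only; annotation columns dropped).
RECORD = "\t".join([
    "1", "53138", "rs199543075", "TAA", "T", "238.33", "PASS",
    "AC=1;AF=0.250;AN=4;BaseQRankSum=-1.954;DB;DP=30;FS=0.000;"
    "HaplotypeScore=0.5834;MLEAC=1;MLEAF=0.250;MQ=13.35;MQ0=0;"
    "MQRankSum=-0.312;QD=14.02;RPA=3,1;RU=A;ReadPosRankSum=1.093;"
    "STR;VQSLOD=2.04;culprit=QD;set=variant",
    "GT:AD:DP:GQ:PL",
    "0/0:5,0:5:15:0,15,255",
    "0/1:2,6:8:78:277,0,78",
])


def parse_info(info: str) -> dict:
    """Split an INFO string into key -> value; flags such as DB map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def depth_summary(line: str) -> dict:
    fields = line.rstrip("\n").split("\t")
    info = parse_info(fields[7])
    keys = fields[8].split(":")
    samples = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return {
        "info_dp": int(info["DP"]),  # unfiltered depth over all samples
        "format_dp": [int(s["DP"]) for s in samples],  # filtered, per sample
        "ad_sums": [sum(int(x) for x in s["AD"].split(",")) for s in samples],
    }


summary = depth_summary(RECORD)
print(summary)  # {'info_dp': 30, 'format_dp': [5, 8], 'ad_sums': [5, 8]}
print(summary["info_dp"] - sum(summary["format_dp"]))  # 17
```

Under the quoted description, those 17 reads would be the ones counted in the unfiltered INFO depth but removed by the per-sample filters. The site-level MQ=13.35 hints that low mapping quality may account for many of them, though that is an inference rather than something the record states.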

Regarding the second question, I think GATK requires at least 2 reads that pass quality control before it reports a position as covered (that is, with a genotype other than "./."). If the position could be a variant, I think it will be reported with an alternative allele. In any case, you can always ask in the GATK community. I hope this helps.
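Whether a 2-read cutoff is really applied is a guess and may depend on the GATK version and settings. One way to check on your own calls is to compare, per sample, the lowest FORMAT DP among called genotypes with the highest FORMAT DP among no-calls. The Python below is a minimal sketch assuming uncompressed VCF text with a FORMAT DP field; the helper names and the toy records are made up for illustration.

```python
def is_no_call(gt: str) -> bool:
    """True for '.', './.', '.|.' and similar fully missing genotypes."""
    return all(allele == "." for allele in gt.replace("|", "/").split("/"))


def depth_extremes(lines):
    """Return {sample_index: (min DP of called GTs, max DP of no-calls)}."""
    extremes = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")
        for i, sample in enumerate(fields[9:]):
            values = dict(zip(keys, sample.split(":")))
            dp = values.get("DP", ".")
            if dp in ("", "."):
                continue  # depth not reported for this genotype
            dp = int(dp)
            lo, hi = extremes.get(i, (None, None))
            if is_no_call(values.get("GT", ".")):
                hi = dp if hi is None else max(hi, dp)
            else:
                lo = dp if lo is None else min(lo, dp)
            extremes[i] = (lo, hi)
    return extremes


# Hypothetical toy records, not real data.
toy = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "1\t100\t.\tA\tG\t50\tPASS\tDP=9\tGT:DP\t0/1:3\t./.:1",
    "1\t200\t.\tC\tT\t80\tPASS\tDP=20\tGT:DP\t0/0:12\t0/1:2",
]
print(depth_extremes(toy))  # {0: (3, None), 1: (2, 1)}
```

If the smallest called depth is consistently above the largest no-call depth, that is consistent with a minimum-depth cutoff; overlapping ranges would argue against one.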