Question: SNP Filtering Based On Read Depth In GATK
6.8 years ago, michealsmith wrote:

A long-standing question about SNP calling and filtering with GATK: does GATK use read depth as a metric for SNP filtering? For example, keeping only SNPs covered by at least 10 reads. GATK reports a DP metric; an example record follows:

1    53139    53140    AA    -    1    53138    rs199543075    TAA    T    238.33    PASS    AC=1;AF=0.250;AN=4;BaseQRankSum=-1.954;DB;DP=30;FS=0.000;HaplotypeScore=0.5834;MLEAC=1;MLEAF=0.250;MQ=13.35;MQ0=0;MQRankSum=-0.312;QD=14.02;RPA=3,1;RU=A;ReadPosRankSum=1.093;STR;VQSLOD=2.04;culprit=QD;set=variant    GT:AD:DP:GQ:PL    0/0:5,0:5:15:0,15,255    0/1:2,6:8:78:277,0,78

From the VCF header we know:

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

So does this DP metric represent the read depth? Here DP=30, while the total read depth for my two samples is 5+8=13, so why do they differ?

Also, I keep coming across SNP calls with very low coverage, such as 2 or 3 reads, in my filtered list of SNPs, so I doubt that GATK actually applies a read-depth threshold when filtering.

Thanks

gatk snp • 5.1k views

I think 5 and 8 are the numbers of reads that were actually used for genotyping (QUAL > some value, etc.).

written 6.8 years ago by Pierre Lindenbaum
6.8 years ago, Jorjial (Valencia, Spain) wrote:

You can find the explanation in the GATK guide, which says:

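"While the sample-level (FORMAT) DP field describes the total depth of reads that passed the Unified Genotyper's internal quality control metrics (like MAPQ > 17, for example), the INFO field DP represents the unfiltered depth over all samples..." I think this description answers your first question: in your record, INFO DP=30 counts all reads at the site, while the per-sample DP values (5 and 8) count only the reads that passed those internal filters.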
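To make the two DP values concrete, here is a minimal Python sketch that pulls both out of the record from the question. The helper names `parse_info` and `sample_field` are made up for illustration, and the INFO string is abridged to a few of the original keys; it is not part of GATK.

```python
def parse_info(info):
    """Turn 'AC=1;DB;DP=30' into {'AC': '1', 'DB': True, 'DP': '30'}."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        # Flag entries such as DB carry no '=' and are stored as True.
        out[key] = value if sep else True
    return out


def sample_field(fmt, sample, key):
    """Return the value of `key` (e.g. 'DP') from one sample column."""
    return dict(zip(fmt.split(":"), sample.split(":")))[key]


# Values copied from the example record (INFO abridged).
info = "AC=1;AF=0.250;AN=4;DB;DP=30;MQ=13.35;QD=14.02"
fmt = "GT:AD:DP:GQ:PL"
samples = ["0/0:5,0:5:15:0,15,255", "0/1:2,6:8:78:277,0,78"]

info_dp = int(parse_info(info)["DP"])                            # site-level, unfiltered
sample_dps = [int(sample_field(fmt, s, "DP")) for s in samples]  # per-sample, filtered

print(info_dp, sample_dps, sum(sample_dps))
```

The gap between the INFO value and the per-sample sum would then correspond to reads dropped by the genotyper's internal filters.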

As for the second question, I think GATK requires at least 2 reads that passed quality control before it reports a position as covered (GT other than "./."). If the position may be a variant, I think it will be reported with an alternative allele. In any case, you can always ask in the GATK community. I hope this helps.
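If you want a hard depth cutoff such as the 10 reads mentioned in the question, one option is to post-filter the calls yourself on the per-sample DP. Below is a rough Python sketch of that idea. The function names and the rule "every sample must reach the cutoff" are my own assumptions, not GATK behaviour, so adjust them to your needs.

```python
MIN_DP = 10  # assumed cutoff, taken from the example in the question


def sample_depth(fmt, sample):
    """Return the per-sample DP as an int, or None if it is missing."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    dp = fields.get("DP", ".")
    return None if dp == "." else int(dp)


def keep_record(fmt, samples, min_dp=MIN_DP):
    """Keep a record only if every sample has a DP of at least min_dp.

    This all-samples rule is an illustrative choice; requiring only
    one well-covered sample would be equally defensible.
    """
    depths = [sample_depth(fmt, s) for s in samples]
    return all(d is not None and d >= min_dp for d in depths)


fmt = "GT:AD:DP:GQ:PL"
samples = ["0/0:5,0:5:15:0,15,255", "0/1:2,6:8:78:277,0,78"]

print(keep_record(fmt, samples))            # depths 5 and 8 are below 10
print(keep_record(fmt, samples, min_dp=5))  # both samples reach 5
```

With the example record, the first call rejects the site and the second keeps it, which matches the observation that low-coverage calls survive unless you filter on depth explicitly.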
