Question: Calling Of Multi-Allelic SNPs Using GATK
8.9 years ago, Bioscientist (1.7k) wrote:

I have four samples in a trio (actually, all of them are patients). I used GATK UnifiedGenotyper to call SNPs/indels for each sample independently, and also put the samples together to call SNPs on all of them simultaneously.

When I checked how the program deals with multi-allelic SNPs, I found something interesting. Called independently, the four samples give:


2    92306130    rs111843696    C    G    151.67    PASS    AC=2;AF=1.00;AN=2;DB;DP=28;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=0.8321;MQ=10.38;MQ0=18;QD=5.42;SB=-73.08    GT:AD:DP:GQ:PL    1/1:0,7:28:20.87:152,21,0


2    92306130    rs111843696    C    G    54.73    PASS    AC=2;AF=1.00;AN=2;DB;DP=18;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=1.9889;MQ=11.71;MQ0=8;QD=3.04;SB=-3.27    GT:AD:DP:GQ:PL    1/1:0,7:18:11.92:87,12,0


2    92306130    rs111843696    C    G    54.73    PASS    AC=2;AF=1.00;AN=2;DB;DP=25;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=2.8210;MQ=9.83;MQ0=15;QD=2.19    GT:AD:DP:GQ:PL    1/1:0,8:25:11.92:87,12,0


2    92306130    rs111843696    C    G    203.19    PASS    AC=2;AF=1.00;AN=2;BaseQRankSum=-0.347;DB;DP=34;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=5.4511;MQ=11.99;MQ0=14;MQRankSum=1.042;QD=5.98;ReadPosRankSum=0.347;SB=-71.78    GT:AD:DP:GQ:PL    1/1:1,16:34:26.85:203,27,0

So at this locus all four samples share the same C>G substitution. However, in the VCF from the combined calling:

2    92306130    rs111843696    C    A,G    529.37    PASS    AC=2,6;AF=0.25,0.75;AN=8;BaseQRankSum=-1.116;DB;DP=105;Dels=0.00;FS=0.000;HaplotypeScore=1.3628;MQ=11.04;MQ0=55;MQRankSum=1.193;QD=5.04;ReadPosRankSum=0.423;SB=-137.50    GT:AD:DP:GQ:PL    1/2:0,9,7:28:24.43:176,131,125,45,0,24    2/2:0,3,7:18:11.92:87,87,87,12,12,0    2/2:0,8,8:25:11.92:87,87,87,12,12,0    1/2:1,10,16:34:47.87:251,176,164,75,0,48

Now the call is multi-allelic. My guess is that when the samples are called independently, the read depth of the ALT allele "A" is too low, and that when they are combined the depth passes some threshold so that A gets called. Is that right? Thanks.

gatk snp • 4.2k views
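One way to examine the read-depth guess is to pull the per-allele depths (AD) out of the combined record. The sketch below is only an illustration: the sample names are hypothetical (the post does not give any), and only the GT:AD:DP fields are copied from the combined record above.

```python
# Per-sample GT:AD:DP fields copied from the combined record in the post.
# The sample names are hypothetical placeholders.
ALLELES = ["C", "A", "G"]  # REF first, then ALT alleles in VCF order
SAMPLES = {
    "sample1": "1/2:0,9,7:28",
    "sample2": "2/2:0,3,7:18",
    "sample3": "2/2:0,8,8:25",
    "sample4": "1/2:1,10,16:34",
}


def allele_depths(sample_field):
    """Return (genotype, {allele: depth}) from a 'GT:AD:DP' string."""
    gt, ad, _dp = sample_field.split(":")
    depths = dict(zip(ALLELES, (int(x) for x in ad.split(","))))
    return gt, depths


def total_depths(samples):
    """Sum the AD values for each allele across all samples."""
    totals = {allele: 0 for allele in ALLELES}
    for field in samples.values():
        _, depths = allele_depths(field)
        for allele, depth in depths.items():
            totals[allele] += depth
    return totals


if __name__ == "__main__":
    for name, field in SAMPLES.items():
        gt, depths = allele_depths(field)
        print(name, gt, depths)
    print("total", total_depths(SAMPLES))  # total {'C': 1, 'A': 30, 'G': 38}
```

In this record the per-sample depth for A (9, 3, 8, 10) is of the same order as the depth for G (7, 7, 8, 16), which may be worth keeping in mind when judging whether a depth cutoff alone explains the difference.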

The easiest way to check is to adjust the threshold and see whether that is the case. Alternatively, a simple pileup will show you what that position looks like with all the reads stacked on top of it.

written 8.9 years ago by Vitis (2.4k)
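The pileup check suggested above could be tallied with a small script. This is a rough sketch that assumes the read-bases column (column 5) of `samtools mpileup` text output as input. The function name and the example string are made up for illustration and are not taken from the asker's data.

```python
import re
from collections import Counter


def count_pileup_bases(bases, ref):
    """Count A/C/G/T observations in a samtools mpileup read-bases string.

    Assumes a well-formed string: '.' and ',' mean a match to the reference
    base, letters are mismatches, '^' is followed by a mapping-quality
    character, and '+N<seq>' / '-N<seq>' describe indels.
    """
    ref = ref.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start marker: skip it and the mapping-quality character.
            i += 2
        elif ch in "+-":
            # Indel: skip the sign, the length digits and the indel sequence.
            digits = re.match(r"\d+", bases[i + 1:]).group()
            i += 1 + len(digits) + int(digits)
        else:
            if ch in ".,":
                counts[ref] += 1
            elif ch.upper() in "ACGT":
                counts[ch.upper()] += 1
            # Anything else ('$', '*', 'N', ...) is not counted.
            i += 1
    return counts


if __name__ == "__main__":
    # Hypothetical read-bases string at a site whose reference base is C.
    example = "^!.A$g,G+2ACa-1Tg*"
    print(dict(count_pileup_bases(example, "C")))  # {'C': 2, 'A': 2, 'G': 3}
```

Running this on the pileup of each sample at 2:92306130 would show directly how many reads carry A versus G, independent of any caller setting.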
8.9 years ago, Jorge Amigo (12k), Santiago de Compostela, Spain, wrote:

Let me first say that when dealing with non-biallelic variants the odds are critical to the call, and you have to take that seriously. GATK tries to address this by allowing multi-sample calling rather than calling each sample individually, because the information from some samples helps it make decisions about others.

Anyway, I guess you aren't using the -maxAlleles option in the single-sample calling, so you are probably forcing biallelic calls; otherwise you would see those As in the four single-sample records above. What I see is that you are asking GATK to call variants with no, or very limited, presence of the reference allele while allowing only a single alternate allele. That forces GATK to report these variants as homozygous for the one alternate allele allowed, although, as the multi-sample call shows, there are reads carrying another alternate allele.

I'm pretty sure that if you re-analyze those single samples with -maxAlleles 2 (or higher), you will get the same results as in the multi-sample run. Although I don't see anything in the documentation stating it, I'm almost certain that in a multi-sample analysis the default maxAlleles value is not 1, as that would severely limit multi-sample calling.

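The effect of allowing one versus two alternate alleles can be read off the PL fields in the records posted in the question. The sketch below relies on the genotype ordering that the VCF specification defines for PL at a diploid site, and simply picks the genotype with the lowest PL. It is not GATK's own code, and the function names are made up for illustration.

```python
def genotype_order(n_alleles):
    """Diploid genotypes in VCF PL order: 0/0, 0/1, 1/1, 0/2, 1/2, 2/2, ..."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


def most_likely_genotype(pl):
    """Return the 'j/k' genotype with the lowest PL (0 means most likely)."""
    # Recover the allele count n from len(pl) == n * (n + 1) / 2.
    n = 1
    while n * (n + 1) // 2 < len(pl):
        n += 1
    if n * (n + 1) // 2 != len(pl):
        raise ValueError("PL length does not match a diploid genotype count")
    j, k = genotype_order(n)[pl.index(min(pl))]
    return f"{j}/{k}"


if __name__ == "__main__":
    # First sample called alone (one ALT allele, three genotypes).
    print(most_likely_genotype([152, 21, 0]))                 # 1/1
    # Second and fourth samples in the combined call (two ALT alleles).
    print(most_likely_genotype([87, 87, 87, 12, 12, 0]))      # 2/2
    print(most_likely_genotype([251, 176, 164, 75, 0, 48]))   # 1/2
```

With a single alternate allele only three genotypes are available, so a sample carrying both A and G reads cannot be reported as 1/2. With two alternate allele slots the same kind of sample can be.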