Calling Of Multi-Allelic SNPs Using GATK
12.1 years ago
Bioscientist ★ 1.7k

I have four samples from a trio (actually, they are all patients). I used GATK UnifiedGenotyper to call SNPs/indels independently for each sample, and I also put all four together and called SNPs jointly.

When I checked how the program deals with multi-allelic SNPs, I found something interesting:

sample1:

2    92306130    rs111843696    C    G    151.67    PASS    AC=2;AF=1.00;AN=2;DB;DP=28;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=0.8321;MQ=10.38;MQ0=18;QD=5.42;SB=-73.08    GT:AD:DP:GQ:PL    1/1:0,7:28:20.87:152,21,0

sample2:

2    92306130    rs111843696    C    G    54.73    PASS    AC=2;AF=1.00;AN=2;DB;DP=18;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=1.9889;MQ=11.71;MQ0=8;QD=3.04;SB=-3.27    GT:AD:DP:GQ:PL    1/1:0,7:18:11.92:87,12,0

sample3:

2    92306130    rs111843696    C    G    54.73    PASS    AC=2;AF=1.00;AN=2;DB;DP=25;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=2.8210;MQ=9.83;MQ0=15;QD=2.19    GT:AD:DP:GQ:PL    1/1:0,8:25:11.92:87,12,0

sample4:

2    92306130    rs111843696    C    G    203.19    PASS    AC=2;AF=1.00;AN=2;BaseQRankSum=-0.347;DB;DP=34;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=5.4511;MQ=11.99;MQ0=14;MQRankSum=1.042;QD=5.98;ReadPosRankSum=0.347;SB=-71.78    GT:AD:DP:GQ:PL    1/1:1,16:34:26.85:203,27,0

So at this locus all four samples share the same C>G mutation. However, in the VCF from the combined calling:

2    92306130    rs111843696    C    A,G    529.37    PASS    AC=2,6;AF=0.25,0.75;AN=8;BaseQRankSum=-1.116;DB;DP=105;Dels=0.00;FS=0.000;HaplotypeScore=1.3628;MQ=11.04;MQ0=55;MQRankSum=1.193;QD=5.04;ReadPosRankSum=0.423;SB=-137.50    GT:AD:DP:GQ:PL    1/2:0,9,7:28:24.43:176,131,125,45,0,24    2/2:0,3,7:18:11.92:87,87,87,12,12,0    2/2:0,8,8:25:11.92:87,87,87,12,12,0    1/2:1,10,16:34:47.87:251,176,164,75,0,48

Now the site is called as multi-allelic. I guess this is because, when the samples are called independently, the read depth of the ALT allele "A" is quite low, and when they are combined the read depth passes some threshold so that A is called? Thanks.

gatk snp • 4.8k views

The easiest way to check would be to adjust the threshold and see whether that's the case. Alternatively, a simple pileup will show you what that position looks like across all reads.
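As a rough sketch of the pileup check, the snippet below counts alleles in the read-bases column of a single `samtools mpileup` line. The function name `count_pileup_bases` and the example base string are made up for illustration (they are not data from this thread), and the parser ignores base and mapping qualities, so treat it as a quick look rather than a replacement for the caller's own counts.

```python
import re
from collections import Counter

def count_pileup_bases(ref_base, bases):
    """Count alleles in the read-bases column of one samtools mpileup line."""
    counts = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Start of a read; the next character encodes its mapping quality.
            i += 2
            continue
        if ch == "$":
            # End of a read.
            i += 1
            continue
        if ch in "+-":
            # Indel written as +<len><seq> or -<len><seq>; skip the whole token.
            m = re.match(r"\d+", bases[i + 1:])
            i += 1 + len(m.group()) + int(m.group())
            continue
        if ch in ".,":
            # Match to the reference on the forward / reverse strand.
            counts[ref_base.upper()] += 1
        elif ch.upper() in "ACGTN":
            # Mismatch; case only encodes the strand.
            counts[ch.upper()] += 1
        # '*' (deleted base) and any other symbols are not counted.
        i += 1
    return counts

# Hypothetical base string for a site whose reference base is C.
example_counts = count_pileup_bases("C", "^]GgAa$gG.A")
print(dict(example_counts))
```

If a noticeable share of the reads in each individual sample carries A, low depth of A alone would not explain why the single-sample runs never report it.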

12.0 years ago

Let me first say that with non-biallelic variants the odds are critical for calling. GATK tries to address this by allowing multi-sample calling rather than calling each sample individually, because information from some samples helps it make decisions about others. Anyway, I guess you aren't using the -maxAlleles option in the single-sample calling and are therefore probably forcing biallelic calls; otherwise you would expect to see those As in the four single-sample records above. What I see is that you are asking GATK to call variants with little or no reference allele present while allowing only a single alternate allele, which forces it to report each variant as homozygous for that one allowed alternate allele (even though, as the multi-sample call shows, there are reads carrying another alternate allele).
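The records in the question can be compared directly. The sketch below takes the GT and AD values copied from the four single-sample calls and from the combined call, and lists the per-sample depth of A next to the depth of G. The helper names `parse_gt_ad` and `compare` are made up for this illustration, and AD is only what the caller chose to report, so this is a sanity check and not proof of what the reads contain.

```python
# GT:AD pairs copied from the question: (single-sample call, combined call).
calls = {
    "sample1": ("1/1:0,7", "1/2:0,9,7"),
    "sample2": ("1/1:0,7", "2/2:0,3,7"),
    "sample3": ("1/1:0,8", "2/2:0,8,8"),
    "sample4": ("1/1:1,16", "1/2:1,10,16"),
}

def parse_gt_ad(field, alleles):
    """Split 'GT:AD' and map each allele (REF first, then ALTs) to its depth."""
    gt, ad = field.split(":")[:2]
    depths = [int(x) for x in ad.split(",")]
    return gt, dict(zip(alleles, depths))

def compare(calls):
    """Return (sample, combined GT, A depth, G depth single, G depth combined)."""
    rows = []
    for name, (single, combined) in calls.items():
        _, single_ad = parse_gt_ad(single, ["C", "G"])
        gt, combined_ad = parse_gt_ad(combined, ["C", "A", "G"])
        rows.append((name, gt, combined_ad["A"], single_ad["G"], combined_ad["G"]))
    return rows

for name, gt, a_depth, g_single, g_combined in compare(calls):
    print(f"{name}: combined GT={gt}, A reads={a_depth}, "
          f"G reads single/combined={g_single}/{g_combined}")
```

The G depths are identical between the two runs (7, 7, 8, 16), while the combined record reports 9, 3, 8 and 10 reads for A. Three of the four samples have at least as many A reads as G reads, so A does not look like a low-depth allele that only appears once the samples are pooled.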

I'm pretty sure that if you re-analyze the single samples with -maxAlleles 2 (or higher), you will get the same results as the multi-sample run. Although I don't see anything in the documentation stating it, I'm almost certain that in a multi-sample analysis the default maxAlleles value is not 1, as that would severely limit multi-sample calling.
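When comparing the re-run against the multi-sample record, it helps to know which genotype each PL value belongs to. The sketch below assumes the genotype ordering defined in the VCF specification for diploid calls (genotype j/k with j <= k sits at index k*(k+1)/2 + j) and labels the six PL values of sample4 from the combined record. The function names are made up for this example.

```python
def genotype_order(n_alleles):
    """Diploid genotype order of the VCF PL field: j/k sits at index k*(k+1)/2 + j."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]

def label_pl(alleles, pl_string):
    """Attach a genotype label such as 'A/G' to each PL value."""
    pls = [int(x) for x in pl_string.split(",")]
    order = genotype_order(len(alleles))
    return {f"{alleles[j]}/{alleles[k]}": pl for (j, k), pl in zip(order, pls)}

# sample4 in the combined record: GT 1/2, GQ 47.87, PL 251,176,164,75,0,48
labelled = label_pl(["C", "A", "G"], "251,176,164,75,0,48")
for genotype, pl in labelled.items():
    print(genotype, pl)
```

Under that ordering the PL of 0 falls on A/G, matching the reported GT of 1/2, and the runner-up is G/G at 48, which is in line with the GQ of 47.87.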
