Question: GATK Output: Duplication On Same Position
5.8 years ago by
khikho100 wrote:

Is there any explanation for having these two lines at the same position? And is there any way to automatically pick the best one in this case?

21      26039812        .       G       .       90.19   .       GT:DP:GQ:PL:A:C:G:T:IR  0/0:20:60.20:0,60,807:0,0:0,0:14,6:0,0:8
21      26039812        .       GATAT   G       384.15  .      GT:DP:GQ:PL:A:C:G:T:IR  1/1:20:27.09:426,27,0:0,0:0,0:14,6:0,0:8

Thank you in advance.

vcf gatk • 1.7k views
modified 5.8 years ago by vdauwera • written 5.8 years ago by khikho

Well, as far as I can see, the first line is 0/0 and has no ALT allele, so it's not a variant...

written 5.8 years ago by Pierre Lindenbaum

Pierre is correct. The first record is not a variant. If it had been one, you should go with the record that has the highest QUAL score (384.15, the second record in this case). Variant callers often call a SNP and a short indel at the same position, and the variant quality score can be used to select one of them.
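A minimal Python sketch of that selection, assuming whitespace-separated VCF records held in memory. The function name `best_record_per_site` is made up for illustration and is not part of GATK or any other tool. It keeps only the record with the highest QUAL at each (CHROM, POS), and it does not check whether the competing calls are consistent with each other.

```python
def best_record_per_site(lines):
    """Keep, for each (CHROM, POS), the record with the highest QUAL.

    Hypothetical helper for illustration; assumes whitespace-separated
    columns with CHROM, POS and QUAL at indices 0, 1 and 5.
    """
    best = {}  # dicts keep insertion order, so sites stay in input order
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        key = (fields[0], int(fields[1]))
        try:
            qual = float(fields[5])
        except ValueError:
            qual = float("-inf")  # QUAL is "." (missing)
        if key not in best or qual > best[key][0]:
            best[key] = (qual, line)
    return [line for _, line in best.values()]


# The two records from the question, with runs of spaces collapsed.
records = [
    "21 26039812 . G . 90.19 . GT:DP:GQ:PL:A:C:G:T:IR 0/0:20:60.20:0,60,807:0,0:0,0:14,6:0,0:8",
    "21 26039812 . GATAT G 384.15 . GT:DP:GQ:PL:A:C:G:T:IR 1/1:20:27.09:426,27,0:0,0:0,0:14,6:0,0:8",
]
kept = best_record_per_site(records)
print(kept[0].split()[3:6])  # ['GATAT', 'G', '384.15']
```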

written 5.8 years ago by Ashutosh Pandey
5.8 years ago by
Erik Garrison2.1k
Somerville, MA
Erik Garrison2.1k wrote:

You can filter out records whose alternate allele is not found in any sample in your data set by piping the VCF through vcffixup and vcffilter (both from vcflib):

[vcf stream] | vcffixup - | vcffilter -f "AC > 0"
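For readers without vcflib at hand, the same idea can be approximated in Python. This is only a sketch, not vcflib's implementation: it counts non-reference alleles directly from the GT fields and keeps records where that count is above zero. The helper name `alt_allele_count` is hypothetical. The records as posted have only one `.` between QUAL and FORMAT, so the sketch locates the FORMAT column by looking for a field that begins with `GT` rather than relying on a fixed index.

```python
def alt_allele_count(line):
    """Count non-reference alleles across all sample genotypes in one record.

    Hypothetical helper; assumes whitespace-separated columns and that
    FORMAT is the first column at index 7 or later whose first key is GT.
    """
    fields = line.split()
    fmt_idx = next(
        i for i in range(7, len(fields)) if fields[i].split(":")[0] == "GT"
    )
    count = 0
    for sample in fields[fmt_idx + 1:]:
        genotype = sample.split(":")[0]
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in genotype.replace("|", "/").split("/"):
            if allele not in (".", "0"):
                count += 1
    return count


records = [
    "21 26039812 . G . 90.19 . GT:DP:GQ:PL:A:C:G:T:IR 0/0:20:60.20:0,60,807:0,0:0,0:14,6:0,0:8",
    "21 26039812 . GATAT G 384.15 . GT:DP:GQ:PL:A:C:G:T:IR 1/1:20:27.09:426,27,0:0,0:0,0:14,6:0,0:8",
]
print([alt_allele_count(r) for r in records])  # [0, 2]

# Rough equivalent of keeping only records with AC > 0.
with_alt = [r for r in records if alt_allele_count(r) > 0]
```

On the two posted records this keeps only the GATAT to G deletion, because the 0/0 record carries no alternate allele.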

However, there is a deeper problem with the example you posted. It represents an impossible picture of the variation at the locus. Is the sample homozygous reference or does it have a homozygous deletion at the locus? I suggest you figure out what is meant by the overlapping reference call before simply picking the best one.

This ambiguity presents basic problems for interpretation. If removing such ambiguity from your calls is important to your research, then I suggest you try out a haplotype-based method like freebayes or platypus. A number of de novo assembly methods will also correctly provide this information.

written 5.8 years ago by Erik Garrison
5.8 years ago by
Cambridge, MA
vdauwera920 wrote:

This looks like it was generated using the GATK's UnifiedGenotyper "emit all sites" mode. The first record is the ref call indicating there is no SNP at that site. The second record is an indel call. They are different calls, hence different records. If you don't want this to happen, don't use "emit all sites". Or use the GATK's newer caller, called HaplotypeCaller, which is haplotype-based as the name implies.
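To see where a reference call and a variant call share a position, a small Python sketch like the following may help. It labels each record by comparing REF and ALT lengths, then lists the sites that carry more than one record. The names `record_type` and `types_by_site` are illustrative only, and the classification is deliberately simple: it is not how GATK itself categorises records.

```python
from collections import defaultdict


def record_type(ref, alt):
    """Label a record as 'ref', 'snp', 'indel' or 'other' from REF and ALT."""
    if alt == ".":
        return "ref"  # no alternate allele: a reference call
    alts = alt.split(",")
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        return "snp"
    if any(len(a) != len(ref) for a in alts):
        return "indel"  # length change (symbolic alleles also land here)
    return "other"  # e.g. multi-base substitutions


def types_by_site(lines):
    """Map (CHROM, POS) to the list of record types seen at that site."""
    sites = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        sites[(fields[0], int(fields[1]))].append(
            record_type(fields[3], fields[4])
        )
    return dict(sites)


records = [
    "21 26039812 . G . 90.19 . GT:DP:GQ:PL:A:C:G:T:IR 0/0:20:60.20:0,60,807:0,0:0,0:14,6:0,0:8",
    "21 26039812 . GATAT G 384.15 . GT:DP:GQ:PL:A:C:G:T:IR 1/1:20:27.09:426,27,0:0,0:0,0:14,6:0,0:8",
]
for site, kinds in types_by_site(records).items():
    if len(kinds) > 1:
        print(site, kinds)  # ('21', 26039812) ['ref', 'indel']
```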

written 5.8 years ago by vdauwera