Allele Depth (AD) / Allele Balance (AB) Filtering in GATK 4
1
2
Entering edit mode
4.1 years ago
maegsul ▴ 170

Hi,

I am trying to filter my GATK 4.0.3 - HaplotypeCaller generated multi-sample VCF for allele depth (AD) annotation at sample genotype-level (so available in "FORMAT" fields of each sample).

I think prior to GATK 4, this annotation was available as "Allele Balance" (AB) ratios (generated by AlleleBalanceBySample), but it is not available anymore in GATK 4. So I tried to filter genotypes based on AD field, that is exactly the same thing but indicated in "X,Y" format, so in an array format of integers. This array format makes it difficult to filter based on depth of alternative allele divided by depth of all alleles at a specific site.

What could be a solution to this? If I could turn this array into a ratio, I could easily filter genotypes using VariantFiltration or other tools such as vcflib/vcffilter. I also tried the below code (following https://gatkforums.broadinstitute.org/gatk/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk):

gatk VariantFiltration -R $ref -V$vcf -O \$output --genotype-filter-expression 'vc.getGenotype("Sample1").getAD().1 / vc.getGenotype("Sample1").getAD().0 > 0.33' --set-filtered-genotype-to-no-call --genotype-filter-name 'ABfilter'


This worked, but strangely it filters the variant for all samples if only one of the sample have allele depths that are not in balance (defined by the filter). If it worked only for Sample1, I was planning to write a quick loop for all the samples for instance. I tried the same with GATK 3.8, but still it filters whole variant for all the samples if it is filtered in just one sample.

Any suggestions?

gatk filtering sequencing allele depth • 6.7k views
4
Entering edit mode
4.1 years ago

using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().filter(G->G.hasAD()).map(G->G.getAD()).allMatch(A->A.length>1 && A[0]>0 && A[1]/(double)A[0]> 0.33);'  in.vcf

0
Entering edit mode

Thank you very much Pierre! I didn't know about vcffilterjdk - it looks very useful.

However, this command above is not exactly what I wanted. It actually filtered out all of the variants (i.e. at least in one of my samples allelic imbalance is present). What I want is to assign ./. missing genotypes for sample-level genotypes in a VCF, if they fail to pass defined AD ratio (so, keeping the variant itself still if it is present in at least one sample with desired AD ratio thresholds). Is there a way to filter directly genotypes, but not variants?

1
Entering edit mode

if I understand well , try the following script in the '-e' option (or use a file with option '-f' )

return new VariantContextBuilder(variant).
genotypes(
variant.
getGenotypes().
stream().
map( G->{
if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
}
return  GenotypeBuilder.createMissing(G.getSampleName(),2);
}).
collect(Collectors.toList())
).
make();


note: this will replace some genotypes with NO_CALL genotype: you should recalculate INFO data like DP/AF/AC/ etc...

0
Entering edit mode

This worked very well, thank you very much!

Only thing is that, I have to modify this script only for heterozygous calls, otherwise my homozygous ref calls are also turned into NO_CALL genotype. Can I add something like .isHet(); for that purpose?

1
Entering edit mode

Can I add something like .isHet()

0
Entering edit mode

0
Entering edit mode

Yes, I marked it as accepted, thank you very much again!

I couldn't figure out where to add .isHet() in the working script though, I am having some errors with the script...

1
Entering edit mode

I don't know where is the logic of your HET, it could look like

 if(G.hasAD() && G.isHet()) { ...)


or

if(G.isHet()) return G;


or

if(!G.isHet()) return G;

0
Entering edit mode

The logic is fine with the first two options you indicated, but I couldn't manage to make it work. Still all the genotypes (regardless of being a heterozygous or homozygous ref call) are filtered based on the custom AD filter. My goal is to assign NO_CALL genotypes only for heterozygous genotypes having AD values out of the indicated ratio.

I tried:

return new VariantContextBuilder(variant).
genotypes(
variant.
getGenotypes()
stream().
map( G->{
if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
}
return  GenotypeBuilder.createMissing(G.getSampleName(),2);
}).
collect(Collectors.toList())
).
make();


and

return new VariantContextBuilder(variant).
genotypes(
variant.
getGenotypes()
stream().
map( G->{
if(G.isHet()) return G;
if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
}
return  GenotypeBuilder.createMissing(G.getSampleName(),2);
}).
collect(Collectors.toList())
).
make();

2
Entering edit mode

My goal is to assign NO_CALL genotypes only for heterozygous genotypes having AD values out of the indicated ratio.

 if(G.hasAD()) {
if(!G.isHet()) return G;

0
Entering edit mode

Yes, now I understood my mistake... Now it works very well!

Thank you very much once more Pierre!

0
Entering edit mode

Greetings,

This has been very helpful as I was not aware of vcffilterjdk until this post. I too am trying to filter by AD. However, I for some reason cannot figure out how to prevent it from changing homozygous Ref calls to ./. Can you please advise what I am doing wrong? I have added the snippet of code which should skip over non-het calls but it still changes everything.

Here is how I am calling it:

java -jar /home/nathanael_herrera/Software/jvarkit/dist/vcffilterjdk.jar -o output.vcf -e 'return new VariantContextBuilder(variant).
genotypes(
variant.
getGenotypes().
stream().
map( G->{
if(!G.isHet()) return G;
if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
}
return  GenotypeBuilder.createMissing(G.getSampleName(),2);
}).
collect(Collectors.toList())
).
make();' input.vcf


0
Entering edit mode

Hi,

I think I have come across with the same problem but with the below code is working well for me, maybe give it a try:

return new VariantContextBuilder(variant).
genotypes(
variant.
getGenotypes().
stream().
map( G->{
if( A.length>1 &&  A[0]>0 && A[1]/(double)A[0] > 0.332 && A[1]/(double)A[0] < 3.01 ) return G;
}
if( A[0] == 0 ) return G;
if( A.length>1 &&  A[0]>0 && A[1]/(double)A[0] > 0.8999 ) return G;
}
if( A[1] == 0 ) return G;
if( A.length>1 &&  A[1]>0 && A[1]/(double)A[0] < 0.11 ) return G;
}
return  GenotypeBuilder.createMissing(G.getSampleName(),2);
}).
collect(Collectors.toList())
).
make();
`

Here I also filter ref homozygous and alt homozygous calls based on 1:9 or 9:1 ratios though, in your case you can delete these.

0
Entering edit mode

Excellent, works very well! Thank you very much for the response! Cheers