Question: Allele Depth (AD) / Allele Balance (AB) Filtering in GATK 4
0
gravatar for maegsul
13 months ago by
maegsul150
Belgium
maegsul150 wrote:

Hi,

I am trying to filter my GATK 4.0.3 - HaplotypeCaller generated multi-sample VCF for allele depth (AD) annotation at sample genotype-level (so available in "FORMAT" fields of each sample).

I think prior to GATK 4, this annotation was available as "Allele Balance" (AB) ratios (generated by AlleleBalanceBySample), but it is not available anymore in GATK 4. So I tried to filter genotypes based on AD field, that is exactly the same thing but indicated in "X,Y" format, so in an array format of integers. This array format makes it difficult to filter based on depth of alternative allele divided by depth of all alleles at a specific site.

What could be a solution to this? If I could turn this array into a ratio, I could easily filter genotypes using VariantFiltration or other tools such as vcflib/vcffilter. I also tried the below code (following https://gatkforums.broadinstitute.org/gatk/discussion/1255/what-are-jexl-expressions-and-how-can-i-use-them-with-the-gatk):

gatk VariantFiltration -R $ref -V $vcf -O $output --genotype-filter-expression 'vc.getGenotype("Sample1").getAD().1 / vc.getGenotype("Sample1").getAD().0 > 0.33' --set-filtered-genotype-to-no-call --genotype-filter-name 'ABfilter'

This worked, but strangely it filters the variant for all samples if only one of the sample have allele depths that are not in balance (defined by the filter). If it worked only for Sample1, I was planning to write a quick loop for all the samples for instance. I tried the same with GATK 3.8, but still it filters whole variant for all the samples if it is filtered in just one sample.

Any suggestions?

PS: I've asked the same question here also in GATK forum: https://gatkforums.broadinstitute.org/gatk/discussion/12140/allele-depth-ad-allele-balance-ab-filtering-in-gatk-4

ADD COMMENTlink modified 13 months ago by Pierre Lindenbaum121k • written 13 months ago by maegsul150
3
gravatar for Pierre Lindenbaum
13 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum121k wrote:

using vcffilterjdk: http://lindenb.github.io/jvarkit/VcfFilterJdk.html

the following expression keep the variants if all the genotypes with AD have AD[1]/AD[0]>0.33 .

java -jar dist/vcffilterjdk.jar -e 'return variant.getGenotypes().stream().filter(G->G.hasAD()).map(G->G.getAD()).allMatch(A->A.length>1 && A[0]>0 && A[1]/(double)A[0]> 0.33);'  in.vcf
ADD COMMENTlink written 13 months ago by Pierre Lindenbaum121k

Thank you very much Pierre! I didn't know about vcffilterjdk - it looks very useful.

However, this command above is not exactly what I wanted. It actually filtered out all of the variants (i.e. at least in one of my samples allelic imbalance is present). What I want is to assign ./. missing genotypes for sample-level genotypes in a VCF, if they fail to pass defined AD ratio (so, keeping the variant itself still if it is present in at least one sample with desired AD ratio thresholds). Is there a way to filter directly genotypes, but not variants?

ADD REPLYlink written 13 months ago by maegsul150
1

if I understand well , try the following script in the '-e' option (or use a file with option '-f' )

return new VariantContextBuilder(variant).
    genotypes(
        variant.
        getGenotypes().
        stream().
        map( G->{
            if(G.hasAD()) {
                final int A[]= G.getAD();
                if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
                }
            return  GenotypeBuilder.createMissing(G.getSampleName(),2);
            }).
        collect(Collectors.toList())
        ).
    make();

note: this will replace some genotypes with NO_CALL genotype: you should recalculate INFO data like DP/AF/AC/ etc...

ADD REPLYlink modified 13 months ago • written 13 months ago by Pierre Lindenbaum121k

This worked very well, thank you very much!

Only thing is that, I have to modify this script only for heterozygous calls, otherwise my homozygous ref calls are also turned into NO_CALL genotype. Can I add something like .isHet(); for that purpose?

ADD REPLYlink written 13 months ago by maegsul150
1

Can I add something like .isHet()

definitely : https://samtools.github.io/htsjdk/javadoc/htsjdk/htsjdk/variant/variantcontext/Genotype.html#isHet--

ADD REPLYlink modified 13 months ago • written 13 months ago by Pierre Lindenbaum121k

If an answer was helpful you should upvote it, if the answer resolved your question you should mark it as accepted.

Upvote|Bookmark|Accept

ADD REPLYlink written 13 months ago by Pierre Lindenbaum121k

Yes, I marked it as accepted, thank you very much again!

I couldn't figure out where to add .isHet() in the working script though, I am having some errors with the script...

ADD REPLYlink written 13 months ago by maegsul150
1

I don't know where is the logic of your HET, it could look like

 if(G.hasAD() && G.isHet()) { ...)

or

if(G.isHet()) return G;
if(G.hasAD()) { ...

or

if(!G.isHet()) return G;
if(G.hasAD()) { ...
ADD REPLYlink written 13 months ago by Pierre Lindenbaum121k

The logic is fine with the first two options you indicated, but I couldn't manage to make it work. Still all the genotypes (regardless of being a heterozygous or homozygous ref call) are filtered based on the custom AD filter. My goal is to assign NO_CALL genotypes only for heterozygous genotypes having AD values out of the indicated ratio.

I tried:

return new VariantContextBuilder(variant).
    genotypes(
        variant.
        getGenotypes()
        stream().
        map( G->{
         if(G.hasAD() && G.isHet()) {
                final int A[]= G.getAD();
                if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
                }
            return  GenotypeBuilder.createMissing(G.getSampleName(),2);
            }).
        collect(Collectors.toList())
        ).
    make();

and

return new VariantContextBuilder(variant).
    genotypes(
        variant.
        getGenotypes()
        stream().
        map( G->{
         if(G.isHet()) return G;
         if(G.hasAD()) {
                final int A[]= G.getAD();
                if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
                }
            return  GenotypeBuilder.createMissing(G.getSampleName(),2);
            }).
        collect(Collectors.toList())
        ).
    make();
ADD REPLYlink modified 13 months ago • written 13 months ago by maegsul150
2

My goal is to assign NO_CALL genotypes only for heterozygous genotypes having AD values out of the indicated ratio.

 if(G.hasAD()) {
    if(!G.isHet()) return G;
ADD REPLYlink written 13 months ago by Pierre Lindenbaum121k

Yes, now I understood my mistake... Now it works very well!

Thank you very much once more Pierre!

ADD REPLYlink written 13 months ago by maegsul150

Greetings,

This has been very helpful as I was not aware of vcffilterjdk until this post. I too am trying to filter by AD. However, I for some reason cannot figure out how to prevent it from changing homozygous Ref calls to ./. Can you please advise what I am doing wrong? I have added the snippet of code which should skip over non-het calls but it still changes everything.

Here is how I am calling it:

`java -jar /home/nathanael_herrera/Software/jvarkit/dist/vcffilterjdk.jar -o output.vcf -e 'return new VariantContextBuilder(variant).
genotypes(
    variant.
    getGenotypes().
    stream().
    map( G->{
        if(G.hasAD()) {
            if(!G.isHet()) return G;
            final int A[]= G.getAD();
            if(A.length>1 &&  A[0]>0 && A[1]/(double)A[0]> 0.33) return G;
            }
        return  GenotypeBuilder.createMissing(G.getSampleName(),2);
        }).
    collect(Collectors.toList())
    ).
make();' input.vcf

Thank you in advance for any advice!

ADD REPLYlink modified 11 months ago • written 11 months ago by Nathanael H0

Hi,

I think I have come across with the same problem but with the below code is working well for me, maybe give it a try:

return new VariantContextBuilder(variant).
genotypes(
    variant.
    getGenotypes().
    stream().
    map( G->{
     if(G.hasAD() && G.isHet()) {
            final int A[]= G.getAD();
            if( A.length>1 &&  A[0]>0 && A[1]/(double)A[0] > 0.332 && A[1]/(double)A[0] < 3.01 ) return G;
         }
        if(G.hasAD() && G.isHom()) {
            final int A[]= G.getAD();
         if( A[0] == 0 ) return G;
            if( A.length>1 &&  A[0]>0 && A[1]/(double)A[0] > 0.8999 ) return G;
            }
        if(G.hasAD() && G.isHomRef()) {
            final int A[]= G.getAD();
         if( A[1] == 0 ) return G;
            if( A.length>1 &&  A[1]>0 && A[1]/(double)A[0] < 0.11 ) return G;
            }
        return  GenotypeBuilder.createMissing(G.getSampleName(),2);
        }).
 collect(Collectors.toList())
    ).
make();

Here I also filter ref homozygous and alt homozygous calls based on 1:9 or 9:1 ratios though, in your case you can delete these.

ADD REPLYlink written 11 months ago by maegsul150

Excellent, works very well! Thank you very much for the response! Cheers

ADD REPLYlink written 11 months ago by Nathanael H0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1371 users visited in the last hour