Question: Setting positions in samples as missing using bcftools
0
gravatar for ucbtsm8
6 months ago by
ucbtsm80
ucbtsm80 wrote:

I'm working with imputed data in multisample bcf format which contains a FORMAT/GP field for each sample.

I would like to set to missing (./.) a genotype in a sample where the max(FORMAT/GP) is less than a given threshold (say 0.99). For instance, 0.1,0.8,0.1 would be set as missing, whereas 1,0,0 wouldn't.

I would like to do this in bcftools, using the +setGT plugin. I've tried doing this using several commands such as:

bcftools +setGT input.bcf -- -t q -n . -i'max(FORMAT/GP)>0.99'

However, this seems to work by checking if any sample at a position has a max value of less than 0.99, and then sets all genotypes at that position as missing. To be clear, want this to run on a per-sample basis only set GTs as missing in individuals that have a max value of less than 0.99, rather than all of them.

Can anyone help me with the command I need?

bcftools software error • 229 views
ADD COMMENTlink modified 6 months ago by cedcoul0 • written 6 months ago by ucbtsm80
0
gravatar for Pierre Lindenbaum
6 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

using vcffilterjdk. http://lindenb.github.io/jvarkit/VcfFilterJdk.html

and the following code (hard to test without seeing the vcf)

final String SAMPLE="B00GWGD";
final String TAG="GP";

List<Genotype> genotypes = new ArrayList<>();
for(Genotype gt: variant.getGenotypes()) {
    if(!gt.getSampleName().equals(SAMPLE) ||
        gt.isNoCall() || 
        !gt.hasAnyAttribute(TAG) ||
        ((List<?>)gt.getAnyAttribute(TAG)).stream().
            map(O->new Double(O.toString())).
            anyMatch(V->V.doubleValue()>=0.99)) {
        genotypes.add(gt);
        }
    else
        {
        genotypes.add(GenotypeBuilder.createMissing(gt.getSampleName(),gt.getPloidy()));
        }
    }

return new VariantContextBuilder(variant).genotypes(genotypes).make();

usage:

java -jar dist/vcffilterjdk.jar --recalc -f biostar.code the.vcf
ADD COMMENTlink written 6 months ago by Pierre Lindenbaum129k
0
gravatar for cedcoul
6 months ago by
cedcoul0
France/Paris/CNAM
cedcoul0 wrote:

The issue comes from the setGT plugin. I think it does not understand 'max' in expression.

But you can use another plugin called 'tag2tag', in your case (Max(GP)>0.99):

bcftools +tag2tag input.bcf -- -t 0.01 --gp-to-gt
ADD COMMENTlink modified 6 months ago • written 6 months ago by cedcoul0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1326 users visited in the last hour