Picard FilterVcf Javascript Usage
1
0
Entering edit mode
7.0 years ago
hellbio ▴ 520

I have a VCF file generated from ~100 whole-genome samples. My aim is to identify variants segregating under a recessive model where I have three affected samples and one healthy carrier and the remaining 96 samples are clear controls i.e homRef. I have used the below javascript file to show variants that are homozygous for the alternate allele in the 3 affected samples and het in the carrier and wildtype in the 96 samples:

function accept(ctx) {
for(var i=0;i< ctx.getNSamples();++i)
{
var g = ctx.getGenotype(i);
var sample = g.getSampleName();
if ( (sample=="BA1" && sample=="BA2" && sample=="BA3") && (g.isHomVar())) continue;
if ( (sample=="BC4") && (g.isHet())) continue;
if ( g.isHomRef() || g.isNoCall()) continue;
return false;
}
return true;
}

accept(variant);

BA1,BA2 and BA3 are affected samples and BC4 is carrier.

Command used:

picard FilterVcf I=WGS_multianno.vcf MIN_DP=0 MIN_GQ=0 JS=JavaScript O=BC.vcf

The output has the below tags in the FILTER column:

FILTER
AllGtsFiltered;JavaScript
JavaScript
PASS

However, inspecting the variants tagged under “AllGtsFiltered;JavaScript”, “JavaScript” or “PASS” does not seem to segregate with the inheritance model coded in the Javscript. Could you please help if the above script is going wrong somewhere?

Also, the above picard command took 2.2hrs to generate the output. Is there a better way to speed-up the processing?

picard filtervcf javascript • 2.6k views
ADD COMMENT
1
Entering edit mode
7.0 years ago
function accept(ctx) {
    for(var i=0;i< ctx.getNSamples();++i)
        {
        var g = ctx.getGenotype(i);
        var sample = g.getSampleName();
        // if sample is BA1 OR sample is BA2 OR sample is BA3
        if ( sample=="BA1" || sample=="BA2" || sample=="BA3")
            {
            // if genotype not homvar : REJECT
            if(!g.isHomVar()) return false;
            }
        // if sample is BC4
        else if ( sample=="BC4")
            {
            // if BC4 is NOT Het return false;
            if(!g.isHet()) return false;
            }
        else 
            {
            // for any other sample , check it is HomRef or NoCall
            if(!(g.isHomRef() || g.isNoCall()))  return false;
            }
        }
    return true;
    }

accept(variant);

if ( (sample=="BA1" && sample=="BA2" && sample=="BA3") && (g.isHomVar())) continue

have a look at the boolean operator https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Logical_Operators

a sample cannot be named BA1 AND BA2 AND BA3 at the same time.

. Is there a better way to speed-up the processing?

  • split the VCF in part and merge the chunks
  • write your own program
ADD COMMENT
0
Entering edit mode
The below command was used with the above javascript:

    `picard FilterVcf I=WGS_100.multianno.vcf MIN_DP=0 MIN_GQ=0 JS=RecessiveJS O=BC.vcf`

The output has the below two tags in the FILTER field:

 less BC.vcf | grep -v "##" | cut -f7 | sort | uniq
 AllGtsFiltered;RecessiveJS
 FILTER
 RecessiveJS

And variants with each of these tags do not seem to segregate as per the javascript file where the 3 samples should be homVar.

$ less BC.vcf | grep -v "##" | awk -F "\t" '$7=="AllGtsFiltered;RecessiveJS"' | head |cut -f30,31,34
 ./.:10,0:10:LowGQ:.:.:.:0,0,0  0/0:24,0:24:PASS:72:.:.:0,72,801    ./.:23,0:23:LowGQ:.:.:.:0,0,0
 0/0:10,0:10:PASS:30:.:.:0,30,338   1/1:0,0:.:LowDP:9:1|1:9356262_T_TAAAACTCTACTA:135,9,0         
 0/0:12,0:12:PASS:17:.:.:0,17,251   0/0:32,0:32:PASS:10:.:.:0,10,729    0/0:48,0:48:PASS:99:.:.:0,108,1416

 $ less BC.vcf | grep -v "##" | awk -F "\t" '$7=="RecessiveJS"' | head |cut -f30,31,34
 ./.:0,0:0:LowGQ:.:0,0,0    ./.:0,0:0:LowGQ:.:0,0,0 ./.:0,0:0:LowGQ:.:0,0,0
 0/0:1,0:1:PASS:3:.:.:0,3,28    ./.:0,0:0:LowGQ:.:.:.:0,0,0 ./.:0,0:0:LowGQ:.:.:.:0,0,0
 0/0:1,0:1:PASS:3:.:.:0,3,28    ./.:0,0:0:LowGQ:.:.:.:0,0,0 ./.:0,0:0:LowGQ:.:.:.:0,0,0

Could you let me know if the code is going wrong or if i am interpreting it wrong.

ADD REPLY
0
Entering edit mode

I think there is a misunderstanding: if the FILTER==RecessiveJS then the variant has FAILED to match the filter.

ADD REPLY
0
Entering edit mode

Yes, but still the variants under FILTER=AllGtsFiltered;RecessiveJS doesn't match the filter where i have asked the variant to be homozygous for all the 3 samples but the result has mixed genotypes i.e 0/0, ./. and 1/1

ADD REPLY
0
Entering edit mode

I sill don't understand:

i have asked the variant to be homozygous for all the 3 samples but the result has mixed genotypes

so the 3 samples fails your filter, so there is something in the FILTER column. What about the variant having PASS in the filter column ?

ADD REPLY
0
Entering edit mode

I will try to explain in a different way. The genotype for the samples BA1,BA2 and BA3 should be homozygous for the alternate allele i.e. BA1=BA2=BA3=1/1 and BA4 should be heterozygous i.e 0/1 and the genotypes for the remaining samples could be either 0/0 or ./. .

In the above commands cut -f30,31,34 30,31 and 34 represents BA1,BA2 and BA3 but the genotypes are not 1/1 for the 3 samples with either FILTER=RecessiveJS or FILTER=AllGtsFiltered;RecessiveJS . And variants with FILTER=PASS do not exist in the outputfile.

ADD REPLY
0
Entering edit mode

can you please show me grep "#CHROM" -m 1 BC.vcf |cut -f30,31,34

ADD REPLY
0
Entering edit mode
Here it is: grep "#CHROM" -m 1 BC.vcf |cut -f30,31,34
BA1    BA2    BA3
ADD REPLY
0
Entering edit mode

Does the above output help?

ADD REPLY
0
Entering edit mode

It works on a different test case where i have simulated the input to match the criteria specified in javascript. In the above case there seems to be no variant meeting the criteria in the javascript and so there is no variant in the output with flag "PASS".

ADD REPLY

Login before adding your answer.

Traffic: 2463 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6