Question: Opposite genotype filtering in multisample vcf for 2 population
0
gravatar for ck791234
2.9 years ago by
ck7912340
United Kingdom
ck7912340 wrote:

Hi,

I'm trying to filter for opposite genotype from multisample vcf (8 samples) which consist of 2 population (4 samples/population) while maintain the vcf format.

What i want to achieved are as follow:

1. Filter for same genotype in all sample in Population A while genotype in Population B are totally different than Population A.

example: 1/1 in all sample in Population A (s1-s4), none 1/1 in Population B (a1-a4) (genotype in Population B can be a mixture of 0/0 and 0/1).

#desire genotype output 

s1      s2     s3    s4     a1    a2     a3     a4  

1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/0

1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1

and so on..

 

2. Flip the genotype filtering as in (1) where genotype in Population B are same while genotype in Population A is not same as Population B.

#desire genotype output

 s1      s2     s3    s4     a1    a2     a3     a4  

0/0    0/0    0/0    0/0    1/1    1/1    1/1    1/1

0/0    0/0    0/0    0/1    1/1    1/1    1/1    1/1

and so on..

I tried with SnpSift, however the filter missed 26 combination when involve 1/1 genotype.

The command i used as follow:

cat sample.vcf | java -Xmx4g -jar  SnpSift.jar filter  "(((countHom()>3)&(countHom()<5))|(countHet()=4)|(countRef()=4)) &((isRef(GEN[0]) & isRef(GEN[1]) & isRef(GEN[2]) &  isRef(GEN[3]))|(isHom(GEN[0]) & isVariant(GEN[0]) & isHom(GEN[1])& isVariant(GEN[1]) & isHom(GEN[2]) & isVariant(GEN[2])& isHom(GEN[3]) & isVariant(GEN[3])) | (isHet(GEN[0]) & isHet(GEN[1]) & isHet(GEN[2]) & isHet(GEN[3]))| (isRef(GEN[4]) & isRef(GEN[5]) & isRef(GEN[6]) &  isRef(GEN[7]))|(isHom(GEN[4]) & isVariant(GEN[4]) & isHom(GEN[5])& isVariant(GEN[5]) & isHom(GEN[6]) & isVariant(GEN[6])& isHom(GEN[7]) & isVariant(GEN[7])) | (isHet(GEN[4]) & isHet(GEN[5]) & isHet(GEN[6]) & isHet(GEN[7])))"

Does anyone has experience in this filtering?  I  appreciate any help/advise.

Many thanks

snp genotype snpsift vcf • 1.2k views
ADD COMMENTlink modified 2.9 years ago • written 2.9 years ago by ck7912340

Thanks for all the helps & advise.

Script for the opposite filtering with https://github.com/lindenb/jvarkit/wiki/VCFFilterJS :

function accept(v)
    {
    var pop1=["s1","s2","s3","s4"];
    var pop2=["a1","a2","a3","a4"];
    var i;
    var g1 = v.getGenotype(pop1[0]);
    var g2 = v.getGenotype(pop2[0]);
    var g1issame = 1;
   
    if( !g1.isCalled() ) return false;
    if( !g2.isCalled() ) return false;

    for(i=1;i < pop1.length; ++i)
        {
        if( ! g1.sameGenotype( v.getGenotype(pop1[i]) ) ) g1issame = 0;
        }
       
    if( g1issame == 1 )
        {
        for(i=0;i < pop2.length; ++i)
            {
            if( g1.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
            }
        }
    else
        {
        for(i=1;i < pop2.length;++i)
            {
            if( ! g2.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
            }
        for(i=0;i< pop1.length; ++i)
            {
            if( g2.sameGenotype( v.getGenotype(pop1[i]) ) ) return false;
            }
        }
   
    return true;
    }
accept(variant);

 

 

 

ADD REPLYlink written 2.9 years ago by ck7912340
1
gravatar for Pierre Lindenbaum
2.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum115k wrote:

using my tool https://github.com/lindenb/jvarkit/wiki/VCFFilterJS and the following script :

 

function accept(v)
    {
    var pop1=["sample1","sample2","sample3"];
    var pop2=["sample4","sample5"];
    var i,j;
    var g1 = v.getGenotype(pop1[0]);
    var g2 = v.getGenotype(pop2[0]);
    
    if( !g1.isCalled() ) return false;
    if( !g2.isCalled() ) return false;
    if( g1.sameGenotype(g2) ) return false;
    
    for(i=1;i < pop1.length;++i)
        {
        if( ! g1.sameGenotype( v.getGenotype(pop1[i]) ) ) return false;
        }
    for(i=1;i < pop2.length;++i)
        {
        if( ! g2.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
        }
    return true;
    }

accept(variant);

 

 

you can then reformat the resulting vcf using my tool bioalcidae https://github.com/lindenb/jvarkit/wiki/BioAlcidae

 

ADD COMMENTlink written 2.9 years ago by Pierre Lindenbaum115k

Hi Pierre,

Thanks for the help.

The script still missed condition where 2 opposite genotype are in Population B(a1-a4) and vice versa like as exampled as below which i'm interested to keep (even after i set pop1=["s1","s2","s3","s4"]; pop2=["a1","a2","a3","a4"]):

s1      s2     s3    s4     a1    a2     a3     a4 

1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1

1/1    1/1    1/1    1/1    0/0    0/1    0/0    0/1

and so on..

 Also, what is the j refer to in the script (var i,j)?

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by ck7912340
1

> The script still missed condition where 2 opposite genotype

oh I see ,  It would be something like

 

function accept(v)
    {
    var pop1=["sample1","sample2","sample3"];
    var pop2=["sample4","sample5"];
    var i;
    var g1 = v.getGenotype(pop1[0]);
    
    if( !g1.isCalled() ) return false;
    
    for(i=1;i < pop1.length;++i)
        {
        if( ! g1.sameGenotype( v.getGenotype(pop1[i]) ) ) return false;
        }
    for(i=0;i < pop2.length;++i)
        {
        if( g1.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
        }
    return true;
    }

accept(variant);

 

 

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by Pierre Lindenbaum115k

The script worked brilliantly to filter for fix genotype in 1 population (eg. 0/0) and opposite mix genotype (0/1 or 1/1 and mix of 0/1 & 1/1) in other population.  

My ultimate aim is to have script that will output following within single script :

s1      s2     s3    s4     a1    a2     a3     a4 

1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1

1/1    1/1    1/1    1/1    0/1    0/0    0/0    0/1

0/0    0/0    0/0    0/1     1/1    1/1    1/1    1/1

0/0    0/1    0/0    0/1     1/1    1/1    1/1    1/1

0/1    0/1    0/1    0/1     1/1    1/1    1/1    0/0

1/1    1/1    1/1    0/0      0/1    0/1    0/1    0/1

and so on ..

I've tried to modified the script but failed.  Can you advise?

Thanks

ADD REPLYlink written 2.9 years ago by ck7912340

j is unused. you can remove it.

ADD REPLYlink written 2.9 years ago by Pierre Lindenbaum115k
0
gravatar for Dan Gaston
2.9 years ago by
Dan Gaston7.1k
Canada
Dan Gaston7.1k wrote:

Have you checked out GEMINI? It does a little more than you you want in that it also adds all sorts of annotations to your dataset as well, but it stores the VCFs (and can include a PED file) in an SQLite database. It has some built in search tools where I think you could set up the right queries to get what you want. Since you can separate your populations as defined groups or cohorts in the database. 

ADD COMMENTlink written 2.9 years ago by Dan Gaston7.1k

Thanks Dan for the info,

I work with non-human and GEMINI is only for human.

ADD REPLYlink written 2.9 years ago by ck7912340

No problem, there may be a way to "hack" it to work since you aren't really interested in the annotations anyway, just the genotype storage and selection. But it may be easier to take another approach.

ADD REPLYlink written 2.9 years ago by Dan Gaston7.1k
0
gravatar for Dan Gaston
2.9 years ago by
Dan Gaston7.1k
Canada
Dan Gaston7.1k wrote:

Looking around I think the tool you probably want to use is vcf-contrast from the VCFtools tool set. It is specifically designed for looking for differences between groups in this manner.

ADD COMMENTlink written 2.9 years ago by Dan Gaston7.1k

vcf-constrast still missed condition where 2 opposite genotype are present in Pop.B and vice versa as following:

s1      s2     s3    s4     a1    a2     a3     a4 

1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1

ADD REPLYlink written 2.9 years ago by ck7912340
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1384 users visited in the last hour