Opposite genotype filtering in multisample vcf for 2 population
3
0
Entering edit mode
8.3 years ago
ck791234 • 0

Hi,

I'm trying to filter for opposite genotype from multisample vcf (8 samples) which consist of 2 population (4 samples/population) while maintain the vcf format.

What I want to achieved are as follow:

1. Filter for same genotype in all sample in Population A while genotype in Population B are totally different than Population A.

Example: 1/1 in all sample in Population A (s1-s4), none 1/1 in Population B (a1-a4) (genotype in Population B can be a mixture of 0/0 and 0/1).

Desired genotype output

s1      s2     s3    s4     a1    a2     a3     a4  
1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/0
1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1
..

2. Flip the genotype filtering as in (1) where genotype in Population B are same while genotype in Population A is not same as Population B.

Desired genotype output

 s1      s2     s3    s4     a1    a2     a3     a4
0/0    0/0    0/0    0/0    1/1    1/1    1/1    1/1
0/0    0/0    0/0    0/1    1/1    1/1    1/1    1/1
..

I tried with SnpSift, however the filter missed 26 combination when involve 1/1 genotype.

The command I used as follow:

cat sample.vcf | \
  java -Xmx4g -jar SnpSift.jar filter \
    "(
      ((countHom()>3) & (countHom()<5)) | 
      (countHet()=4) | 
      (countRef()=4)) & 
      (
        (isRef(GEN[0]) & isRef(GEN[1]) & isRef(GEN[2]) &  isRef(GEN[3])
      ) | 
      (isHom(GEN[0]) & isVariant(GEN[0]) & isHom(GEN[1])& isVariant(GEN[1]) & isHom(GEN[2]) & isVariant(GEN[2])& isHom(GEN[3]) & isVariant(GEN[3])) | 
      (isHet(GEN[0]) & isHet(GEN[1]) & isHet(GEN[2]) & isHet(GEN[3]))| (isRef(GEN[4]) & isRef(GEN[5]) & isRef(GEN[6]) &  isRef(GEN[7])) | 
      (isHom(GEN[4]) & isVariant(GEN[4]) & isHom(GEN[5])& isVariant(GEN[5]) & isHom(GEN[6]) & isVariant(GEN[6])& isHom(GEN[7]) & isVariant(GEN[7])) | 
      (isHet(GEN[4]) & isHet(GEN[5]) & isHet(GEN[6]) & isHet(GEN[7]))
    )"

Does anyone has experience in this filtering? I appreciate any help/advise.

Many thanks

vcf snpsift SNP genotype • 2.9k views
ADD COMMENT
0
Entering edit mode

Thanks for all the helps & advise.

Script for the opposite filtering with https://github.com/lindenb/jvarkit/wiki/VCFFilterJS:

function accept(v)
    {
    var pop1=["s1","s2","s3","s4"];
    var pop2=["a1","a2","a3","a4"];
    var i;
    var g1 = v.getGenotype(pop1[0]);
    var g2 = v.getGenotype(pop2[0]);
    var g1issame = 1;

    if( !g1.isCalled() ) return false;
    if( !g2.isCalled() ) return false;

    for(i=1;i < pop1.length; ++i)
        {
        if( ! g1.sameGenotype( v.getGenotype(pop1[i]) ) ) g1issame = 0;
        }

    if( g1issame == 1 )
        {
        for(i=0;i < pop2.length; ++i)
            {
            if( g1.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
            }
        }
    else
        {
        for(i=1;i < pop2.length;++i)
            {
            if( ! g2.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
            }
        for(i=0;i< pop1.length; ++i)
            {
            if( g2.sameGenotype( v.getGenotype(pop1[i]) ) ) return false;
            }
        }

    return true;
    }
accept(variant);
ADD REPLY
1
Entering edit mode
8.3 years ago

using my tool https://github.com/lindenb/jvarkit/wiki/VCFFilterJS and the following script:

function accept(v)
    {
    var pop1=["sample1","sample2","sample3"];
    var pop2=["sample4","sample5"];
    var i,j;
    var g1 = v.getGenotype(pop1[0]);
    var g2 = v.getGenotype(pop2[0]);

    if( !g1.isCalled() ) return false;
    if( !g2.isCalled() ) return false;
    if( g1.sameGenotype(g2) ) return false;

    for(i=1;i < pop1.length;++i)
        {
        if( ! g1.sameGenotype( v.getGenotype(pop1[i]) ) ) return false;
        }
    for(i=1;i < pop2.length;++i)
        {
        if( ! g2.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
        }
    return true;
    }

accept(variant);

you can then reformat the resulting vcf using my tool bioalcidae https://github.com/lindenb/jvarkit/wiki/BioAlcidae

ADD COMMENT
0
Entering edit mode

Hi Pierre,

Thanks for the help.

The script still missed condition where 2 opposite genotype are in Population B(a1-a4) and vice versa like as exampled as below which I'm interested to keep (even after I set pop1=["s1","s2","s3","s4"]; pop2=["a1","a2","a3","a4"]):

s1      s2     s3    s4     a1    a2     a3     a4 
1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1
1/1    1/1    1/1    1/1    0/0    0/1    0/0    0/1

and so on..

Also, what is the j refer to in the script (var i,j)?

ADD REPLY
1
Entering edit mode

The script still missed condition where 2 opposite genotype

oh I see, It would be something like

function accept(v)
    {
    var pop1=["sample1","sample2","sample3"];
    var pop2=["sample4","sample5"];
    var i;
    var g1 = v.getGenotype(pop1[0]);

    if( !g1.isCalled() ) return false;

    for(i=1;i < pop1.length;++i)
        {
        if( ! g1.sameGenotype( v.getGenotype(pop1[i]) ) ) return false;
        }
    for(i=0;i < pop2.length;++i)
        {
        if( g1.sameGenotype( v.getGenotype(pop2[i]) ) ) return false;
        }
    return true;
    }

accept(variant);
ADD REPLY
0
Entering edit mode

The script worked brilliantly to filter for fix genotype in 1 population (eg. 0/0) and opposite mix genotype (0/1 or 1/1 and mix of 0/1 & 1/1) in other population.

My ultimate aim is to have script that will output following within single script :

s1      s2     s3    s4     a1    a2     a3     a4 
1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1
1/1    1/1    1/1    1/1    0/1    0/0    0/0    0/1
0/0    0/0    0/0    0/1     1/1    1/1    1/1    1/1
0/0    0/1    0/0    0/1     1/1    1/1    1/1    1/1
0/1    0/1    0/1    0/1     1/1    1/1    1/1    0/0
1/1    1/1    1/1    0/0      0/1    0/1    0/1    0/1

and so on ..

I've tried to modified the script but failed. Can you advise?

Thanks

ADD REPLY
0
Entering edit mode

j is unused. you can remove it.

ADD REPLY
0
Entering edit mode
8.3 years ago
DG 7.3k

Have you checked out GEMINI? It does a little more than you you want in that it also adds all sorts of annotations to your dataset as well, but it stores the VCFs (and can include a PED file) in an SQLite database. It has some built in search tools where I think you could set up the right queries to get what you want. Since you can separate your populations as defined groups or cohorts in the database.

ADD COMMENT
0
Entering edit mode

Thanks Dan for the info,

I work with non-human and GEMINI is only for human.

ADD REPLY
0
Entering edit mode

No problem, there may be a way to "hack" it to work since you aren't really interested in the annotations anyway, just the genotype storage and selection. But it may be easier to take another approach.

ADD REPLY
0
Entering edit mode
8.3 years ago
DG 7.3k

Looking around I think the tool you probably want to use is vcf-contrast from the VCFtools tool set. It is specifically designed for looking for differences between groups in this manner.

ADD COMMENT
0
Entering edit mode

vcf-constrast still missed condition where 2 opposite genotype are present in Pop.B and vice versa as following:

s1      s2     s3    s4     a1    a2     a3     a4 
1/1    1/1    1/1    1/1    0/0    0/0    0/0    0/1
ADD REPLY

Login before adding your answer.

Traffic: 1804 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6