Hi,
I'm trying to filter a multi-sample VCF (8 samples) for opposite genotypes. The samples come from 2 populations (4 samples per population), and the output must remain in VCF format.
What I want to achieve is as follows:
1. Keep sites where all samples in Population A share the same genotype, while no sample in Population B has that genotype.
Example: 1/1 in all samples of Population A (s1-s4) and no 1/1 in Population B (a1-a4); the genotypes in Population B can be a mixture of 0/0 and 0/1.
#desired genotype output
s1 s2 s3 s4 a1 a2 a3 a4
1/1 1/1 1/1 1/1 0/0 0/0 0/0 0/0
1/1 1/1 1/1 1/1 0/0 0/0 0/0 0/1
and so on..
2. Flip the filter in (1): all samples in Population B share the same genotype, while no sample in Population A has that genotype.
#desired genotype output
s1 s2 s3 s4 a1 a2 a3 a4
0/0 0/0 0/0 0/0 1/1 1/1 1/1 1/1
0/0 0/0 0/0 0/1 1/1 1/1 1/1 1/1
and so on..
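For reference, the rule in (1) and (2) can be sketched in plain Python, independent of SnpSift. This is only a minimal sketch under assumptions not stated above: the VCF is uncompressed and tab-delimited, phased calls (0|1) count the same as unphased ones (0/1), and any site with a missing genotype (./.) in either population is dropped. The names `normalize_gt`, `is_opposite`, `filter_vcf`, `POP_A` and `POP_B` are made up for illustration.

```python
# Hypothetical helper names; sample IDs are taken from the example tables above.
POP_A = ["s1", "s2", "s3", "s4"]
POP_B = ["a1", "a2", "a3", "a4"]


def normalize_gt(gt):
    """Return the sorted allele tuple for a GT string, or None if any allele is missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a in (".", "") for a in alleles):
        return None
    return tuple(sorted(alleles))


def is_opposite(fixed_gts, other_gts):
    """True if all fixed_gts are identical and no genotype in other_gts equals them."""
    fixed = [normalize_gt(g) for g in fixed_gts]
    other = [normalize_gt(g) for g in other_gts]
    # Assumption: sites with any missing call are rejected.
    if None in fixed or None in other:
        return False
    shared = fixed[0]
    if any(g != shared for g in fixed):
        return False
    return all(g != shared for g in other)


def filter_vcf(lines, fixed_pop, other_pop):
    """Yield header lines unchanged, plus records that pass is_opposite."""
    col = {}
    for line in lines:
        if line.startswith("##"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            # Map each column name (including sample names) to its index.
            col = {name: i for i, name in enumerate(fields)}
            yield line
            continue
        fmt = fields[8].split(":")  # FORMAT column
        if "GT" not in fmt:
            continue
        gt_idx = fmt.index("GT")
        fixed = [fields[col[s]].split(":")[gt_idx] for s in fixed_pop]
        other = [fields[col[s]].split(":")[gt_idx] for s in other_pop]
        if is_opposite(fixed, other):
            yield line
```

Case (1) would then be `filter_vcf(open("sample.vcf"), POP_A, POP_B)`, and case (2) is the same call with the two populations swapped. Because records are written back unchanged, the output stays a valid VCF.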
I tried SnpSift, but the filter missed 26 combinations involving the 1/1 genotype.
The command I used is as follows:
cat sample.vcf | java -Xmx4g -jar SnpSift.jar filter "(((countHom()>3)&(countHom()<5))|(countHet()=4)|(countRef()=4)) &((isRef(GEN[0]) & isRef(GEN[1]) & isRef(GEN[2]) & isRef(GEN[3]))|(isHom(GEN[0]) & isVariant(GEN[0]) & isHom(GEN[1])& isVariant(GEN[1]) & isHom(GEN[2]) & isVariant(GEN[2])& isHom(GEN[3]) & isVariant(GEN[3])) | (isHet(GEN[0]) & isHet(GEN[1]) & isHet(GEN[2]) & isHet(GEN[3]))| (isRef(GEN[4]) & isRef(GEN[5]) & isRef(GEN[6]) & isRef(GEN[7]))|(isHom(GEN[4]) & isVariant(GEN[4]) & isHom(GEN[5])& isVariant(GEN[5]) & isHom(GEN[6]) & isVariant(GEN[6])& isHom(GEN[7]) & isVariant(GEN[7])) | (isHet(GEN[4]) & isHet(GEN[5]) & isHet(GEN[6]) & isHet(GEN[7])))"
Does anyone have experience with this kind of filtering? I'd appreciate any help or advice.
Many thanks
Thanks for all the help and advice.
Here is the script for the opposite filtering, using https://github.com/lindenb/jvarkit/wiki/VCFFilterJS: