Filtering multisample vcf file by
5.8 years ago

Hello,

I have a VCF file containing 200 DO mouse samples. I want to keep only SNPs that have at least 5 samples of each genotype: at least 5 AA, 5 AB, and 5 BB. For example, a SNP with 190 AA, 6 AB, and 4 BB would be discarded, and so would a SNP with 100 AA, 0 AB, and 100 BB. How would I go about doing this? I have been trying with vcftools but can't quite get it to work. The rule doesn't have to be exact; I am just trying to keep the SNPs that give me the most information for telling cell lines apart.

Any help would be greatly appreciated.

Thank you

vcftools genotype • 1.8k views
5.8 years ago

Using vcffilterjs (part of jvarkit):

java -jar dist/vcffilterjs.jar -e 'function accept(v) { var nAA=0, nBB=0, nAB=0; for (var i=0; i<v.getNSamples(); ++i) { var g=v.getGenotype(i); if (g.isHomRef()) { nAA++; } else if (g.isHomVar()) { nBB++; } else if (g.isHet()) { nAB++; } } /* keep the SNP only if each genotype is seen in at least 5 samples */ return nAA>=5 && nBB>=5 && nAB>=5; } accept(variant);' input.vcf
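
If you would rather not depend on jvarkit, the same counting rule can be sketched in plain Python. This is only a minimal sketch under a few assumptions: the VCF is uncompressed and tab-delimited, GT is the first FORMAT field, and the calls are diploid. Missing or non-diploid calls are simply not counted. The names `genotype_counts`, `keep_record`, `filter_vcf` and `MIN_COUNT` are made up for this example and do not come from any library.

```python
MIN_COUNT = 5  # threshold from the question: at least 5 samples per genotype


def genotype_counts(sample_fields):
    """Count hom-ref (AA), het (AB) and hom-alt (BB) calls in one record."""
    counts = {"AA": 0, "AB": 0, "BB": 0}
    for field in sample_fields:
        # Assumption: GT is the first colon-separated sub-field.
        gt = field.split(":", 1)[0].replace("|", "/")
        alleles = gt.split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # skip missing or non-diploid calls
        if alleles[0] == alleles[1]:
            counts["AA" if alleles[0] == "0" else "BB"] += 1
        else:
            counts["AB"] += 1
    return counts


def keep_record(line, min_count=MIN_COUNT):
    """True if every genotype class has at least min_count samples."""
    cols = line.rstrip("\n").split("\t")
    counts = genotype_counts(cols[9:])  # sample columns start at column 10
    return all(n >= min_count for n in counts.values())


def filter_vcf(lines, min_count=MIN_COUNT):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or keep_record(line, min_count):
            yield line


# Possible usage (file names are placeholders):
# with open("input.vcf") as src, open("filtered.vcf", "w") as dst:
#     dst.writelines(filter_vcf(src))
```

Because the threshold is a parameter, you can loosen or tighten the rule (for example `filter_vcf(src, min_count=10)`) without touching the counting logic.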