Question: Filtering multisample vcf file by
2.7 years ago by
devin.porter92 wrote:


I have a VCF file that contains 200 DO mouse samples. I want to keep only the SNPs that have at least 5 samples of each genotype: each SNP needs at least 5 AA, 5 AB, and 5 BB calls. For example, a SNP with 190 AA, 6 AB, and 4 BB would be discarded, and so would a SNP with 100 AA, 0 AB, and 100 BB. How would I go about doing this? I have been trying with vcftools, but I can't quite get it to work. The rule doesn't have to be exact; I am just trying to keep the SNPs that give me the most information for telling cell lines apart.

Any help would be greatly appreciated.

Thank you

genotype vcftools • 1.0k views
2.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum wrote:

Using vcffilterjs:

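java -jar dist/vcffilterjs.jar -e '
function accept(v) {
    var nAA = 0, nAB = 0, nBB = 0;
    // count hom-ref (AA), het (AB) and hom-var (BB) calls
    for (var i = 0; i < v.getNSamples(); ++i) {
        var g = v.getGenotype(i);
        if (g.isHomRef()) { nAA++; }
        else if (g.isHomVar()) { nBB++; }
        else if (g.isHet()) { nAB++; }
    }
    // keep the SNP only if each class has at least 5 samples
    return nAA >= 5 && nAB >= 5 && nBB >= 5;
}
accept(variant);' input.vcf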
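
For anyone without jvarkit installed, the same rule can be sketched in plain Python. This is only a minimal sketch and not part of the tool above: the names `genotype_counts` and `filter_vcf` are made up for illustration, and it assumes an uncompressed, tab-delimited VCF with diploid calls and GT as the first FORMAT key. Missing calls such as ./. are skipped, and any call with two different alleles is counted as AB.

```python
MIN_PER_CLASS = 5  # "at least 5 of each genotype", taken from the question


def genotype_counts(sample_fields):
    """Count hom-ref (AA), het (AB) and hom-alt (BB) calls for one record."""
    counts = {"AA": 0, "AB": 0, "BB": 0}
    for field in sample_fields:
        gt = field.split(":", 1)[0]            # assumption: GT is the first FORMAT key
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue                           # skip missing or non-diploid calls
        if alleles[0] != alleles[1]:
            counts["AB"] += 1                  # two different alleles
        elif alleles[0] == "0":
            counts["AA"] += 1                  # both alleles are the reference
        else:
            counts["BB"] += 1                  # both alleles are the same ALT
    return counts


def filter_vcf(lines, min_per_class=MIN_PER_CLASS):
    """Yield header lines unchanged, plus records that pass the rule."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        counts = genotype_counts(fields[9:])   # sample columns start at column 10
        if min(counts.values()) >= min_per_class:
            yield line
```

To use it, open the VCF as a text file, pass the file handle to `filter_vcf`, and write the yielded lines to a new file. Lowering `min_per_class` loosens the rule if too few SNPs survive.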