Dear all,
I am trying to get the subset of SNPs common to three of six individuals from a multiVCF file. My problem is that I get different results when I do it
- manually (extracting genotypes with "SnpSift extractFields" and filtering the variants in Excel)
- with "SnpSift filter":
SnpSift varType snps_results_dir/my_multiVCF_SNPs.vcf | SnpSift filter "isVariant( GEN[1]) & isVariant( GEN[2]) & isVariant( GEN[3]) & isRef( GEN[4]) & isRef( GEN[5]) & isRef( GEN[6]) & isRef( GEN[7]) & isRef( GEN[8])" > my_SNP_subset.vcf
With the first method I get 479 SNPs, whereas "SnpSift filter" (the second method) gives me about 250 SNPs.
So I'm confused. Which method/result is right? Could somebody help me with this discrepancy? Is there another standard procedure for filtering variants from a multiVCF file?
Thank you very much in advance
Kind regards
Pavlo
Excel filters are not reproducible (and are susceptible to data import errors as well as manual errors), so we cannot really help you with why you're seeing different results.
If you could use R instead of Excel to apply these filters, the script would help us determine what is behind the discrepancy.
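For illustration, here is a minimal sketch of what such a script could look like, written in Python rather than R purely as an example. It assumes an uncompressed, tab-separated VCF with a GT entry in the FORMAT column. The function name filter_shared_snps and the sample names are hypothetical. It also assumes that a genotype containing a missing allele (".") counts as neither reference nor variant; SnpSift may treat such calls differently, which could be one source of differing counts. Samples are selected by name rather than by position because the SnpSift documentation indexes genotypes from zero (GEN[0] is the first sample), so it may be worth checking which columns the indices in the filter expression actually point to.

```python
def parse_alleles(sample_field, gt_index):
    """Split one sample column (e.g. '0/1:35') into its allele strings."""
    parts = sample_field.split(":")
    gt = parts[gt_index] if gt_index < len(parts) else "."
    # Treat phased ('|') and unphased ('/') genotypes the same way.
    return gt.replace("|", "/").split("/")


def is_ref(alleles):
    """True only if every allele is the reference allele ('0')."""
    return all(a == "0" for a in alleles)


def is_variant(alleles):
    """True if the call is complete and has at least one non-reference allele.

    Assumption: a missing allele ('.') makes the call neither ref nor variant.
    """
    return "." not in alleles and any(a != "0" for a in alleles)


def filter_shared_snps(lines, variant_samples, ref_samples):
    """Return (chrom, pos) for records that are variant in all of
    variant_samples and homozygous reference in all of ref_samples."""
    column = None
    kept = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Map column names (including sample names) to column indices.
            column = {name: i for i, name in enumerate(fields)}
            continue
        if column is None:
            raise ValueError("data line found before the #CHROM header")
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # no genotype information for this record
        gt_index = fmt.index("GT")

        def alleles(sample):
            return parse_alleles(fields[column[sample]], gt_index)

        if all(is_variant(alleles(s)) for s in variant_samples) and all(
            is_ref(alleles(s)) for s in ref_samples
        ):
            kept.append((fields[0], int(fields[1])))
    return kept
```

Called with an open file handle and two lists of sample names, it returns (chromosome, position) pairs. That list can then be compared against the positions left in the Excel-filtered table to see exactly which records the two methods disagree on.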
Thank you very much for the tip. I'll try it now.