I was testing vcftools to compare variants called by different programs and obtain a consensus among the callers. But the results I'm getting don't make any sense, so I would like to know if I'm doing something wrong.
The test consisted of selecting the last 15 variants from my VCF file with:
tail -n 15 vcffile > tail.vcf
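One thing worth noting about this step: plain tail keeps only the last 15 lines, so tail.vcf has no "##" meta lines and no "#CHROM" column line. Whether that explains the numbers below is not confirmed here. A minimal sketch of a header-preserving subset, where vcf_tail is a hypothetical helper name and not part of vcftools:

```shell
# Hypothetical helper (not a vcftools command): print every header line
# (lines starting with '#') followed by the last N variant records,
# so the subset still carries the original VCF header.
vcf_tail() {
    # $1 = number of records to keep, $2 = input VCF (uncompressed)
    grep '^#' "$2"
    grep -v '^#' "$2" | tail -n "$1"
}

# Usage with the file names from this post:
# vcf_tail 15 vcffile > tail.vcf
```

The sketch assumes an uncompressed VCF in which only header lines start with "#".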
Then I called vcftools to compare the original VCF file against its last 15 variants, to obtain a VCF file with only those variants as output:
vcftools --vcf complete_vcf.vcf --diff tail.vcf --out tmp --diff-site
The output from vcftools is:
VCFtools - 0.1.15 (C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted: --vcf complete_vcf.vcf --out tmp --diff tail.vcf --diff-site
After filtering, kept 1 out of 1 Individuals
Comparing sites in VCF files...
Found 14 sites common to both files.
Found 58055 sites only in main file.
Found 0 sites only in second file.
Found 0 non-matching overlapping sites.
After filtering, kept 58069 out of a possible 58069 Sites
Run Time = 1.00 seconds
And when I check the number of non-comment lines in the output file (tmp.diff.sites_in_files), the result is not 15:
grep -cv "#" tmp.diff.sites_in_files
58055
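A total line count cannot separate shared sites from sites unique to one file. The sketch below tallies sites per category instead. It assumes tmp.diff.sites_in_files is whitespace-delimited, starts with one column-header line, and has the IN_FILE flag in column 4 (as far as I know, B means both files, 1 only the main file, 2 only the second file). The helper name count_in_file is hypothetical.

```shell
# Hypothetical helper: count sites per IN_FILE category.
# Assumption: line 1 is a column header and IN_FILE is the 4th column.
count_in_file() {
    awk 'NR > 1 { n[$4]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

# Usage with the output file from this post:
# count_in_file tmp.diff.sites_in_files
```

If those assumptions hold, the count on the B line would be the number of sites vcftools considers common to both files.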
Can someone explain why this is happening, or what I'm doing wrong?
Or is there any other tool out there to obtain that consensus among variants from different callers?
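For reference, the result I expect can be cross-checked without any VCF-specific tool by intersecting the files on CHROM, POS, REF and ALT. This is only a rough sketch: site_keys and shared_sites are hypothetical helper names, it assumes uncompressed, tab-delimited VCFs, and it matches records by exact text, with no normalization of indels or multi-allelic sites.

```shell
# Hypothetical helper: print a sorted, unique list of CHROM, POS, REF, ALT
# for every variant record (header lines starting with '#' are skipped).
site_keys() {
    grep -v '^#' "$1" | cut -f1,2,4,5 | LC_ALL=C sort -u
}

# Hypothetical helper: print the sites present in both VCF files.
shared_sites() {
    a=$(mktemp)
    b=$(mktemp)
    site_keys "$1" > "$a"
    site_keys "$2" > "$b"
    # comm -12 keeps only lines common to both sorted inputs
    LC_ALL=C comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage with the file names from this post:
# shared_sites complete_vcf.vcf tail.vcf | wc -l
```

For my test I would expect this count to be 15.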