Problem with vcftools comparison, results don't make sense
5.1 years ago
rafa.rios.50 ▴ 60

I was testing vcftools to compare variants called by different programs and obtain a consensus from the callers. But the results I'm getting just don't make any sense, so I would like to know if there is something that I'm doing wrong.

The test consisted of selecting the last 15 variants from my VCF file with:

tail -n 15 vcffile > tail.vcf


Then I called vcftools to compare the original VCF file against its last 15 variants, to obtain a VCF file with only those variants as output:

vcftools --vcf complete_vcf.vcf --diff tail.vcf --out tmp --diff-site


The output from vcftools is:

VCFtools - 0.1.15 (C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted: --vcf complete_vcf.vcf --out tmp --diff tail.vcf --diff-site

After filtering, kept 1 out of 1 Individuals

Comparing sites in VCF files...

Found 14 sites common to both files.

Found 58055 sites only in main file.

Found 0 sites only in second file.

Found 0 non-matching overlapping sites.

After filtering, kept 58069 out of a possible 58069 Sites

Run Time = 1.00 seconds

And when I check the number of non-comment lines in the tmp.recode.vcf, the result is not 15.

grep -cv "#" tmp.diff.sites_in_files
58055


Can someone explain to me why this is happening or what I'm doing wrong?

Or is there any other tool out there to obtain that consensus among variants from different callers?

Thanks

SNP vcftools comparison • 1.8k views
5.1 years ago
Ram 36k

Your second file is not a VCF file - it needs a header.

Try copying all the header lines (those matching ^#) from your principal VCF file into tail.vcf and then compare again; you should see results that make sense.
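
A minimal sketch of that header fix, in case it helps. The file names demo.vcf and demo_tail.vcf and the five toy records are made up here so the snippet runs on its own; with your data you would use complete_vcf.vcf as the input and n=15.

```shell
# Hypothetical toy input so the sketch runs on its own; substitute your real file.
printf '##fileformat=VCFv4.2\n' > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> demo.vcf
for pos in 100 200 300 400 500; do
    printf 'chr1\t%s\t.\tA\tG\t50\tPASS\t.\n' "$pos" >> demo.vcf
done

n=3  # number of trailing variants to keep (15 in the question)

# 1. Copy every header line (lines starting with '#') first.
grep '^#' demo.vcf > demo_tail.vcf
# 2. Append the last n records, taken from the non-header lines only.
grep -v '^#' demo.vcf | tail -n "$n" >> demo_tail.vcf

# Count header and record lines in the subset as a sanity check.
header_lines=$(grep -c '^#' demo_tail.vcf)
record_lines=$(grep -vc '^#' demo_tail.vcf)
echo "header lines: $header_lines, records: $record_lines"
```

The resulting file starts with the same header as the original, so it can be passed to --diff in place of the headerless tail.vcf, using the same vcftools command as in the question.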