Question: Comparing VCF with dbSNP
4.7 years ago
manojkumarbioinfo60 wrote:


I want to compare my VCF file with dbSNP, so I used RTG Tools for the comparison. While running it, I encountered this error:

Error: Record did not contain enough samples: 1 69552 rs55874132 G C. CFL;HD;OTHERKG;REF;RS=55874132;RSPOS=69552;S3D;SAO=0;SSR=0;SYN;VC=SNV;VP=0x050200000309000402000100;WGT=1;dbSNPBuildID=129

Command used for the comparison:

./Tools/rtg-tools-3.6.1/rtg vcfeval --all-records -b dbsnp_sort.vcf.gz -c Gatk_bowtie_sure_select.vcf.gz -T 3 -t Reference/RTG/HG37 -o Gatk

When I compared the files with vcf-compare instead, I got the results below, but I don't know how to interpret them. Can anyone help me with these errors?

Results from vcf-compare:

This file was generated by vcf-compare. The command line vcf-compare dbsnp_sort.vcf.gz Gatk_bowtie_sure_select.vcf.gz

'Venn-Diagram Numbers'. Use grep ^VN | cut -f 2- to extract this part.
VN The columns are:
VN 1 .. number of sites unique to this particular combination of files
VN 2- .. combination of files and space-separated number, a fraction of sites in the file
VN 4423 Gatk_bowtie_sure_select.vcf.gz (11.6%)
VN 33680 Gatk_bowtie_sure_select.vcf.gz (88.4%) dbsnp_sort.vcf.gz (14.0%)
VN 206183 dbsnp_sort.vcf.gz (86.0%)

SN Summary Numbers. Use grep ^SN | cut -f 2- to extract this part.
SN Number of REF matches: 33394
SN Number of ALT matches: 32073
SN Number of REF mismatches: 286
SN Number of ALT mismatches: 1321
SN Number of samples in GT comparison: 0
Number of sites lost due to grouping (e.g. duplicate sites): lost, %lost, read, reported, file
SN Number of lost sites: 1573 0.7% 241436 239863 dbsnp_sort.vcf.gz
SN Number of lost sites: 2 0.0% 38105 38103 Gatk_bowtie_sure_select.vcf.g
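The Venn and SN numbers can be reconciled with a little arithmetic. The sketch below copies the counts from the output above and assumes each percentage is the overlap count divided by the number of sites vcf-compare reported for that file (the "reported" column of the lost-sites lines). On that reading, 88.4% of your 38,103 calls sit at positions present in this dbSNP file and 4,423 (11.6%) do not. The 86% of dbSNP sites missing from your calls is not a problem in itself, since dbSNP pools variants from many individuals and one sample carries only a fraction of them. "Number of samples in GT comparison: 0" reflects that the dbSNP file has no sample columns, so no genotypes were compared.

```python
# Counts copied from the vcf-compare output above.
only_calls = 4423     # sites only in Gatk_bowtie_sure_select.vcf.gz
shared = 33680        # sites present in both files
only_dbsnp = 206183   # sites only in dbsnp_sort.vcf.gz

# Totals match the "reported" column of the lost-sites lines.
calls_total = only_calls + shared   # 38103
dbsnp_total = only_dbsnp + shared   # 239863


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(pct(shared, calls_total))  # share of your calls found in dbSNP → 88.4
print(pct(shared, dbsnp_total))  # share of dbSNP sites you called → 14.0

# Every shared site is either a REF match or a REF mismatch, and the
# ALT match/mismatch counts add up to the REF matches.
ref_match, ref_mismatch = 33394, 286
alt_match, alt_mismatch = 32073, 1321
assert ref_match + ref_mismatch == shared
assert alt_match + alt_mismatch == ref_match
```

The last assertion suggests that vcf-compare compares ALT alleles only at shared sites whose REF already matched: 286 shared positions have a different REF allele, and 1,321 have the same REF but a different ALT. Differing representations of indels or multi-allelic sites between the two files are a plausible cause, but that would need to be confirmed by looking at the records themselves.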

4.4 years ago
Len Trigg wrote:

rtg vcfeval performs a pairwise comparison of the (usually diploid) haplotypes asserted by the GT field in the sample column of your VCF. In your case, you are supplying a VCF that does not contain a sample column.
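One quick way to confirm this is to look at the #CHROM header line. Per the VCF specification, the first eight columns are fixed (CHROM through INFO), the ninth is FORMAT, and any further columns are samples. A minimal Python sketch, assuming a bgzipped, tab-delimited VCF; the script and the function name sample_names are hypothetical:

```python
import gzip
import sys
from typing import Iterable, List


def sample_names(lines: Iterable[str]) -> List[str]:
    """Return the sample column names from the #CHROM header line (empty if none)."""
    for line in lines:
        if line.startswith("#CHROM"):
            cols = line.rstrip("\n").split("\t")
            # Columns 0-7 are fixed (CHROM..INFO), column 8 is FORMAT, samples follow.
            return cols[9:]
    raise ValueError("no #CHROM header line found")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python vcf_samples.py dbsnp_sort.vcf.gz
    with gzip.open(sys.argv[1], "rt") as vcf:
        names = sample_names(vcf)
    print(names if names else "no sample columns: nothing for vcfeval to compare")
```

For a dbSNP file like the one in the question this would return an empty list, which is exactly the situation behind the "Record did not contain enough samples" error.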

Comparing against a "database-style" VCF like dbSNP is something that will be added to vcfeval in the future. For now, you could add a synthetic sample to your dbSNP VCF that includes a GT field with a value of 1 (i.e. referring to the ALT allele), and then run vcfeval with --squash-ploidy to tell it to do a haploid comparison only when it compares against your calls.
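The workaround above can be sketched in Python. This is a minimal sketch, not an RTG utility: the function add_synthetic_sample and the sample name DBSNP are hypothetical. It keeps the eight fixed VCF columns, declares a GT FORMAT field, and gives every record a haploid genotype of 1. Note that GT=1 refers only to the first ALT allele, so at multi-allelic dbSNP records any further ALT alleles would not be represented.

```python
import gzip
import sys
from typing import Iterable, Iterator

# Header line declaring the GT field that is about to be added.
GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'


def add_synthetic_sample(lines: Iterable[str], sample: str = "DBSNP") -> Iterator[str]:
    """Yield VCF lines with a FORMAT column and one haploid sample (GT=1) appended.

    `sample` is a hypothetical name; any identifier works.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            yield GT_HEADER  # declare GT just before the column header line
            cols = line.split("\t")[:8]  # keep only the 8 fixed columns
            yield "\t".join(cols + ["FORMAT", sample])
        elif line:
            cols = line.split("\t")[:8]
            # GT=1 asserts the first ALT allele on a single (haploid) haplotype
            yield "\t".join(cols + ["GT", "1"])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python add_sample.py dbsnp_sort.vcf.gz > dbsnp_with_sample.vcf
    with gzip.open(sys.argv[1], "rt") as src:
        for out in add_synthetic_sample(src):
            print(out)
```

The output is plain text; it would need to be block-compressed and indexed (for example with htslib's bgzip and tabix -p vcf) before being passed to vcfeval as the baseline together with --squash-ploidy.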

(Edit: since version 3.7, vcfeval supports comparison against database-style VCFs via the special sample identifier "ALT".)

