Question: Comparing VCF with dbSNP
Asked 4.7 years ago by manojkumarbioinfo (India):

Hi,

I want to compare my VCF file with dbSNP, so I used RTG Tools (rtg vcfeval). While running it, I encountered this error:

Error: Record did not contain enough samples: 1 69552 rs55874132 G C. CFL;HD;OTHERKG;REF;RS=55874132;RSPOS=69552;S3D;SAO=0;SSR=0;SYN;VC=SNV;VP=0x050200000309000402000100;WGT=1;dbSNPBuildID=129

Command used for the comparison:

./Tools/rtg-tools-3.6.1/rtg vcfeval --all-records -b dbsnp_sort.vcf.gz -c Gatk_bowtie_sure_select.vcf.gz -T 3 -t Reference/RTG/HG37 -o Gatk

I also compared the two files with vcf-compare and got the results below, but I don't know how to interpret them. Can anyone help me with the error and with these results?

Results from vcf-compare

This file was generated by vcf-compare.
The command line vcf-compare dbsnp_sort.vcf.gz Gatk_bowtie_sure_select.vcf.gz

'Venn-Diagram Numbers'. Use grep ^VN | cut -f 2- to extract this part.
VN The columns are:
VN 1 .. number of sites unique to this particular combination of files
VN 2- .. combination of files and space-separated number, a fraction of sites in the file
VN 4423 Gatk_bowtie_sure_select.vcf.gz (11.6%)
VN 33680 Gatk_bowtie_sure_select.vcf.gz (88.4%) dbsnp_sort.vcf.gz (14.0%)
VN 206183 dbsnp_sort.vcf.gz (86.0%)
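One way to read the VN lines: each row counts the sites found only in that combination of files, and each percentage is that count divided by the total number of sites in the named file. The short Python sketch below (variable names are illustrative, values copied from the output above) recomputes the printed percentages and shows that each file's total is simply the sum of the rows that mention it:

```python
# Venn-diagram (VN) counts copied from the vcf-compare output above.
only_calls = 4423      # sites only in Gatk_bowtie_sure_select.vcf.gz
shared = 33680         # sites present in both files
only_dbsnp = 206183    # sites only in dbsnp_sort.vcf.gz

# Each file's total is the sum of every VN row that includes that file.
total_calls = only_calls + shared    # 38103
total_dbsnp = only_dbsnp + shared    # 239863

def pct(part, whole):
    """Percentage rounded to one decimal place, as vcf-compare prints it."""
    return round(100.0 * part / whole, 1)

print(f"calls not in dbSNP: {pct(only_calls, total_calls)}%")  # calls not in dbSNP: 11.6%
print(f"calls in dbSNP:     {pct(shared, total_calls)}%")      # calls in dbSNP:     88.4%
print(f"dbSNP sites called: {pct(shared, total_dbsnp)}%")      # dbSNP sites called: 14.0%
```

Read this way, about 88% of your calls fall at sites that are in this dbSNP file and about 12% do not (candidate novel variants or false positives). The low dbSNP-side figure is expected, since a single sample carries only a small fraction of the variants catalogued in dbSNP.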

SN Summary Numbers. Use grep ^SN | cut -f 2- to extract this part.
SN Number of REF matches: 33394
SN Number of ALT matches: 32073
SN Number of REF mismatches: 286
SN Number of ALT mismatches: 1321
SN Number of samples in GT comparison: 0
Number of sites lost due to grouping (e.g. duplicate sites): lost, %lost, read, reported, file
SN Number of lost sites: 1573 0.7% 241436 239863 dbsnp_sort.vcf.gz
SN Number of lost sites: 2 0.0% 38105 38103 Gatk_bowtie_sure_select.vcf.g
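The SN numbers can be cross-checked with simple arithmetic. In this Python sketch (values copied from the output above; `lost_sites` is a hypothetical helper, not part of vcf-compare), the REF counts add up to the 33680 shared sites and the ALT counts add up to the REF matches, which suggests ALT alleles are only compared where the REF allele already agrees. The lost-site lines are records read minus sites reported, i.e. duplicates collapsed by grouping. "Number of samples in GT comparison: 0" means no genotypes were compared, because the dbSNP file has no sample columns.

```python
# Summary (SN) counts copied from the vcf-compare output above.
shared_sites = 33680                  # from the VN row listing both files
ref_match, ref_mismatch = 33394, 286
alt_match, alt_mismatch = 32073, 1321

# Every shared site is either a REF match or a REF mismatch ...
assert ref_match + ref_mismatch == shared_sites
# ... and the ALT counts sum to the REF matches, suggesting ALT alleles are
# only compared at sites whose REF allele already agrees.
assert alt_match + alt_mismatch == ref_match

print(f"REF and ALT both agree at {100 * alt_match / shared_sites:.1f}% of shared sites")

def lost_sites(read, reported):
    """Hypothetical helper: records collapsed by grouping and their % of records read."""
    lost = read - reported
    return lost, round(100 * lost / read, 1)

print(lost_sites(241436, 239863))  # dbsnp_sort.vcf.gz -> (1573, 0.7)
print(lost_sites(38105, 38103))    # Gatk_bowtie_sure_select.vcf.gz -> (2, 0.0)
```

The 1321 ALT mismatches are positions where dbSNP and your calls agree on REF but list a different alternate allele, for example a different base or a differently represented indel.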

Answered 4.4 years ago by Len Trigg (New Zealand):

rtg vcfeval performs a pairwise comparison of the (usually diploid) haplotypes asserted by the GT field in the sample column of each VCF. In your case, the baseline VCF you are supplying (dbSNP) does not contain a sample column, hence the "Record did not contain enough samples" error.
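To confirm this before running vcfeval, you can look at the #CHROM header line: the first eight columns are the fixed VCF fields, the ninth is FORMAT, and every column after that is a sample, so a sites-only file such as dbSNP has none. A minimal Python sketch; `sample_names_from_lines` and `sample_names` are hypothetical helper names, and .gz input is read with the standard gzip module, which also reads bgzip-compressed files:

```python
import gzip
from typing import Iterable, List

def sample_names_from_lines(lines: Iterable[str]) -> List[str]:
    """Return the sample names declared on the #CHROM header line.

    Columns 1-8 are the fixed VCF fields, column 9 is FORMAT, and every
    column after that is a sample. A sites-only VCF stops after INFO.
    """
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            return columns[9:]  # empty list when there are no sample columns
        if not line.startswith("#"):
            break  # reached data records without seeing a column header
    raise ValueError("no #CHROM header line found")

def sample_names(vcf_path: str) -> List[str]:
    """Open a plain or gzip/bgzip-compressed VCF and list its samples."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        return sample_names_from_lines(handle)
```

An empty list for dbsnp_sort.vcf.gz would confirm that there is no sample column for vcfeval to read genotypes from.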

Comparing against a "database-style" VCF like dbSNP is something that will be added to vcfeval in the future. For now, you could add a synthetic sample to your dbSNP VCF that includes a GT field with a value of 1 (i.e. referring to the first ALT allele), and then run vcfeval with --squash-ploidy so that it does a haploid comparison against your calls.
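The synthetic-sample workaround could look roughly like the Python sketch below. It is not an RTG utility and rests on assumptions: the input is a sites-only VCF with exactly eight columns and no existing GT header; the sample name DBSNP is an arbitrary placeholder; and `add_synthetic_sample` and `convert` are hypothetical names. It adds a ##FORMAT line for GT, appends a FORMAT column plus one sample column, and gives every record GT=1. For multi-allelic dbSNP records, 1 refers only to the first ALT allele.

```python
import gzip

GT_HEADER = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'

def add_synthetic_sample(lines, sample="DBSNP"):
    """Yield VCF lines with one synthetic haploid sample whose GT is 1.

    Assumes a sites-only input (8 columns, no FORMAT or sample columns).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line                                  # meta-information lines unchanged
        elif line.startswith("#CHROM"):
            yield GT_HEADER                             # declare GT before the column header
            yield line + "\tFORMAT\t" + sample          # add FORMAT + one sample column
        elif line:
            yield line + "\tGT\t1"                      # haploid genotype = first ALT allele

def convert(in_path, out_path, sample="DBSNP"):
    """Read a plain or gzip/bgzip VCF and write an uncompressed VCF with the extra sample."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for out_line in add_synthetic_sample(src, sample):
            dst.write(out_line + "\n")
```

The output is uncompressed, so it would need to be bgzip-compressed and tabix-indexed (tabix -p vcf), like your other inputs, before being passed to vcfeval as the -b baseline together with --squash-ploidy.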

(Edit: since version 3.7, vcfeval supports comparison against database-style VCFs directly, using the special sample identifier "ALT".)
