I want to validate my SNV discovery pipeline. To do that, I downloaded the exome FASTQ files and the per-chromosome VCF files for sample HG00119 from http://www.internationalgenome.org/data-portal/sample/HG00119.
My plan was to compare the SNPs and indels in both call sets, restricted to the exonic regions I downloaded from the UCSC database. To my surprise, many SNPs reported in the 1000 Genomes VCF are not visible in IGV at all (and my pipeline does not call them either).
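To make the comparison concrete, here is a minimal Python sketch of what I mean by comparing the two call sets within exonic regions. It is only an illustration under several assumptions: the function names (`load_bed`, `load_variants`, `compare`) are hypothetical, the exon file is a standard BED file (0-based, half-open), a leading `chr` is stripped so that UCSC-style and 1000 Genomes-style chromosome names match, and a site is counted only when the sample's GT field carries a non-reference allele.

```python
import re
from bisect import bisect_right
from collections import defaultdict


def norm_chrom(name):
    """Strip a leading 'chr' so 'chr1' and '1' compare equal (assumption)."""
    return name[3:] if name.startswith("chr") else name


def load_bed(lines):
    """Return {chrom: (starts, ends)} with overlapping intervals merged."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[norm_chrom(chrom)].append((int(start), int(end)))
    regions = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return regions


def in_regions(regions, chrom, pos):
    """True if a 1-based VCF position lies inside a 0-based half-open interval."""
    if chrom not in regions:
        return False
    starts, ends = regions[chrom]
    pos0 = pos - 1
    i = bisect_right(starts, pos0) - 1
    return i >= 0 and pos0 < ends[i]


def load_variants(lines, regions, sample_index=0):
    """Collect (chrom, pos, ref, alt) for ALT alleles the sample actually carries."""
    variants = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = norm_chrom(f[0]), int(f[1]), f[3], f[4].split(",")
        if not in_regions(regions, chrom, pos):
            continue
        fmt = f[8].split(":") if len(f) > 9 else []
        if "GT" not in fmt:
            # No genotype column available: keep every ALT allele at the site.
            carried = range(1, len(alts) + 1)
        else:
            gt = f[9 + sample_index].split(":")[fmt.index("GT")]
            # Skip missing ('.') and reference ('0') alleles.
            carried = {int(a) for a in re.split(r"[/|]", gt) if a not in (".", "0")}
        for idx in carried:
            variants.add((chrom, pos, ref, alts[idx - 1]))
    return variants


def compare(truth, calls):
    """Count variants shared between, and private to, the two call sets."""
    return {
        "shared": len(truth & calls),
        "only_truth": len(truth - calls),
        "only_calls": len(calls - truth),
    }
```

With uncompressed files this could be used as `compare(load_variants(open("truth.vcf"), regions), load_variants(open("mine.vcf"), regions))`, where `regions = load_bed(open("exons.bed"))` and all three file names are placeholders. Exact matching on (chrom, pos, ref, alt) can undercount shared indels if the two VCFs represent the same indel differently.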
In addition, I downloaded the BAM files for the same sample from the same 1000 Genomes page, http://www.internationalgenome.org/data-portal/sample/HG00119. The result is the same: many of the variants listed in the VCF are not visible in the BAM file.
I am very confused. Is there another way I can validate my pipeline? Would you also recommend emailing the 1000 Genomes project about this problem?
Validating against incorrect calls would be pointless. Is there a well-curated reference call set that I could use to validate the pipeline instead?
PS. I extracted only this one sample from the VCF, so I am sure the variants I am looking at belong to this individual. Thanks in advance.