Question: differences between Illumina Platinum Variant Calls and NIST variant calls
I am validating a bioinformatics pipeline for SNP and INDEL calling. For this purpose I mapped the reads from Illumina Platinum Genomes to the hg38 assembly and called variants in a small region, chr19:29,207,790-29,217,448. To verify the detected variants I used two truth sets:

  1. Variant calls from Illumina Platinum Genomes
  2. Variant calls from NIST

There is one variant at chr19:29215367 that my variant calling pipeline detects at an allele frequency above 50%. This variant is present in the NIST variant dataset but not in the Illumina Platinum Genomes dataset.

The variant is also present in the CRAM file downloaded from the Illumina Platinum Genomes site, at an allele frequency of 54% as visualized in IGV. (Screenshot: variant in IGV)

Should I use the NIST variants instead of the Illumina Platinum Genomes calls, given that this variant would otherwise be counted as a false positive?

Can I merge the two datasets to get a more comprehensive set of variant calls, and is it advisable to do so?
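
To make the two options concrete, here is a minimal sketch of how the choice of truth set changes the false-positive count. It assumes variants are matched on an exact (chrom, pos, ref, alt) key. The function names `parse_vcf_lines` and `compare` are hypothetical helpers, and the records are made-up placeholders, not real calls from either dataset.

```python
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def parse_vcf_lines(lines: Iterable[str]) -> Set[Variant]:
    """Collect (chrom, pos, ref, alt) keys from tab-separated VCF text lines."""
    variants: Set[Variant] = set()
    for line in lines:
        # Skip blank lines and header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Split multi-allelic records into one key per ALT allele.
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def compare(calls: Set[Variant], truth: Set[Variant]) -> Dict[str, Set[Variant]]:
    """Classify calls against a truth set by exact key match."""
    return {"tp": calls & truth, "fp": calls - truth, "fn": truth - calls}


# Placeholder records (hypothetical positions and alleles, for illustration only).
pipeline = parse_vcf_lines([
    "##fileformat=VCFv4.2",
    "chr19\t100\t.\tA\tG",
    "chr19\t200\t.\tC\tT",
])
truth_a = parse_vcf_lines(["chr19\t100\t.\tA\tG"])                          # stands in for set 1
truth_b = parse_vcf_lines(["chr19\t100\t.\tA\tG", "chr19\t200\t.\tC\tT"])  # stands in for set 2

union = truth_a | truth_b          # "merged" truth set: a call in either set counts
intersection = truth_a & truth_b   # conservative truth set: only calls in both sets

for name, truth in [("A", truth_a), ("B", truth_b),
                    ("union", union), ("intersection", intersection)]:
    result = compare(pipeline, truth)
    print(name, "FP:", sorted(result["fp"]), "FN:", sorted(result["fn"]))
```

Exact key matching is a simplification: INDELs can be written in more than one equivalent way, so both call sets would need to be normalized the same way first, and a call outside a truth set's confident regions is not necessarily a false positive.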

I would appreciate any insight into this.

