differences between Illumina Platinum Variant Calls and NIST variant calls
kspata, 2.3 years ago

Hi All,

I am validating a bioinformatics pipeline for SNP and indel calling. For this purpose, I mapped the reads from the Illumina Platinum Genome (https://www.ebi.ac.uk/ena/data/view/ERR194147) to the hg38 assembly (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/) and called variants in a smaller region, chr19:29,207,790-29,217,448. To verify these calls, I used two datasets:

  1. From Illumina Platinum Genome (ftp://platgene_ro@ussd-ftp.illumina.com/2017-1.0/hg38/hybrid/)
  2. From NIST (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/latest/GRCh38/)
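A minimal sketch of the first step of that comparison: pulling the calls inside chr19:29,207,790-29,217,448 out of a VCF as (chrom, pos, ref, alt) keys so they can be compared across the pipeline output and both truth sets. It assumes plain-text (uncompressed) VCF input and does no indel normalization; the function name `variant_keys` is hypothetical, and a real validation would normally use a VCF library and a benchmarking tool rather than exact string matching.

```python
# Hypothetical helper: collect VCF records in a region as comparable keys.
# Assumes uncompressed VCF text; indels are NOT normalized, so differently
# represented but equivalent indels will not match.

REGION = ("chr19", 29_207_790, 29_217_448)  # 1-based, inclusive, as in the post


def variant_keys(vcf_lines, region=REGION):
    """Return a set of (chrom, pos, ref, alt) tuples that fall within `region`."""
    region_chrom, start, end = region
    keys = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        if chrom != region_chrom or not (start <= pos <= end):
            continue  # outside the region of interest
        for alt in alts.split(","):  # split multiallelic ALT column
            keys.add((chrom, pos, ref, alt))
    return keys
```

Usage, with a hypothetical file name: `with open("pipeline_calls.vcf") as fh: calls = variant_keys(fh)`, and likewise for the two truth-set VCFs.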

There is one variant at chr19:29215367 that my variant-calling pipeline detects at an allele frequency above 50%. This variant is present in the NIST dataset but not in the Illumina Platinum Genome dataset.
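The sketch below shows why the same call is scored differently depending on the truth set: it is a true positive against a set that contains it and a false positive against one that does not. It also assumes (as benchmarking tools such as hap.py typically do) that calls outside the truth set's confident-region BED are not scored at all. The REF/ALT alleles and the confident interval in the usage note are placeholders, not the real data at chr19:29215367, and `classify` is a hypothetical name.

```python
# Hypothetical sketch: label query calls against one truth set.
# Keys are (chrom, pos, ref, alt) with 1-based positions.


def classify(query, truth, confident=None):
    """Label each query call "TP", "FP", or "outside" (not assessable).

    `confident` is an optional list of BED-style (chrom, start, end)
    intervals: 0-based start, exclusive end. A 1-based position `pos`
    lies inside an interval when start < pos <= end.
    """
    labels = {}
    for key in sorted(query):
        chrom, pos = key[0], key[1]
        if confident is not None and not any(
            c == chrom and s < pos <= e for c, s, e in confident
        ):
            labels[key] = "outside"  # not covered by the truth set's confident regions
        elif key in truth:
            labels[key] = "TP"
        else:
            labels[key] = "FP"
    return labels
```

Usage with placeholder alleles: for `call = ("chr19", 29215367, "A", "G")`, `classify({call}, {call})` labels it `"TP"` (as against NIST), `classify({call}, set())` labels it `"FP"` (as against Platinum Genomes), and if the site fell outside a confident interval such as `("chr19", 29207789, 29215000)` it would be labelled `"outside"` instead of counted as an error.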

This variant is also present, at a frequency of 54% as visualized in IGV, in the CRAM file (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/illumina_platinum_pedigree/data/CEU/NA12878/alignment/) downloaded from the Illumina PG site.

[Image: variant in IGV]
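For reference, the frequency IGV reports at a site is the fraction of reads supporting the ALT allele. The read counts below are hypothetical, chosen only to illustrate how a 54% figure arises; in a diploid sample a heterozygous site is expected near 50%.

```python
def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the ALT allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


# Hypothetical depth: 27 ALT-supporting reads out of 50 total.
print(f"{allele_fraction(27, 50):.0%}")  # prints 54%
```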

Should I use the NIST variants instead of the Illumina PG calls, given that this variant will otherwise be counted as a false positive?

Can I merge these two datasets to get a more comprehensive set of variant calls? Would merging them be advisable?
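One way to make the merging question concrete: a union keeps every site from either truth set, while an intersection keeps only sites both sets agree on. The sketch below (hypothetical function name and placeholder alleles) records which set each site came from, so sites where the two sets disagree, such as chr19:29215367, stay visible for inspection instead of being silently absorbed. It compares exact (chrom, pos, ref, alt) keys only and ignores that each truth set has its own confident regions.

```python
def merge_truth_sets(sets, mode="intersection"):
    """Combine named truth sets of (chrom, pos, ref, alt) keys.

    Returns a dict mapping each kept key to the list of set names that
    contain it. mode="intersection" keeps keys present in every set;
    mode="union" keeps keys present in any set.
    """
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    provenance = {}
    for name, keys in sets.items():
        for key in keys:
            provenance.setdefault(key, []).append(name)
    if mode == "intersection":
        return {k: v for k, v in provenance.items() if len(v) == len(sets)}
    return provenance
```

Usage: with `sets = {"NIST": nist_keys, "PlatinumGenomes": pg_keys}`, the union entries whose provenance lists only one name are exactly the sites on which the two truth sets disagree.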

I would appreciate any insight into this.

SNP vcf

Running into similar issues here. Did you ever get this solved?

