Question

Differences between Illumina Platinum Genomes variant calls and NIST variant calls


4.8 years ago

kspata ▴ 80

Hi All,

I am validating a bioinformatics pipeline for SNP and INDEL calling. For this purpose I mapped the reads from the Illumina Platinum Genome (https://www.ebi.ac.uk/ena/data/view/ERR194147) to the hg38 assembly (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/) and called variants on a small subset region, chr19:29,207,790-29,217,448. To verify the detected variants I used two truth datasets:

From Illumina Platinum Genome (ftp://platgene_ro@ussd-ftp.illumina.com/2017-1.0/hg38/hybrid/)
From NIST (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/latest/GRCh38/)

There is one variant at chr19:29215367 that my variant calling pipeline detects at an allele frequency > 50%. This variant is present in the NIST variant dataset but not in the Illumina Platinum Genome dataset.
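The lookup described above can be sketched as follows. This is only a minimal illustration, assuming uncompressed, tab-separated VCF text; the helper names `iter_sites` and `calls_at` are hypothetical, and the example records (including their REF/ALT alleles) are made up rather than taken from the real NIST or Platinum files.

```python
def iter_sites(vcf_lines):
    """Yield (chrom, pos, ref, alt) for every ALT allele in VCF text lines."""
    for line in vcf_lines:
        # Skip blank lines, meta-information lines (##) and the header (#CHROM)
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # Multi-allelic records list several ALT alleles separated by commas
        for alt in alts.split(","):
            yield chrom, pos, ref, alt


def calls_at(vcf_lines, chrom, pos):
    """Return all alleles recorded at one chromosome/position."""
    return [s for s in iter_sites(vcf_lines) if s[0] == chrom and s[1] == pos]


# Hypothetical records for illustration only (alleles are invented)
nist_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr19\t29215367\t.\tA\tG\t50\tPASS\t.\n",
]
platinum_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr19\t29210000\t.\tC\tT\t50\tPASS\t.\n",
]

in_nist = calls_at(nist_lines, "chr19", 29215367)
in_platinum = calls_at(platinum_lines, "chr19", 29215367)
print("NIST:", in_nist)
print("Platinum:", in_platinum)
```

A plain position match like this can miss INDELs that the two datasets represent differently, so it is probably only a first check rather than a full comparison.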

This variant is also present in the CRAM file (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/illumina_platinum_pedigree/data/CEU/NA12878/alignment/) downloaded from the Illumina PG site, at a frequency of 54% as visualized in IGV.

Should I use the NIST variants instead of Illumina PG, since this variant would otherwise be reported as a false positive?

Can I merge the two datasets to get a more comprehensive variant call set? Would it be advisable to merge them?
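Before deciding on a merge, it may help to see how the two datasets overlap. The sketch below partitions two call sets into shared and dataset-specific calls. It assumes each call is keyed by (chrom, pos, ref, alt); the function name `compare_call_sets` and the example calls are hypothetical, not real records.

```python
def compare_call_sets(set_a, set_b):
    """Partition two call sets into shared calls and calls unique to each."""
    a, b = set(set_a), set(set_b)
    return {
        "shared": a & b,   # present in both datasets
        "only_a": a - b,   # present only in the first dataset
        "only_b": b - a,   # present only in the second dataset
        "union": a | b,    # what a naive merge would contain
    }


# Hypothetical calls for illustration only (alleles are invented)
nist_calls = {
    ("chr19", 29210000, "C", "T"),
    ("chr19", 29215367, "A", "G"),
}
platinum_calls = {
    ("chr19", 29210000, "C", "T"),
}

result = compare_call_sets(nist_calls, platinum_calls)
for label in ("shared", "only_a", "only_b", "union"):
    print(label, sorted(result[label]))
```

The `union` entry is what a simple merge would produce; the `only_a` and `only_b` entries are the calls where the two datasets disagree and would need a closer look.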

I would appreciate any insight into this.

SNP vcf • 1.2k views



Running into similar issues here. Did you ever get this solved?

2.6 years ago by goldberry88