Differences between Illumina Platinum Genomes variant calls and NIST variant calls
Posted 2.3 years ago by kspata

Hi All,

I am validating a bioinformatics pipeline for SNP and indel calling. For this purpose, I mapped the reads from the Illumina Platinum Genome sample (https://www.ebi.ac.uk/ena/data/view/ERR194147) to the hg38 assembly (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/) and called variants in a small subset region, chr19:29,207,790-29,217,448. To verify the detected variants, I used two datasets:

  1. Illumina Platinum Genomes, 2017-1.0 hybrid hg38 calls (ftp://platgene_ro@ussd-ftp.illumina.com/2017-1.0/hg38/hybrid/)
  2. NIST Genome in a Bottle, NA12878/HG001 GRCh38 calls (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/latest/GRCh38/)
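Restricting both the pipeline calls and each truth VCF to the same validation region keeps the comparison fair. Below is a minimal sketch, assuming an uncompressed, plain-text VCF (real releases are usually bgzipped and indexed, so `bcftools view -r` or `tabix` would normally do this job). `records_in_region`, `REGION` and `Variant` are hypothetical names for illustration.

```python
# Hypothetical sketch: filter a plain-text VCF to a 1-based, inclusive region.
from typing import Iterable, Iterator, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)

# Validation region from the post, in 1-based inclusive coordinates.
REGION = ("chr19", 29_207_790, 29_217_448)


def records_in_region(
    lines: Iterable[str], chrom: str, start: int, end: int
) -> Iterator[Variant]:
    """Yield (CHROM, POS, REF, ALT) for records whose POS lies in [start, end]."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
        # Only POS is tested, so an indel that starts just before `start`
        # is excluded even if it overlaps the region.
        if rec_chrom == chrom and start <= pos <= end:
            yield rec_chrom, pos, ref, alt
```

For example, `set(records_in_region(open("calls.vcf"), *REGION))` (file name hypothetical) gives a set of keys that can be compared directly against the same set built from each truth VCF.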

There is one variant at chr19:29215367 that my pipeline detects at an allele frequency above 50%. This variant is present in the NIST dataset but not in the Illumina Platinum Genomes dataset.
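One thing that may be worth checking before counting this as a false positive: as far as I know, both truth sets are distributed with a BED file of high-confidence regions, and a truth set only claims a site is non-variant inside those regions. The sketch below assumes exact allele matching on (CHROM, POS, REF, ALT) keys and uses hypothetical helper names. Dedicated tools such as hap.py or RTG Tools vcfeval do haplotype-aware matching and are better suited to real validation.

```python
# Hypothetical sketch: exact-match comparison of calls against one truth set,
# restricted to that truth set's confident regions (BED: 0-based, half-open).
from typing import Dict, Iterable, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT), POS is 1-based
Interval = Tuple[str, int, int]      # (chrom, bed_start, bed_end)


def in_confident(variant: Variant, regions: Iterable[Interval]) -> bool:
    """True if the variant's 1-based POS falls inside any BED interval."""
    chrom, pos = variant[0], variant[1]
    # A BED interval [start, end) covers 1-based positions start+1 .. end.
    return any(c == chrom and s < pos <= e for c, s, e in regions)


def compare_to_truth(
    calls: Set[Variant], truth: Set[Variant], confident: Iterable[Interval]
) -> Dict[str, Set[Variant]]:
    regions = list(confident)  # allow a generator to be scanned repeatedly
    inside = {v for v in calls | truth if in_confident(v, regions)}
    called, expected = calls & inside, truth & inside
    return {
        "TP": called & expected,
        "FP": called - expected,    # called in a confident region, absent from truth
        "FN": expected - called,    # in the truth set but missed by the pipeline
        "outside": calls - inside,  # the truth set makes no claim here
    }
```

If chr19:29215367 turned out to lie outside the Platinum Genomes confident regions, this scheme would place it in `outside` rather than `FP`.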

This variant is also present in the CRAM file (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/illumina_platinum_pedigree/data/CEU/NA12878/alignment/) downloaded from the Illumina PG site, where IGV shows it at an allele frequency of 54%.
(Screenshot: variant in IGV)
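An allele fraction near 50% is what one would typically expect at a heterozygous site. A small sketch of that arithmetic, assuming the ref and alt read counts are read off IGV or a pileup; the `het_min` and `hom_min` cutoffs are arbitrary, hypothetical thresholds for illustration, not values any caller necessarily uses.

```python
# Hypothetical sketch: variant allele fraction (VAF) from read counts.
def allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of ref+alt reads supporting the alternate allele."""
    depth = ref_reads + alt_reads  # reads carrying other alleles are ignored
    if depth == 0:
        raise ValueError("no reads covering the site")
    return alt_reads / depth


def rough_genotype(vaf: float, het_min: float = 0.2, hom_min: float = 0.8) -> str:
    """Crude genotype label; het_min and hom_min are illustrative cutoffs only."""
    if vaf >= hom_min:
        return "1/1"
    if vaf >= het_min:
        return "0/1"
    return "0/0"
```

With these toy cutoffs, a site at 54% would be labelled `0/1`.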

Should I use the NIST variants instead of the Illumina PG variants, given that against Illumina PG this variant will be counted as a false positive?

Can I merge both datasets to get a more comprehensive set of variant calls? Would merging the two datasets be advisable?
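Before merging, it may help to see exactly where the two truth sets disagree. A sketch, assuming both VCFs have already been normalized (for example with `bcftools norm`, left-aligning indels and splitting multi-allelic records) so that identical variants produce identical (CHROM, POS, REF, ALT) keys. `label_sources` is a hypothetical name. Note that a plain union inherits the errors of both sets, while the variants labelled `both` form a more conservative but smaller set.

```python
# Hypothetical sketch: label each variant by which truth set contains it.
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def label_sources(nist: Set[Variant], platinum: Set[Variant]) -> Dict[Variant, str]:
    """Map every variant in the union to 'both', 'NIST only' or 'Platinum only'."""
    labels: Dict[Variant, str] = {}
    for v in nist | platinum:
        if v in nist and v in platinum:
            labels[v] = "both"
        elif v in nist:
            labels[v] = "NIST only"
        else:
            labels[v] = "Platinum only"
    return labels
```

The keys of the result are the union; filtering for `"both"` gives the intersection, and the single-source entries are the candidates worth inspecting by hand in IGV.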

I would appreciate any insight into this.


I'm running into similar issues here. Did you ever get this solved?
