Question: Differences between Illumina Platinum Genome variant calls and NIST variant calls
Asked 12 months ago by kspata (Chicago):

Hi All,

I am validating a bioinformatics pipeline for SNP and indel calling. For this purpose I mapped the reads from the Illumina Platinum Genome (https://www.ebi.ac.uk/ena/data/view/ERR194147) to the hg38 assembly (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/) and called variants in a small region, chr19:29,207,790-29,217,448. To verify these calls I used two reference call sets:

  1. From Illumina Platinum Genome (ftp://platgene_ro@ussd-ftp.illumina.com/2017-1.0/hg38/hybrid/)
  2. From NIST (ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/latest/GRCh38/)
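For reference, restricting a VCF to the region mentioned above can be sketched in plain Python. This is only a minimal illustration, not the pipeline's actual code: the function name `load_variants`, the key layout and the inline example records (including the alleles) are assumptions, and real files would normally be read with a VCF library or a tool such as bcftools.

```python
import io

# Region from the question (1-based, inclusive coordinates).
REGION = ("chr19", 29_207_790, 29_217_448)

def load_variants(handle, region):
    """Return a set of (chrom, pos, ref, alt) keys for records inside region."""
    chrom, start, end = region
    variants = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        c, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        if c != chrom or not (start <= pos <= end):
            continue
        for alt in alts.split(","):  # one key per ALT allele
            variants.add((c, pos, ref, alt))
    return variants

# Made-up records for illustration only; the alleles are hypothetical.
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "chr19\t29215367\t.\tA\tG\t50\tPASS\t.\n"
    "chr19\t29300000\t.\tC\tT\t50\tPASS\t.\n"
)
calls = load_variants(example_vcf, REGION)
print(sorted(calls))
```

Matching on exact position and alleles is adequate for simple SNPs, but indels can be written in more than one equivalent way, so both files would need to be normalized the same way before a comparison like this is trusted.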

There is one variant at chr19:29215367 that my variant calling pipeline detects at an allele frequency above 50%. This variant is present in the NIST call set but not in the Illumina Platinum Genome call set.

The variant is also present, at a frequency of 54% as visualized in IGV, in the CRAM file (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/illumina_platinum_pedigree/data/CEU/NA12878/alignment/) downloaded from the Illumina PG site. [Image: variant-in-igv]
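The 54% figure is an alternate-allele fraction, that is, reads supporting the alternate allele divided by all reads counted at the site. A minimal sketch of that arithmetic, assuming the counts come from an `AD`-style "ref,alt" depth string; the function name and the counts are hypothetical and chosen only to reproduce 54%:

```python
def alt_allele_fraction(ad_field):
    """Compute alt reads / (ref + alt reads) from an 'AD'-style string like '23,27'."""
    ref_depth, alt_depth = (int(x) for x in ad_field.split(",")[:2])
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("no reads cover the site")
    return alt_depth / total

# Hypothetical depths: 23 reference reads and 27 alternate reads.
fraction = alt_allele_fraction("23,27")
print(f"{fraction:.0%}")  # 27 / 50 = 54%
```

A fraction close to 50% is what a heterozygous site would be expected to show.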

Should I use the NIST variants instead of Illumina PG, given that this variant would otherwise be reported as a false positive?
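Whether the call is scored as a false positive depends on which reference set is used and, in common benchmarking practice, on whether the site lies inside that set's confident regions; calls outside those regions are usually left unassessed rather than counted as errors. A minimal sketch of that bookkeeping follows. The names (`classify`, `in_intervals`), the alleles and the confident intervals are all hypothetical, and a single chromosome is assumed for brevity.

```python
def in_intervals(pos, intervals):
    """True if a 1-based position lies in any BED-style (0-based, half-open) interval."""
    return any(start < pos <= end for start, end in intervals)

def classify(calls, truth, confident):
    """Split variants into TP/FP/FN, ignoring sites outside the confident intervals."""
    calls = {v for v in calls if in_intervals(v[1], confident)}
    truth = {v for v in truth if in_intervals(v[1], confident)}
    return {"TP": calls & truth, "FP": calls - truth, "FN": truth - calls}

# Hypothetical alleles at the position from the question.
site = ("chr19", 29215367, "A", "G")
pipeline_calls = {site}
truth_with_site = {site}      # a reference set that contains the variant
truth_without_site = set()    # a reference set that lacks it

# Hypothetical confident intervals (BED coordinates).
confident_covering = [(29207789, 29217448)]   # includes the site
confident_excluding = [(29207789, 29215000)]  # stops before the site

vs_with = classify(pipeline_calls, truth_with_site, confident_covering)
vs_without = classify(pipeline_calls, truth_without_site, confident_covering)
vs_unassessed = classify(pipeline_calls, truth_without_site, confident_excluding)
print(len(vs_with["TP"]), len(vs_without["FP"]), len(vs_unassessed["FP"]))  # 1 1 0
```

In this toy setup the same call is a true positive against one set, a false positive against the other, and unscored if the site sits outside the second set's confident intervals, so checking the confident-region file for each reference set is a reasonable first step.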

Can I merge the two datasets to get a more comprehensive variant call set? Would merging them be advisable?
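Merging can mean taking either the union or the intersection of the two reference sets, and the choice changes what counts as a false positive. A minimal sketch that records where each variant came from, so either view can be derived later; the function name and the variant keys are hypothetical:

```python
def merge_with_provenance(nist, platinum):
    """Map each variant key to 'both', 'nist_only' or 'platinum_only'."""
    merged = {}
    for variant in nist | platinum:
        if variant in nist and variant in platinum:
            merged[variant] = "both"
        elif variant in nist:
            merged[variant] = "nist_only"
        else:
            merged[variant] = "platinum_only"
    return merged

# Hypothetical variant keys: (chrom, pos, ref, alt).
nist_set = {("chr19", 29210000, "C", "T"), ("chr19", 29215367, "A", "G")}
platinum_set = {("chr19", 29210000, "C", "T"), ("chr19", 29216000, "G", "A")}

merged = merge_with_provenance(nist_set, platinum_set)
union = set(merged)
intersection = {v for v, source in merged.items() if source == "both"}
print(len(union), len(intersection))  # 3 1
```

The union is more comprehensive but inherits any errors present in either source, while the intersection is more conservative. Under either choice the confident regions of the two sets would presumably have to be combined in a matching way.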

I would appreciate any insight into this.

Tags: snp, vcf