Using GIAB exome data to test targetome variant-calling pipeline
0
0
Entering edit mode
8.2 years ago
Floydian_slip ▴ 170

Hi,

I am trying to test my variant calling pipeline that I have prepared for my target region of 5Mb for which I need a test dataset where the fastq files and the VCF files are provided so that I can run my pipeline on those fastqs and then compare my VCF to those VCFs. So, I used the two whole exome datasets from Genome In a Bottle data with the following steps:

  1. In order to obtain the fastqs that originated from my target region, I used the bam files published by GIAB and extracted the alignments falling in my target region using samtools

  2. Converted the extracted bams in those regions to paired-end fastqs

  3. Aligned those fastqs to the entire genomeusing BWA

  4. Called the variants in my region of interest using GATK (with -L option, restricting my analysis to my target region for RealignerTargetCreator and HaplotypeCaller).

  5. Compared my variants to the variants from GIAB in the target region.

But I only get 50% variant calls correctly. I call only 50% of the total variants in that region. What could be the reason for this low agreement rate?

Interestingly, when I call the variants on the whole exome for which the fastqs were originally created, I can obtain 96% of variants in the whole-exome region published by GIAB. Moreover, from those variants, when I extract the variants that fall in my target regions and compare it to the corresponding GIAB variants, I can go upto 97%. In other words, when I use the entire whole-exome data, I can call 97% of the variants in my target region, but when I start with the alignments falling in my target region, convert to fastq and then call variants in the target region, I only get 50%.

Can somebody help me figure out what could be going wrong?

Thanks,

~N

whole-exome targetome variant-calling GIAB • 2.5k views
ADD COMMENT

Login before adding your answer.

Traffic: 1507 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6