Question: Analysis of exome samples built with different exome kits
17 months ago by
Belgium, Brussels
Nicolas Rosewick (9.3k) wrote:

Question regarding germline variant calling from exome data:

I have a dataset of ~500 exome samples built using 6 different kits. The dataset was assembled over ~6-7 years, so as kits evolved, the most recent samples were prepared with the most up-to-date kits and the oldest samples with the oldest kits.

As the targeted genomic regions differ between exome kits, which interval file should I use in the GATK best-practices workflow (BWA -> MarkDuplicates -> BQSR -> HaplotypeCaller -> GenomicsDBImport -> GenotypeGVCFs -> VariantRecalibrator)? My first idea is to use the union of all interval files (one per exome kit), but I'm wondering whether the VQSR part of the GATK pipeline will struggle, since not every sample covers the whole "union" interval set.
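
To make the "union" idea concrete, here is a minimal Python sketch of merging the target intervals of several kits into one non-overlapping set. The kit names and coordinates are made up for illustration, and intervals are assumed to be BED-style half-open `(chrom, start, end)` tuples; in practice a tool such as bedtools would do the same job on the real BED files.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlapping or touching: extend the current interval.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


def union_of_kits(kits):
    """kits maps a kit name to its list of (chrom, start, end) targets."""
    all_intervals = [iv for targets in kits.values() for iv in targets]
    return merge_intervals(all_intervals)


# Hypothetical toy targets for two kits (not real kit coordinates).
kits = {
    "kit_old": [("chr1", 100, 200), ("chr1", 500, 600)],
    "kit_new": [("chr1", 150, 250), ("chr2", 10, 50)],
}
union = union_of_kits(kits)
for chrom, start, end in union:
    print(chrom, start, end, sep="\t")
```

The union is by construction at least as large as any single kit's target set, which is exactly why some samples will have little or no coverage in parts of it.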

Other idea: for each kit, call the variants using only the associated samples and interval file, then merge the VCFs after VQSR filtering.

Any advice ? Thanks

Edit:

I opened a thread on the GATK forum, as this is really specific to that tool.

In a nutshell, I managed to improve Ti/Tv by processing each sample with its respective interval file (from the corresponding exome kit), then using the union of these intervals for the steps after HaplotypeCaller (joint genotyping and VQSR).
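
As a rough illustration of that layout, the sketch below only builds the command lines (it does not run anything): one HaplotypeCaller call per sample with that sample's own kit intervals, then GenomicsDBImport and GenotypeGVCFs with the union intervals. The sample names, BAM paths, `ref.fasta`, `union.bed`, and the workspace name are all hypothetical placeholders, and the flags are the ones I know from GATK4, so check them against the version you run.

```python
def haplotypecaller_cmd(sample, bam, kit_intervals, reference):
    """Per-sample gVCF calling restricted to the sample's own kit targets."""
    return [
        "gatk", "HaplotypeCaller",
        "-R", reference,
        "-I", bam,
        "-O", f"{sample}.g.vcf.gz",
        "-ERC", "GVCF",
        "-L", kit_intervals,
    ]


def genomicsdb_import_cmd(gvcfs, workspace, union_intervals):
    """Consolidate all gVCFs over the union of the kit intervals."""
    cmd = ["gatk", "GenomicsDBImport",
           "--genomicsdb-workspace-path", workspace,
           "-L", union_intervals]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_cmd(workspace, union_intervals, reference, out_vcf):
    """Joint genotyping over the same union intervals."""
    return [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", f"gendb://{workspace}",
        "-L", union_intervals,
        "-O", out_vcf,
    ]


# Hypothetical sample sheet: sample -> (BAM path, interval file of its kit).
samples = {
    "S1": ("S1.bam", "kit_old.bed"),
    "S2": ("S2.bam", "kit_new.bed"),
}

per_sample = [
    haplotypecaller_cmd(name, bam, kit, "ref.fasta")
    for name, (bam, kit) in samples.items()
]
gvcfs = [f"{name}.g.vcf.gz" for name in samples]
import_cmd = genomicsdb_import_cmd(gvcfs, "cohort_db", "union.bed")
joint_cmd = genotype_cmd("cohort_db", "union.bed", "ref.fasta", "cohort.vcf.gz")

for cmd in per_sample + [import_cmd, joint_cmd]:
    print(" ".join(cmd))
```

The only point of the sketch is where `-L` changes: kit-specific up to the gVCF, union afterwards.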

interval gatk exome • 383 views
17 months ago by
andrew.j.skelton73 (6.1k) wrote:

My first inclination would be to make sure that all recommended preprocessing steps, from alignment up to gVCF generation, are performed using each sample's respective kit. Once you genotype the gVCF files, try out the union or the intersection of all kits (be careful with the intersection, as some very old kits targeted relatively few sites). Check the tranche files for each iteration.
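
To see how much the intersection can shrink compared with the union, a small Python sketch like the following can help. It counts, along each chromosome, how many kits cover each region; regions covered by at least one kit form the union, regions covered by all kits form the intersection. The kit names and coordinates are invented toy values, and intervals are assumed to be BED-style half-open `(chrom, start, end)` tuples.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_by_at_least(kits, min_kits):
    """Regions targeted by at least `min_kits` of the given kits."""
    events = defaultdict(list)
    for targets in kits.values():
        # Merge within a kit first so each kit counts at most once per base.
        for chrom, start, end in merge(targets):
            events[chrom].append((start, 1))
            events[chrom].append((end, -1))

    regions = []
    for chrom in sorted(events):
        depth, region_start = 0, None
        for pos, delta in sorted(events[chrom]):
            depth += delta
            if depth >= min_kits and region_start is None:
                region_start = pos
            elif depth < min_kits and region_start is not None:
                regions.append((chrom, region_start, pos))
                region_start = None
    return merge(regions)


def total_bp(intervals):
    return sum(end - start for _, start, end in intervals)


# Hypothetical toy targets for three kits (not real kit coordinates).
kits = {
    "kit_old": [("chr1", 100, 200), ("chr1", 500, 600)],
    "kit_mid": [("chr1", 150, 250), ("chr1", 500, 650)],
    "kit_new": [("chr1", 120, 300), ("chr1", 480, 700), ("chr2", 10, 50)],
}

union = covered_by_at_least(kits, 1)
intersection = covered_by_at_least(kits, len(kits))
print("union bp:", total_bp(union))
print("intersection bp:", total_bp(intersection))
```

Comparing the two totals on the real interval files gives a quick idea of how many sites would be lost by restricting to the intersection before looking at the tranche files.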


Thanks Andrew. I'll try that and post my results as soon as it's finished.

Reply written 17 months ago by Nicolas Rosewick