Question: Variant analysis on 1 or 2 samples: are the final steps of the GATK pipeline unnecessary?
Asked 14 months ago by YaGalbi (Biocomputing, MRC Harwell Institute, Oxford, UK)

Hello all,

I'm a bit confused about which steps are necessary and which are unlikely to add much benefit. I have 2 jobs to complete for 2 different research groups we support: 1) germline short variant discovery on whole exome sequencing (WES) data collected from 1 mouse (1 sample in total), and 2) germline short variant discovery on whole genome sequencing (WGS) data collected from 2 macaques (2 samples in total).

I have written a wrapper that follows the GATK Best Practices from FASTQ preprocessing through HaplotypeCaller, with conditional logic and the required files specific to each species and to WGS/WES.

According to the GATK workflow, my next steps after running HaplotypeCaller (with --emit-ref-confidence GVCF) are: 1) consolidate the GVCFs, 2) joint-call the cohort, and 3) run VQSR ("probably the hardest part of the Best Practices to get right").
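
To make the steps in question concrete, here is a minimal sketch of how a wrapper might assemble the commands that follow HaplotypeCaller. It assumes GATK4 command-line syntax and uses CombineGVCFs for the consolidation step; the function name, file names, and output prefix are hypothetical, and VQSR is left out because it depends on truth/training resources. The function only builds argument lists and does not run anything.

```python
from typing import List


def post_calling_commands(reference: str, gvcfs: List[str], out_prefix: str) -> List[List[str]]:
    """Build (but do not run) the GATK4 commands that follow HaplotypeCaller.

    Assumption: each entry in `gvcfs` is a per-sample GVCF produced with
    --emit-ref-confidence GVCF. All paths are placeholders.
    """
    if not gvcfs:
        raise ValueError("at least one GVCF is required")

    commands: List[List[str]] = []

    if len(gvcfs) > 1:
        # Step 1: consolidate the per-sample GVCFs into one multi-sample GVCF.
        combined = f"{out_prefix}.combined.g.vcf.gz"
        combine = ["gatk", "CombineGVCFs", "-R", reference]
        for gvcf in gvcfs:
            combine += ["-V", gvcf]
        combine += ["-O", combined]
        commands.append(combine)
        genotype_input = combined
    else:
        # With a single sample there is nothing to consolidate.
        genotype_input = gvcfs[0]

    # Step 2: joint genotyping (also accepts a single-sample GVCF).
    commands.append(
        ["gatk", "GenotypeGVCFs", "-R", reference,
         "-V", genotype_input, "-O", f"{out_prefix}.vcf.gz"]
    )
    return commands


if __name__ == "__main__":
    for cmd in post_calling_commands("ref.fa", ["m1.g.vcf.gz", "m2.g.vcf.gz"], "macaque"):
        print(" ".join(cmd))
```

For the 1-sample mouse job this yields a single GenotypeGVCFs command; for the 2-sample macaque job it yields CombineGVCFs followed by GenotypeGVCFs.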

Considering that I have only 1 or 2 samples, in species where truth data sets may not be available, is it pointless to do some or all of these steps? Should I just stick to the variants called in each sample by HaplotypeCaller? Should I remove "--emit-ref-confidence GVCF" and just create a regular VCF?
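
For clarity, the two HaplotypeCaller modes being compared differ only in that one flag. A small sketch, again assuming GATK4 syntax with hypothetical function and file names, of how a wrapper could switch between them:

```python
from typing import List


def haplotypecaller_command(reference: str, bam: str, output: str,
                            gvcf_mode: bool = True) -> List[str]:
    """Build (but do not run) a GATK4 HaplotypeCaller command.

    gvcf_mode=True adds --emit-ref-confidence GVCF (per-sample GVCF output);
    gvcf_mode=False omits it, so HaplotypeCaller writes a regular VCF.
    """
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", output]
    if gvcf_mode:
        cmd += ["--emit-ref-confidence", "GVCF"]
    return cmd


if __name__ == "__main__":
    print(" ".join(haplotypecaller_command("ref.fa", "mouse.bam", "mouse.g.vcf.gz")))
    print(" ".join(haplotypecaller_command("ref.fa", "mouse.bam", "mouse.vcf.gz", gvcf_mode=False)))
```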

Thank you, Kenneth
