Question: Variant analysis on 1 or 2 samples: are the final steps of the GATK pipeline unnecessary?
Posted 9 months ago by YaGalbi (Biocomputing, MRC Harwell Institute, Oxford, UK):

Hello all,

I'm unsure which steps are necessary and which won't add much benefit. I have two jobs to complete for two different research groups we support: 1) germline short variant discovery on whole exome sequencing (WES) data from one mouse (1 sample in total), and 2) germline short variant discovery on whole genome sequencing (WGS) data from two macaques (2 samples in total).

I have written a wrapper that follows the GATK Best Practices from FASTQ preprocessing through HaplotypeCaller, with conditional branches and the required resource files specific to each species and to WGS/WES.
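The per-species and per-assay branching described above might look something like this minimal Python sketch. The species keys, file paths and the `select_resources`/`interval_args` helpers are hypothetical placeholders, not part of GATK; the only GATK-specific assumption is that exome runs restrict tools to the capture targets with `-L`.

```python
from typing import Dict, List, Optional

# Hypothetical per-species reference files; replace with real paths.
REFERENCES: Dict[str, str] = {
    "mouse": "refs/mouse.fa",
    "macaque": "refs/macaque.fa",
}

# Hypothetical capture-kit target intervals, only needed for exome (WES) data.
EXOME_TARGETS: Dict[str, str] = {
    "mouse": "targets/mouse_exome.interval_list",
}


def select_resources(species: str, assay: str) -> Dict[str, Optional[str]]:
    """Pick the reference and (for WES) the target intervals for one job."""
    if species not in REFERENCES:
        raise ValueError(f"no reference configured for species {species!r}")
    assay = assay.upper()
    if assay == "WGS":
        intervals = None  # call across the whole genome
    elif assay == "WES":
        if species not in EXOME_TARGETS:
            raise ValueError(f"no exome targets configured for {species!r}")
        intervals = EXOME_TARGETS[species]
    else:
        raise ValueError(f"unknown assay {assay!r}; expected WES or WGS")
    return {"reference": REFERENCES[species], "intervals": intervals}


def interval_args(resources: Dict[str, Optional[str]]) -> List[str]:
    """Extra GATK arguments restricting a tool to the capture targets, if any."""
    intervals = resources["intervals"]
    return ["-L", intervals] if intervals else []


# The two jobs from the question: one mouse exome, two macaque genomes.
mouse_job = select_resources("mouse", "WES")
macaque_job = select_resources("macaque", "WGS")
print(mouse_job, interval_args(mouse_job))
print(macaque_job, interval_args(macaque_job))
```

Failing loudly on an unknown species/assay combination is a deliberate choice here, so a missing resource file is caught before any long-running step starts.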

According to the GATK workflow, my next steps after running HaplotypeCaller (with --emit-ref-confidence GVCF) are: 1) consolidate the GVCFs, 2) joint-call the cohort, and 3) run VQSR ("probably the hardest part of the Best Practices to get right").
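For the two-macaque job, the three steps listed above could be assembled roughly as follows. This sketch only builds GATK4 command lines (it does not run them); the file names are hypothetical, and the VQSR step assumes a truth/training VCF exists for the species, which is exactly the resource that may be missing here. The resource name `truthset`, its prior and the chosen annotations are placeholders, not recommended settings.

```python
from typing import List


def combine_gvcfs_cmd(reference: str, gvcfs: List[str], out: str) -> List[str]:
    """Step 1: merge per-sample GVCFs (CombineGVCFs is adequate for a handful of samples)."""
    cmd = ["gatk", "CombineGVCFs", "-R", reference, "-O", out]
    for gvcf in gvcfs:
        cmd += ["-V", gvcf]
    return cmd


def genotype_gvcfs_cmd(reference: str, combined: str, out: str) -> List[str]:
    """Step 2: joint-genotype the combined GVCF into a regular multi-sample VCF."""
    return ["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", out]


def variant_recalibrator_cmd(reference: str, vcf: str, truth_vcf: str,
                             recal: str, tranches: str) -> List[str]:
    """Step 3a: build a SNP recalibration model from a truth/training resource."""
    return [
        "gatk", "VariantRecalibrator", "-R", reference, "-V", vcf,
        # Placeholder resource label and prior; requires a real truth set.
        "--resource:truthset,known=false,training=true,truth=true,prior=12.0",
        truth_vcf,
        "-an", "QD", "-an", "FS", "-an", "MQ",
        "-mode", "SNP", "-O", recal, "--tranches-file", tranches,
    ]


def apply_vqsr_cmd(reference: str, vcf: str, recal: str, tranches: str,
                   out: str, level: float = 99.0) -> List[str]:
    """Step 3b: apply the model, filtering SNPs outside the chosen sensitivity tranche."""
    return [
        "gatk", "ApplyVQSR", "-R", reference, "-V", vcf,
        "--recal-file", recal, "--tranches-file", tranches,
        "--truth-sensitivity-filter-level", str(level),
        "-mode", "SNP", "-O", out,
    ]


ref = "refs/macaque.fa"  # hypothetical path
steps = [
    combine_gvcfs_cmd(ref, ["mac1.g.vcf.gz", "mac2.g.vcf.gz"], "cohort.g.vcf.gz"),
    genotype_gvcfs_cmd(ref, "cohort.g.vcf.gz", "cohort.vcf.gz"),
    variant_recalibrator_cmd(ref, "cohort.vcf.gz", "truth.vcf.gz",
                             "cohort.snp.recal", "cohort.snp.tranches"),
    apply_vqsr_cmd(ref, "cohort.vcf.gz", "cohort.snp.recal",
                   "cohort.snp.tranches", "cohort.vqsr.vcf.gz"),
]
for cmd in steps:
    print(" ".join(cmd))
```

Keeping each step as a separate function makes it easy for a wrapper to stop after GenotypeGVCFs when no truth resource is available.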

Given that I have only 1 or 2 samples, from species for which truth data sets may not be available, is it pointless doing some or all of these steps? Should I just stick to the variants called in each sample by HaplotypeCaller? Should I remove "--emit-ref-confidence GVCF" and just create a regular VCF?
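The two HaplotypeCaller modes asked about differ only in the --emit-ref-confidence GVCF argument (and, by convention, the .g.vcf.gz extension). A minimal sketch that builds either form, with hypothetical file names and an optional interval list for the exome case:

```python
from typing import List, Optional


def haplotypecaller_cmd(reference: str, bam: str, out_prefix: str,
                        gvcf: bool = True,
                        intervals: Optional[str] = None) -> List[str]:
    """Build a GATK4 HaplotypeCaller command in GVCF or plain-VCF mode."""
    suffix = ".g.vcf.gz" if gvcf else ".vcf.gz"
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
           "-O", out_prefix + suffix]
    if gvcf:
        # Emit reference-confidence blocks so the sample can be joint-genotyped later.
        cmd += ["--emit-ref-confidence", "GVCF"]
    if intervals:
        # Restrict calling to the capture targets (exome data).
        cmd += ["-L", intervals]
    return cmd


# Hypothetical inputs: the mouse exome in both modes.
print(" ".join(haplotypecaller_cmd("refs/mouse.fa", "mouse.bam", "mouse",
                                   intervals="targets/mouse_exome.interval_list")))
print(" ".join(haplotypecaller_cmd("refs/mouse.fa", "mouse.bam", "mouse",
                                   gvcf=False,
                                   intervals="targets/mouse_exome.interval_list")))
```

Note that a GVCF is an intermediate: it still has to pass through GenotypeGVCFs before it contains ordinary genotype calls, whereas the plain-VCF mode writes them directly.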

Thank you, Kenneth
