Variant analysis on 1 or 2 samples: are the final steps of the GATK pipeline unnecessary?

5.9 years ago

BioinfGuru ★ 1.7k

Hello all,

I'm a bit confused about which steps are necessary and which will not add much benefit. I have two jobs to complete for two different research groups we support: 1) germline short variant discovery on whole exome sequencing (WES) data from one mouse (1 sample in total), and 2) germline short variant discovery on whole genome sequencing (WGS) data from two macaques (2 samples in total).

I have written a wrapper that follows the GATK Best Practices from FASTQ preprocessing through HaplotypeCaller, with conditional logic and the required resource files for each species and for WGS vs. WES.

According to the GATK workflow, my next steps after running HaplotypeCaller (with --emit-ref-confidence GVCF) are: 1) consolidate GVCFs, 2) joint-call the cohort, and 3) run VQSR ("probably the hardest part of the Best Practices to get right").
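For reference, the GVCF route described above can be sketched as a small Python helper that only builds (does not run) GATK4 command lines: per-sample HaplotypeCaller in GVCF mode, CombineGVCFs to consolidate, then GenotypeGVCFs for joint genotyping. This is a sketch under assumptions: it assumes the GATK4 `gatk` launcher, the function name and output file names are hypothetical, and VQSR is left out because its truth/training resource arguments are species-specific. With a single sample it skips CombineGVCFs and genotypes the one GVCF directly.

```python
import shlex
from pathlib import Path


def joint_calling_commands(reference, bams, out_dir="calls"):
    """Build (not run) GATK4 commands for GVCF calling + joint genotyping.

    Hypothetical helper; assumes BAM names look like "<sample>.<...>.bam".
    VQSR is intentionally omitted.
    """
    cmds = []
    gvcfs = []
    for bam in bams:
        sample = Path(bam).name.split(".")[0]  # sample name = text before first dot
        gvcf = f"{out_dir}/{sample}.g.vcf.gz"
        gvcfs.append(gvcf)
        # Step 0: per-sample calling in GVCF mode
        cmds.append(["gatk", "HaplotypeCaller", "-R", reference, "-I", bam,
                     "-O", gvcf, "--emit-ref-confidence", "GVCF"])

    if len(gvcfs) == 1:
        # One sample: nothing to consolidate, genotype the single GVCF directly
        combined = gvcfs[0]
    else:
        # Step 1: consolidate per-sample GVCFs into one multi-sample GVCF
        combined = f"{out_dir}/cohort.g.vcf.gz"
        combine = ["gatk", "CombineGVCFs", "-R", reference]
        for g in gvcfs:
            combine += ["-V", g]
        combine += ["-O", combined]
        cmds.append(combine)

    # Step 2: joint genotyping of the (combined) GVCF
    cmds.append(["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined,
                 "-O", f"{out_dir}/cohort.vcf.gz"])
    return cmds


if __name__ == "__main__":
    for cmd in joint_calling_commands("macaque_ref.fa",
                                      ["macaque1.dedup.bam", "macaque2.dedup.bam"]):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps them safe to pass to `subprocess.run` without quoting issues; `shlex.join` is only used for printing.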

Considering that I have only 1 or 2 samples, in species where truth datasets may not be available, is it pointless to do some or all of these steps? Should I just keep the variants called in each sample by HaplotypeCaller? Should I remove "--emit-ref-confidence GVCF" and just create a regular VCF?
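The difference between the two HaplotypeCaller modes in question comes down to whether `--emit-ref-confidence GVCF` is passed. A minimal sketch, assuming the GATK4 `gatk` launcher (the helper name and file names are hypothetical): without the flag, HaplotypeCaller writes a regular VCF of called variants; with it, the output is a GVCF that is normally passed through GenotypeGVCFs before filtering.

```python
def haplotypecaller_command(reference, bam, output, gvcf=False):
    """Build (not run) a GATK4 HaplotypeCaller command.

    Hypothetical helper: gvcf=False gives a regular VCF of variant calls;
    gvcf=True adds "--emit-ref-confidence GVCF" for the GVCF workflow.
    """
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", output]
    if gvcf:
        cmd += ["--emit-ref-confidence", "GVCF"]
    return cmd


# Regular single-sample VCF, e.g. for the one-mouse WES job
print(" ".join(haplotypecaller_command("mouse_ref.fa", "mouse.dedup.bam", "mouse.vcf.gz")))
# GVCF-mode output for the same sample
print(" ".join(haplotypecaller_command("mouse_ref.fa", "mouse.dedup.bam", "mouse.g.vcf.gz", gvcf=True)))
```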

Thank you, Kenneth

SNP GATK pipeline variant analysis
