Question: GATK vs more traditional SNP and alignment tools
anjali.gopal91 (United States) wrote, 6.0 years ago:

I've been asked to design an easy-to-use SNP caller at work, presumably for staff who don't know how to use a Linux environment and would like to avoid the hassle. I've gone about this with some fairly traditional tools: bowtie2 for alignment, samtools and bcftools to process SAM files and generate pileups, SNVer for variant calling, etc.
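For reference, the traditional steps described above can be chained from one script. This is a minimal Python sketch, assuming bowtie2, a recent samtools (one that supports `sort -o`) and bcftools are on the PATH and a bowtie2 index already exists; the helper name and file names are hypothetical. It only assembles the command lines, and it swaps in `bcftools call` as the caller because SNVer's command line isn't given here.

```python
from pathlib import Path


def traditional_pipeline(index: str, ref_fasta: str, reads1: str, reads2: str,
                         out_dir: str) -> list[list[str]]:
    """Build (but do not run) a bowtie2 -> samtools -> bcftools pipeline.

    Hypothetical helper: the output file names under out_dir are illustrative.
    """
    out = Path(out_dir)
    sam = str(out / "aligned.sam")
    bam = str(out / "aligned.sorted.bam")
    raw_bcf = str(out / "raw.bcf")
    vcf = str(out / "calls.vcf.gz")
    return [
        # 1. Align paired-end reads to the indexed reference.
        ["bowtie2", "-x", index, "-1", reads1, "-2", reads2, "-S", sam],
        # 2. Convert SAM to a coordinate-sorted BAM.
        ["samtools", "sort", "-o", bam, sam],
        # 3. Index the sorted BAM so downstream tools can random-access it.
        ["samtools", "index", bam],
        # 4. Compute genotype likelihoods into an uncompressed BCF.
        ["bcftools", "mpileup", "-f", ref_fasta, "-Ou", "-o", raw_bcf, bam],
        # 5. Call variants (multiallelic model, variant sites only), bgzipped VCF.
        ["bcftools", "call", "-mv", "-Oz", "-o", vcf, raw_bcf],
    ]


for cmd in traditional_pipeline("ref_idx", "ref.fa", "r1.fq", "r2.fq", "results"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`; keeping the commands as data makes it easy for a GUI to show or log them before running anything.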

And then I started reading about platforms like GATK that already do this, and thought that it might be better to investigate that as an option instead.

So now I'm wondering: for those of you who have used GATK, do you prefer it to more 'traditional' alignment and variant-calling methods (i.e., ones where you've written and customized most of the script yourself)? Are there any drawbacks to GATK that I should be aware of before investigating it as a primary alignment and variant-calling tool? (I realize that GATK works primarily in a Linux environment, but I shouldn't have a problem building an external GUI to control some of its features.)
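On the GUI side, most of the work is running each step in order and turning failures into readable messages. A minimal sketch of such a back end, assuming each step is a command list; `run_steps` is a hypothetical helper, not part of GATK or any other tool, and the demo uses the current Python interpreter as a stand-in for real programs.

```python
import subprocess
import sys


def run_steps(steps: list[list[str]]) -> tuple[bool, str]:
    """Run each command in order and stop at the first failure.

    Returns (success, message) so a GUI can show a status line instead of a
    raw traceback. Hypothetical helper for illustration.
    """
    for i, cmd in enumerate(steps, start=1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            return False, f"Step {i}: program not found: {cmd[0]}"
        if result.returncode != 0:
            return False, f"Step {i} failed ({cmd[0]}): {result.stderr.strip()}"
    return True, f"Finished {len(steps)} step(s)"


if __name__ == "__main__":
    # Stand-in commands so the demo runs without any bioinformatics tools.
    demo = [[sys.executable, "-c", "print('step one')"],
            [sys.executable, "-c", "print('step two')"]]
    print(run_steps(demo))
```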

Any feedback would be great! Thank you.

snp alignment • 2.6k views
modified 6.0 years ago by marina.v.yurieva • written 6.0 years ago by anjali.gopal91

Why not just set up a Galaxy pipeline for them? BTW, if you want real data on variant-caller comparisons, have a read through Brad Chapman's blog.

written 6.0 years ago by Devon Ryan
Yup, it seems like there is only a need for a GUI, not for a new variant caller.
written 6.0 years ago by Irsan
Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 6.0 years ago:

GATK is a set of tools for working with sequencing data. It does NOT include an aligner. It does include several tools (along with Picard) for post-processing BAM files prior to variant calling. It includes two variant callers, UnifiedGenotyper and HaplotypeCaller, with HaplotypeCaller being the recommended one. Finally, GATK includes tools for post-processing VCF files. Many groups routinely use parts of GATK/Picard in their pipelines, so you should definitely investigate it and probably incorporate parts of it into your pipeline.
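To make the division of labour concrete: GATK starts from a BAM produced by a separate aligner. This is a minimal Python sketch that assembles GATK4-style commands (the `gatk` wrapper; GATK 3, current when this thread was written, used `java -jar GenomeAnalysisTK.jar -T <Tool>` instead) for duplicate marking with the bundled Picard tool, base quality score recalibration and HaplotypeCaller. The helper name and file names are hypothetical, and it assumes a known-sites VCF exists for your organism.

```python
from pathlib import Path


def gatk_steps(ref_fasta: str, sorted_bam: str, known_sites: str,
               out_dir: str) -> list[list[str]]:
    """Build GATK4 commands taking an already-aligned, sorted BAM to a VCF.

    Hypothetical helper; output names are illustrative.
    """
    out = Path(out_dir)
    dedup = str(out / "dedup.bam")
    metrics = str(out / "dup_metrics.txt")
    table = str(out / "recal.table")
    recal = str(out / "recal.bam")
    vcf = str(out / "calls.vcf.gz")
    return [
        # Picard MarkDuplicates (bundled with GATK4): flag duplicate reads.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup, "-M", metrics],
        # Model systematic base-quality errors, masking known variant sites.
        ["gatk", "BaseRecalibrator", "-R", ref_fasta, "-I", dedup,
         "--known-sites", known_sites, "-O", table],
        # Write a BAM with recalibrated base qualities.
        ["gatk", "ApplyBQSR", "-R", ref_fasta, "-I", dedup,
         "--bqsr-recal-file", table, "-O", recal],
        # Call SNPs and indels with HaplotypeCaller.
        ["gatk", "HaplotypeCaller", "-R", ref_fasta, "-I", recal, "-O", vcf],
    ]
```

These steps would slot in after the alignment and sorting steps of an existing pipeline, replacing only the parts GATK covers.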

As Devon suggests, it is probably a good idea to get a sense of the validity of various pipelines from the literature, blogs, and any other resources you can get your hands on. However, at the end of the day there is no one-size-fits-all solution, so you'll need to define your goals (ease of use, validity, speed, sample size, etc.) and then design a pipeline that meets them.

written 6.0 years ago by Sean Davis

Thanks! I missed that GATK can't do general alignment. The remaining description helps a lot :)

written 6.0 years ago by anjali.gopal91
HaplotypeCaller should only be used for high-coverage data. Use UnifiedGenotyper, samtools or FreeBayes for low-coverage SNP calling instead.
written 6.0 years ago by Tommy Carstensen
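If you follow Tommy's rule of thumb, the pipeline could check coverage before picking a caller. A minimal sketch, assuming per-base depths in `samtools depth` output format (chrom, position, depth, tab-separated); the helper names and the 15x cut-off are hypothetical placeholders, not values from this thread. Note that `samtools depth` omits zero-coverage positions unless run with `-a`, which would inflate the mean.

```python
def mean_depth(depth_lines: list[str]) -> float:
    """Mean per-base depth from `samtools depth` lines (chrom<TAB>pos<TAB>depth)."""
    depths = [int(line.split("\t")[2]) for line in depth_lines if line.strip()]
    return sum(depths) / len(depths) if depths else 0.0


# Hypothetical cut-off; tune it against your own validation data.
HIGH_COVERAGE_THRESHOLD = 15.0


def choose_caller(depth_lines: list[str],
                  threshold: float = HIGH_COVERAGE_THRESHOLD) -> str:
    """Use HaplotypeCaller only when mean coverage is high."""
    if mean_depth(depth_lines) >= threshold:
        return "HaplotypeCaller"
    return "bcftools call"  # or UnifiedGenotyper / FreeBayes
```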
Charles Warden (Duarte, CA) wrote, 6.0 years ago:

I agree that you should try to use GATK for variant calling.  If it helps, here is a paper with some benchmarks (applied to targeted sequencing experiments):

There are also other published benchmarks, and you can use the citations from that paper to help find some of them.

written 6.0 years ago by Charles Warden
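Benchmarks like these usually boil down to comparing a call set against a trusted truth set. A minimal sketch of the core metrics, assuming variants are already normalized to (chrom, pos, ref, alt) tuples; the helper name is hypothetical. Exact matching is a simplification: dedicated comparison tools such as hap.py or RTG vcfeval exist because the same indel can be written in different ways.

```python
Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def concordance(truth: set[Variant], calls: set[Variant]) -> dict[str, float]:
    """Precision, recall and F1 of a call set against a truth set (exact match)."""
    tp = len(truth & calls)   # called and true
    fp = len(calls - truth)   # called but not true
    fn = len(truth - calls)   # true but missed
    precision = tp / (tp + fp) if calls else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```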
marina.v.yurieva (Farmington, CT) wrote, 6.0 years ago:

The big advantage of GATK is that it does base quality score recalibration and indel realignment, but you don't have to use its caller. If you work with non-human data, you can use something "traditional" like samtools/bcftools for variant calling: if I understand it right, the GATK callers are tuned more for human data.

written 6.0 years ago by marina.v.yurieva
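A hybrid along these lines could use GATK only for preprocessing and bcftools for calling. A minimal sketch, assuming GATK4 and bcftools on the PATH and a duplicate-marked BAM as input; the helper name and file names are hypothetical. Recalibration needs a VCF of known variant sites, which many non-human organisms lack, so the sketch skips it when none is supplied. It covers recalibration only: GATK4 dropped the separate indel-realignment tool (IndelRealigner, a GATK 3 tool) because HaplotypeCaller re-assembles reads locally.

```python
from pathlib import Path
from typing import Optional


def hybrid_steps(ref_fasta: str, dedup_bam: str, out_dir: str,
                 known_sites: Optional[str] = None) -> list[list[str]]:
    """GATK4 BQSR (only if known sites are given), then bcftools calling.

    Hypothetical helper; file names are illustrative.
    """
    out = Path(out_dir)
    steps: list[list[str]] = []
    bam_for_calling = dedup_bam
    if known_sites is not None:
        table = str(out / "recal.table")
        bam_for_calling = str(out / "recal.bam")
        steps += [
            ["gatk", "BaseRecalibrator", "-R", ref_fasta, "-I", dedup_bam,
             "--known-sites", known_sites, "-O", table],
            ["gatk", "ApplyBQSR", "-R", ref_fasta, "-I", dedup_bam,
             "--bqsr-recal-file", table, "-O", bam_for_calling],
        ]
    raw_bcf = str(out / "raw.bcf")
    steps += [
        # bcftools does the calling whether or not BQSR ran.
        ["bcftools", "mpileup", "-f", ref_fasta, "-Ou", "-o", raw_bcf,
         bam_for_calling],
        ["bcftools", "call", "-mv", "-Oz", "-o", str(out / "calls.vcf.gz"),
         raw_bcf],
    ]
    return steps
```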
Powered by Biostar version 2.3.0