Question: Variant caller comparison for non-human data (highly varied populations)
Asked 13 months ago by hermathena (United Kingdom):


Is anyone aware of a recent comparison of variant callers (GATK, FreeBayes, etc.) for non-model organisms, please? There are many for human data, obviously because good reference sets exist. My data consist of hundreds of whole genomes from an insect species (>10% of sites variable!), and we traditionally use GATK. However, the GATK HaplotypeCaller is rather slow on this data. Sensitivity is a higher concern than precision (we are not looking for specific SNP associations).


Just FYI, the Broad is about to release GATK4, stating that notable improvements in speed were made. Maybe it is worth trying the beta release of GATK4 to see whether it performs well for your task?

written 13 months ago by ATpoint

Thanks for this. I have experimented with the GATK v4 beta. There are some gains in speed through multithreading. Unfortunately, many bugs also crop up, and the Broad recommends not using GATK4 with Spark for now. That horrible Queue parallelisation is gone, but now you need to use something called GenomicsDBImport to merge gVCFs, and that needs to operate separately on each scaffold... For the time being one may as well use GATK3. There is still the benefit of the InDel model.

written 12 months ago by hermathena
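
For anyone hitting the per-scaffold GenomicsDBImport step described above, here is a rough sketch of how the commands might be generated, one per scaffold, from a samtools `.fai` index. It is not from the thread: the helper names, file names and workspace prefix are hypothetical, and the flags (`--genomicsdb-workspace-path`, `-V`, `-L`) are those of released GATK 4.x, so they may well differ in the beta discussed here. The script only builds and prints the commands; it does not run GATK.

```python
import shlex


def scaffold_names(fai_lines):
    """Return sequence names from the lines of a samtools .fai index.

    The name is the first tab-separated column; blank lines are skipped.
    """
    return [line.split("\t")[0] for line in fai_lines if line.strip()]


def genomicsdb_import_commands(scaffolds, gvcfs, workspace_prefix):
    """Build one GenomicsDBImport command (as an argument list) per scaffold.

    Assumption: flag names follow released GATK 4.x, not necessarily the beta.
    """
    commands = []
    for scaffold in scaffolds:
        cmd = [
            "gatk", "GenomicsDBImport",
            # One workspace per scaffold (hypothetical naming scheme).
            "--genomicsdb-workspace-path", f"{workspace_prefix}_{scaffold}",
            # Restrict this import to a single scaffold.
            "-L", scaffold,
        ]
        for gvcf in gvcfs:
            cmd += ["-V", gvcf]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Two made-up .fai lines and two made-up gVCF names, for illustration only.
    fai = [
        "scaffold_1\t120000\t12\t60\t61",
        "scaffold_2\t80000\t122025\t60\t61",
    ]
    gvcfs = ["sample_a.g.vcf.gz", "sample_b.g.vcf.gz"]
    for cmd in genomicsdb_import_commands(scaffold_names(fai), gvcfs, "gdb"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

With thousands of scaffolds this yields thousands of small jobs, which can then be handed to whatever scheduler or job array is available; grouping several short scaffolds into one job is a possible refinement not shown here.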

