bcftools to GATK
6.1 years ago
GK1610 ▴ 110

I removed indels from two gVCF files using bcftools. When I try to merge them with GATK CombineGVCFs, I get the error below.

Can anyone help me fix this?

INFO  19:05:26,829 ProgressMeter -  chr22:16051001       1.6E7    30.0 s       1.0 s       53.6%    56.0 s      26.0 s
##### ERROR --
##### ERROR stack trace
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
    at java.lang.Integer.compareTo(Integer.java:52)
    at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
    at java.util.Arrays.sort(Arrays.java:1312)
    at java.util.Arrays.sort(Arrays.java:1506)
    at java.util.ArrayList.sort(ArrayList.java:1454)
    at java.util.Collections.sort(Collections.java:141)
    at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:999)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:83)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:194)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.endPreviousStates(CombineGVCFs.java:365)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:253)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:115)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:291)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:280)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer

You should minimise mixing GATK with tools from the 'other' main germline variant-calling pipeline, i.e., BWA, SAMtools, and BCFtools. The two toolchains are incompatible in certain areas.

A good pipeline that will give you what you want:

  1. Alignment (bwa)
  2. Remove duplicates (Picard)
  3. Local re-alignment (GATK IndelRealigner, etc)
  4. BQSR (GATK)
  5. Variant calling per sample, producing one gVCF each (GATK HaplotypeCaller in GVCF mode)
  6. Joint genotyping of the gVCFs (GATK GenotypeGVCFs)
  7. Separate out SNVs, i.e., remove indels (GATK SelectVariants)
6.1 years ago
aays ▴ 180

On the surface, this very much seems like a bug - I'm not too familiar with the inner workings of GATK, but the final bit of that error message is telling me that some kind of object coercion is going on when it shouldn't be. However, this may also have to do with the fact that (iirc) CombineGVCFs can only accept gVCFs generated from a GATK tool like HaplotypeCaller, and so the removal of indels with a non-GATK tool might be causing this error.
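
One way to probe the coercion idea is to look for INFO annotations that are written as a whole number in one file and with a decimal point in the other. The sketch below assumes, without confirmation from this thread, that the merger decides between integer and floating-point handling from how each value is written, so a rewritten file could end up with mixed types for the same key. The two records are made-up examples, not lines from the files discussed here.

```python
def info_number_kinds(record_line):
    """Map each numeric INFO key to 'int' or 'float', based on how it is written."""
    info_field = record_line.rstrip("\n").split("\t")[7]
    kinds = {}
    for entry in info_field.split(";"):
        if "=" not in entry:
            continue  # flag annotations carry no value
        key, value = entry.split("=", 1)
        first = value.split(",")[0]  # the first element is enough for a quick check
        try:
            float(first)
        except ValueError:
            continue  # non-numeric or missing value
        kinds[key] = "float" if "." in first else "int"
    return kinds


def mixed_keys(record_a, record_b):
    """Return INFO keys written as an integer in one record and a float in the other."""
    kinds_a = info_number_kinds(record_a)
    kinds_b = info_number_kinds(record_b)
    shared = kinds_a.keys() & kinds_b.keys()
    return sorted(key for key in shared if kinds_a[key] != kinds_b[key])


# Hypothetical records: the second mimics a file whose numbers were reformatted.
original = "chr22\t100\t.\tA\tG,<NON_REF>\t50\t.\tDP=3;MQRankSum=0.000;RAW_MQ=10800.00\tGT\t0/1"
rewritten = "chr22\t100\t.\tA\tG,<NON_REF>\t50\t.\tDP=3;MQRankSum=0;RAW_MQ=10800\tGT\t0/1"

print(mixed_keys(original, rewritten))
```

If a check like this reports keys for records from the two input files, that would be consistent with the Double/Integer cast failure, although it does not prove it.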

From the GATK forums:

The underlying problem is a lack of consensus in the field as to what constitutes a gvcf. By the standards of samtools and some Illumina-derived files we've seen, a gvcf is essentially an all-sites VCF, potentially with non-variant block records. Within the scope of GATK, gvcf means much more than that -- it includes a symbolic non-ref allele and metrics that relate to the confidence level of both reference and non-ref calls. As a result, the GATK tools that process gvcfs are only able to handle GATK-derived gvcfs, because those are the only ones that fulfill the expectations regarding that extra information.
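
As a rough illustration of the quoted point, the sketch below scans gVCF data lines and reports records that lack the symbolic non-ref allele. It only checks that one property; the confidence metrics mentioned in the quote are not examined. The comparison is case-insensitive because the record posted later in this thread shows the allele as `<non_ref>`. The sample lines are hypothetical.

```python
def has_non_ref_allele(record_line):
    """True if the ALT column lists the symbolic <NON_REF> allele."""
    alt_field = record_line.rstrip("\n").split("\t")[4]
    return any(allele.upper() == "<NON_REF>" for allele in alt_field.split(","))


def records_missing_non_ref(lines):
    """Return (chrom, pos) for every data line without a <NON_REF> allele."""
    missing = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.split("\t")
        if not has_non_ref_allele(line):
            missing.append((fields[0], int(fields[1])))
    return missing


# Hypothetical lines: a reference block, a variant record, and a plain-VCF style record.
sample_lines = [
    "##fileformat=VCFv4.2",
    "chr22\t100\t.\tA\t<NON_REF>\t.\t.\tEND=150\tGT\t0/0",
    "chr22\t151\t.\tC\tT,<NON_REF>\t80\t.\tDP=12\tGT\t0/1",
    "chr22\t152\t.\tG\tA\t60\t.\tDP=10\tGT\t0/1",
]

print(records_missing_non_ref(sample_lines))
```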


Yes, I get that. My problem is that I want to remove indels from my gVCF files before I run CombineGVCFs to merge them.

I am using this command:

java -Xmx2g -jar $GenomeAnalysisTK_jar SelectVariants -R $REF -V $file -O output.vcf.gz --select-type-to-exclude INDEL

Unfortunately, it does nothing. When I grep the output for a known deletion, the record is still there:

zcat output.vcf.gz | grep AAAAAACAAC

chr22 31470782 . AAAAAACAAC A,<non_ref> 0.60 . BaseQRankSum=-0.736;ClippingRankSum=0.736;DP=3;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.736;RAW_MQ=10800.00;ReadPosRankSum=0.736 GT:AD:DP:GQ:PL:SB 0/1:2,1,0:3:29:29,0,46,35,49,84:1,1,0,1

This shouldn't happen!

bcftools removes indels cleanly, which is why I detoured from GATK to bcftools. I don't know what's going on.
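
One possible explanation, not verified for this GATK version, is that the symbolic `<non_ref>` allele stops the record from being treated as a pure indel, so excluding the INDEL type leaves it in place. The sketch below shows how such records can be identified for inspection by comparing REF and ALT lengths while ignoring symbolic alleles. It is meant for counting or listing records, not for editing gVCFs. The first record is a shortened version of the one posted above; the others are hypothetical.

```python
def is_symbolic(allele):
    """Symbolic alleles such as <NON_REF> are written inside angle brackets."""
    return allele.startswith("<") and allele.endswith(">")


def is_indel_record(record_line):
    """True if any concrete ALT allele differs in length from REF."""
    fields = record_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    # Ignore symbolic alleles, the missing value ".", and the spanning-deletion "*".
    concrete = [a for a in alts if not is_symbolic(a) and a not in (".", "*")]
    return any(len(a) != len(ref) for a in concrete)


deletion = "chr22\t31470782\t.\tAAAAAACAAC\tA,<non_ref>\t0.60\t.\tDP=3\tGT\t0/1"
snv = "chr22\t200\t.\tC\tT,<NON_REF>\t80\t.\tDP=12\tGT\t0/1"
ref_block = "chr22\t300\t.\tG\t<NON_REF>\t.\t.\tEND=350\tGT\t0/0"

for name, line in [("deletion", deletion), ("snv", snv), ("ref_block", ref_block)]:
    print(name, is_indel_record(line))
```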


What about combining gvcfs first, and then filtering indels from the obtained vcf?
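
A minimal sketch of that ordering, assuming GATK4-style command-line syntax and a joint-genotyping step between combining and filtering (the genotyping step is an assumption, not something stated in the suggestion). The script only builds and prints the commands; it does not run them. All file names are hypothetical placeholders.

```python
import shlex


def build_commands(reference, gvcfs, prefix):
    """Build the three commands in order: combine, genotype, then drop indels."""
    combined = f"{prefix}.combined.g.vcf.gz"
    genotyped = f"{prefix}.genotyped.vcf.gz"
    snps_only = f"{prefix}.no_indels.vcf.gz"

    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]  # one -V per untouched per-sample gVCF
    combine += ["-O", combined]

    genotype = ["gatk", "GenotypeGVCFs", "-R", reference, "-V", combined, "-O", genotyped]

    # Indels are excluded only at the end, from the final VCF rather than from the gVCFs.
    select = ["gatk", "SelectVariants", "-R", reference, "-V", genotyped,
              "--select-type-to-exclude", "INDEL", "-O", snps_only]
    return [combine, genotype, select]


commands = build_commands("ref.fasta", ["sample1.g.vcf.gz", "sample2.g.vcf.gz"], "chr22")
for command in commands:
    print(shlex.join(command))
```

The point of this ordering is that the gVCFs reach CombineGVCFs exactly as HaplotypeCaller wrote them, and the indel filter is applied to an ordinary VCF.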


You are right. I have ~240 samples, so I first combined them in batches of 20, creating 12 meta-merged files for each chromosome.

When I re-merge these meta-merged files, I get this error: java.lang.IllegalArgumentException: Unexpected base in allele bases '*AAAAAAAAC'

https://gatkforums.broadinstitute.org/gatk/discussion/11693/combine-gvcfs-error#latest

6.1 years ago

removed indels using bcftools in 2 gvcf files

Well there is your problem. Why would you do that? I don't think there is a good enough reason to change gvcf files.
