Question: bcftools to GATK
GK1610 wrote, 2.5 years ago:

I removed indels from 2 gVCF files using bcftools. When I try to merge them with GATK CombineGVCFs, I get the error below.

Can anyone help me fix this?

INFO  19:05:26,829 ProgressMeter -  chr22:16051001       1.6E7    30.0 s       1.0 s       53.6%    56.0 s      26.0 s
##### ERROR --
##### ERROR stack trace
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
    at java.lang.Integer.compareTo(Integer.java:52)
    at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
    at java.util.Arrays.sort(Arrays.java:1312)
    at java.util.Arrays.sort(Arrays.java:1506)
    at java.util.ArrayList.sort(ArrayList.java:1454)
    at java.util.Collections.sort(Collections.java:141)
    at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:999)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:83)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:194)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.endPreviousStates(CombineGVCFs.java:365)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:253)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:115)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:291)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:280)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer
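
The trace shows the exception being raised inside MathUtils.median, called from combineAnnotationValues, while a list of annotation values is being sorted. That is consistent with one INFO annotation being written as an integer in one input and as a decimal in another, although the thread does not confirm which annotation is involved. A minimal Python sketch for checking this, assuming plain-text, tab-separated VCF lines; the function names are hypothetical and are not part of GATK or bcftools:

```python
from collections import defaultdict


def value_kind(text):
    """Classify one INFO value as 'int', 'float', or 'other'."""
    try:
        int(text)
        return "int"
    except ValueError:
        pass
    try:
        float(text)
        return "float"
    except ValueError:
        return "other"


def info_kinds(vcf_lines):
    """Map each INFO key to the set of value kinds seen across all records."""
    kinds = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a full VCF record
        for entry in fields[7].split(";"):
            if "=" not in entry:
                continue  # flag-style annotation without a value
            key, value = entry.split("=", 1)
            for item in value.split(","):
                kinds[key].add(value_kind(item))
    return kinds


def mixed_numeric_keys(vcf_lines):
    """Return INFO keys that appear both as integers and as decimals."""
    found = info_kinds(vcf_lines)
    return sorted(k for k, seen in found.items() if {"int", "float"} <= seen)
```

Feeding it the lines of both edited gVCFs together, and then the lines of the unedited originals, would show whether the bcftools step changed how any annotation is written.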
modified 2.5 years ago by Pierre Lindenbaum • written 2.5 years ago by GK1610

You should minimise the mixing of GATK with tools from the 'other' main germline variant calling pipeline, i.e., BWA, SAMtools, and BCFtools. They are incompatible in certain areas.

A good pipeline that will give you what you want:

  1. Alignment (bwa)
  2. Remove duplicates (Picard)
  3. Local realignment (GATK IndelRealigner, etc.)
  4. BQSR (GATK)
  5. Variant calling per sample, producing a gVCF (GATK HaplotypeCaller in GVCF mode)
  6. Joint genotyping of the gVCFs (GATK GenotypeGVCFs)
  7. Separate out SNVs, i.e., remove indels (GATK SelectVariants)
written 2.5 years ago by Kevin Blighe
aays wrote, 2.5 years ago:

On the surface, this very much seems like a bug - I'm not too familiar with the inner workings of GATK, but the final bit of that error message is telling me that some kind of object coercion is going on when it shouldn't be. However, this may also have to do with the fact that (iirc) CombineGVCFs can only accept gVCFs generated from a GATK tool like HaplotypeCaller, and so the removal of indels with a non-GATK tool might be causing this error.

From the GATK forums:

The underlying problem is a lack of consensus in the field as to what constitutes a gvcf. By the standards of samtools and some Illumina-derived files we've seen, a gvcf is essentially an all-sites VCF, potentially with non-variant block records. Within the scope of GATK, gvcf means much more than that -- it includes a symbolic non-ref allele and metrics that relate to the confidence level of both reference and non-ref calls. As a result, the GATK tools that process gvcfs are only able to handle GATK-derived gvcfs, because those are the only ones that fulfill the expectations regarding that extra information.
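
One concrete consequence of the quoted definition is that, as far as I know, every record in a GATK-derived gVCF carries the symbolic non-ref allele in its ALT column. A rough Python sketch of that single check follows. It assumes tab-separated VCF lines, does not test the confidence metrics the quote also mentions, and uses a hypothetical function name:

```python
def records_missing_non_ref(vcf_lines):
    """Return (chrom, pos) for records whose ALT lacks the symbolic non-ref allele."""
    missing = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a VCF record
        # Compare case-insensitively, since the allele may be shown in lower case.
        alts = [allele.upper() for allele in fields[4].split(",")]
        if "<NON_REF>" not in alts:
            missing.append((fields[0], int(fields[1])))
    return missing
```

An empty result does not prove that a file is a valid GATK gVCF; a non-empty one suggests that it is not.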

modified 2.5 years ago • written 2.5 years ago by aays

Yes, I get that. My problem is that I want to remove indels from my gVCF files before I run CombineGVCFs to merge them.

I am using this command:

java -Xmx2g -jar $GenomeAnalysisTK_jar SelectVariants -R $REF -V $file -O output.vcf.gz --select-type-to-exclude INDEL

Unfortunately it does nothing. When I grep the output for an indel allele, the record is still there:

zcat output.vcf.gz | grep AAAAAACAAC

chr22 31470782 . AAAAAACAAC A,<non_ref> 0.60 . BaseQRankSum=-0.736;ClippingRankSum=0.736;DP=3;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.736;RAW_MQ=10800.00;ReadPosRankSum=0.736 GT:AD:DP:GQ:PL:SB 0/1:2,1,0:3:29:29,0,46,35,49,84:1,1,0,1

This shouldn't happen!!!

bcftools removes indels nicely, which is why I had to detour from GATK to bcftools. I don't know what's going on.
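
A possible explanation, not confirmed anywhere in this thread, is that the record above does not count as a plain indel for the type filter: its ALT column holds a deletion allele plus the symbolic <non_ref> allele, so it may be classified as a mixed record and slip past an INDEL-only exclusion. The Python sketch below only illustrates how such a record can be recognised by allele length while ignoring symbolic alleles. It is not GATK's classification logic, and the function name is hypothetical:

```python
def has_indel_allele(ref, alt_field):
    """True if any non-symbolic ALT allele differs in length from REF."""
    for alt in alt_field.split(","):
        if alt.startswith("<") or alt in (".", "*"):
            continue  # symbolic allele, missing ALT, or spanning-deletion marker
        if len(alt) != len(ref):
            return True
    return False
```

For the record quoted above, has_indel_allele("AAAAAACAAC", "A,<non_ref>") returns True, while a reference block with only <non_ref> in ALT returns False.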

written 2.5 years ago by GK1610

What about combining gvcfs first, and then filtering indels from the obtained vcf?

written 2.5 years ago by WouterDeCoster

You are right. I have ~240 samples, so I first combined them in batches of 20, which gave 12 meta-merged files for each chromosome.

When I re-merge these meta-merged files, I get this error: java.lang.IllegalArgumentException: Unexpected base in allele bases '*AAAAAAAAC'

https://gatkforums.broadinstitute.org/gatk/discussion/11693/combine-gvcfs-error#latest

written 2.5 years ago by GK1610
WouterDeCoster wrote, 2.5 years ago:

"removed indels using bcftools in 2 gvcf files"

Well there is your problem. Why would you do that? I don't think there is a good enough reason to change gvcf files.

written 2.5 years ago by WouterDeCoster