Question: bcftools to GATK
23 months ago by
GK161070
United States
GK161070 wrote:

I removed indels from two gVCF files using bcftools. When I try to merge them with GATK CombineGVCFs, I get the error below.

Can anyone help me fix this?

INFO  19:05:26,829 ProgressMeter -  chr22:16051001       1.6E7    30.0 s       1.0 s       53.6%    56.0 s      26.0 s
##### ERROR --
##### ERROR stack trace
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.Integer
    at java.lang.Integer.compareTo(Integer.java:52)
    at java.util.ComparableTimSort.countRunAndMakeAscending(ComparableTimSort.java:320)
    at java.util.ComparableTimSort.sort(ComparableTimSort.java:188)
    at java.util.Arrays.sort(Arrays.java:1312)
    at java.util.Arrays.sort(Arrays.java:1506)
    at java.util.ArrayList.sort(ArrayList.java:1454)
    at java.util.Collections.sort(Collections.java:141)
    at org.broadinstitute.gatk.utils.MathUtils.median(MathUtils.java:999)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.combineAnnotationValues(ReferenceConfidenceVariantContextMerger.java:83)
    at org.broadinstitute.gatk.tools.walkers.variantutils.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:194)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.endPreviousStates(CombineGVCFs.java:365)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:253)
    at org.broadinstitute.gatk.tools.walkers.variantutils.CombineGVCFs.reduce(CombineGVCFs.java:115)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:291)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociReduce.apply(TraverseLociNano.java:280)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
    at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92)
    at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:255)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:157)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.6-0-g89b7209):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions https://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: java.lang.Double cannot be cast to java.lang.Integer
ADD COMMENTlink modified 23 months ago by Pierre Lindenbaum126k • written 23 months ago by GK161070

You should minimise the mixing of GATK with tools from the 'other' main germline variant calling pipeline, i.e., BWA, SAMtools, and BCFtools. They are incompatible in certain areas.

A good pipeline that will give you what you want:

  1. Alignment (bwa)
  2. Remove duplicates (Picard)
  3. Local re-alignment (GATK IndelRealigner, etc)
  4. BQSR (GATK)
  5. Variant calling per file, producing a gVCF (GATK HaplotypeCaller)
  6. Joint genotyping of the gVCFs (GATK GenotypeGVCFs)
  7. Separate out SNVs, i.e., remove indels (GATK SelectVariants)
ADD REPLYlink written 23 months ago by Kevin Blighe55k
23 months ago by
aays140
Canada
aays140 wrote:

On the surface, this very much seems like a bug - I'm not too familiar with the inner workings of GATK, but the final bit of that error message is telling me that some kind of object coercion is going on when it shouldn't be. However, this may also have to do with the fact that (iirc) CombineGVCFs can only accept gVCFs generated from a GATK tool like HaplotypeCaller, and so the removal of indels with a non-GATK tool might be causing this error.

From the GATK forums:

The underlying problem is a lack of consensus in the field as to what constitutes a gvcf. By the standards of samtools and some Illumina-derived files we've seen, a gvcf is essentially an all-sites VCF, potentially with non-variant block records. Within the scope of GATK, gvcf means much more than that -- it includes a symbolic non-ref allele and metrics that relate to the confidence level of both reference and non-ref calls. As a result, the GATK tools that process gvcfs are only able to handle GATK-derived gvcfs, because those are the only ones that fulfill the expectations regarding that extra information.

ADD COMMENTlink modified 23 months ago • written 23 months ago by aays140

Yes, I get that. My problem is that I want to remove indels from my gVCF files before I run CombineGVCFs to merge them.

I am using this command:

java -Xmx2g -jar $GenomeAnalysisTK_jar SelectVariants -R $REF -V $file -O output.vcf.gz --select-type-to-exclude INDEL

Unfortunately, it does nothing. When I grep the output, the indel is still there:

zcat output.vcf.gz | grep AAAAAACAAC

chr22 31470782 . AAAAAACAAC A,<non_ref> 0.60 . BaseQRankSum=-0.736;ClippingRankSum=0.736;DP=3;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-0.736;RAW_MQ=10800.00;ReadPosRankSum=0.736 GT:AD:DP:GQ:PL:SB 0/1:2,1,0:3:29:29,0,46,35,49,84:1,1,0,1

This shouldn't happen!!!
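Grepping for one sequence only confirms a single record. A minimal sketch of a more general check follows, assuming tab-separated VCF lines and treating a record as an indel when any non-symbolic ALT allele differs in length from REF. The function names are hypothetical, and the INFO field of the example record is abbreviated for readability.

```python
def is_indel_record(vcf_line: str) -> bool:
    """Return True if any non-symbolic ALT allele differs in length from REF."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    for alt in alts:
        # Skip symbolic alleles such as <NON_REF>, the spanning deletion '*',
        # and the missing value '.'; they carry no sequence to compare.
        if alt.startswith("<") or alt in ("*", "."):
            continue
        if len(alt) != len(ref):
            return True
    return False


def count_indels(lines) -> int:
    """Count indel records in an iterable of VCF lines, skipping headers."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.startswith("#") and is_indel_record(line)
    )


# The record pasted above, rebuilt with tabs (INFO shortened to DP=3).
record = "\t".join([
    "chr22", "31470782", ".", "AAAAAACAAC", "A,<non_ref>", "0.60", ".",
    "DP=3", "GT:AD:DP:GQ:PL:SB", "0/1:2,1,0:3:29:29,0,46,35,49,84:1,1,0,1",
])
print(is_indel_record(record))  # True
```

To run it on a compressed file, the lines could be read with `gzip.open("output.vcf.gz", "rt")` and passed to `count_indels`.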

bcftools removes indels nicely, which is why I detoured from GATK to bcftools. I don't know what's going on.

ADD REPLYlink written 23 months ago by GK161070

What about combining gvcfs first, and then filtering indels from the obtained vcf?
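A rough sketch of that order of operations, written as a Python helper that only assembles the command lines and does not run them. It assumes GATK4-style arguments like those in the SelectVariants command quoted earlier in the thread (GATK 3.x uses `-T ToolName` and different option names), and it assumes a joint-genotyping step sits between combining and filtering. The `gatk` launcher name, the helper name, and all file names are placeholders.

```python
def combine_then_filter_commands(ref, gvcfs, prefix):
    """Build (but do not execute) three GATK4-style command lines:
    combine the untouched gVCFs, genotype the result, then exclude indels."""
    combined = f"{prefix}.combined.g.vcf.gz"
    genotyped = f"{prefix}.genotyped.vcf.gz"
    snps = f"{prefix}.snps.vcf.gz"

    # 1. Merge the unmodified per-sample gVCFs.
    combine = ["gatk", "CombineGVCFs", "-R", ref]
    for gvcf in gvcfs:
        combine += ["-V", gvcf]
    combine += ["-O", combined]

    # 2. Joint genotyping turns the combined gVCF into a regular VCF.
    genotype = ["gatk", "GenotypeGVCFs", "-R", ref, "-V", combined, "-O", genotyped]

    # 3. Only now drop the indels, from the final VCF.
    select = [
        "gatk", "SelectVariants", "-R", ref, "-V", genotyped, "-O", snps,
        "--select-type-to-exclude", "INDEL",
    ]
    return [combine, genotype, select]


for cmd in combine_then_filter_commands(
    "ref.fasta", ["sample1.g.vcf.gz", "sample2.g.vcf.gz"], "cohort"
):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. The point of the ordering is that the gVCFs reach CombineGVCFs exactly as the GATK caller wrote them.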

ADD REPLYlink written 23 months ago by WouterDeCoster43k

You are right. I have ~240 samples; I combined 20 samples at a time and created 12 meta-merged files for each chromosome.

When I re-merge these meta-merged files, I get this error: java.lang.IllegalArgumentException: Unexpected base in allele bases '*AAAAAAAAC'

https://gatkforums.broadinstitute.org/gatk/discussion/11693/combine-gvcfs-error#latest
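To find which records carry an allele like '*AAAAAAAAC', a small scan of the meta-merged files can help. The sketch below is an approximation of the VCF allele rules (plain bases, a lone '*', '.', symbolic alleles in angle brackets, or breakends), not GATK's own validation, and the function names are hypothetical. It assumes tab-separated VCF lines.

```python
import re

_BASES = re.compile(r"[ACGTNacgtn]+")


def is_wellformed_allele(allele: str) -> bool:
    """Approximate check of whether an allele string is acceptable in a VCF."""
    if allele in ("*", "."):
        return True  # spanning deletion or missing value, valid only on their own
    if allele.startswith("<") and allele.endswith(">"):
        return True  # symbolic allele such as <NON_REF>
    if "[" in allele or "]" in allele:
        return True  # breakend notation
    return _BASES.fullmatch(allele) is not None


def malformed_alleles(vcf_line: str):
    """Return the REF/ALT alleles of one VCF line that fail the check."""
    fields = vcf_line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")
    return [a for a in alleles if not is_wellformed_allele(a)]


line = "chr22\t31470782\t.\tAAAAAACAAC\t*AAAAAAAAC,<NON_REF>\t.\t.\t."
print(malformed_alleles(line))  # ['*AAAAAAAAC']
```

Running `malformed_alleles` over every non-header line of each meta-merged file would show where the '*' became fused with bases, which narrows down the merge step that produced it.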

ADD REPLYlink written 23 months ago by GK161070
23 months ago by
Belgium
WouterDeCoster43k wrote:

removed indels using bcftools in 2 gvcf files

Well there is your problem. Why would you do that? I don't think there is a good enough reason to change gvcf files.

ADD COMMENTlink written 23 months ago by WouterDeCoster43k