GATK4 ASEReadCounter returns nothing
2
1
Entering edit mode
4.8 years ago
alfonsosaera ▴ 50

Dear BioStar Users,

I have a problem with GAT4 ASEReadCounter and could not find help googling or in GATK4 forum.

My code:

gatk ASEReadCounter -I sortedByCoord.groups.bam -V mutect.snp.X.chrom.vcf --output-format CSV -O output.csv

and the output

Using GATK jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar ASEReadCounter -I /Synology/002-Projects/320-Adamo_Escape/3207_BAM/sortedByCoord.groups.bam -V /Synology/002-Projects/320-Adamo_Escape/chrom_hist_nopar/S3_1_mutect.snp.X.noPAR.header.vcf -O S3_1.3207_1294.47XXY.Rtable
16:32:59.086 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 18, 2019 4:32:59 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
16:32:59.645 INFO  ASEReadCounter - ------------------------------------------------------------
16:32:59.646 INFO  ASEReadCounter - The Genome Analysis Toolkit (GATK) v4.1.4.1
16:32:59.647 INFO  ASEReadCounter - For support and documentation go to https://software.broadinstitute.org/gatk/
16:32:59.648 INFO  ASEReadCounter - Executing as asaera@sequentia on Linux v4.15.0-72-generic amd64
16:32:59.648 INFO  ASEReadCounter - Java runtime: OpenJDK 64-Bit Server VM v11.0.5+10-post-Ubuntu-0ubuntu1.118.04
16:32:59.648 INFO  ASEReadCounter - Start Date/Time: 18 de diciembre de 2019, 16:32:58 UTC
16:32:59.649 INFO  ASEReadCounter - ------------------------------------------------------------
16:32:59.649 INFO  ASEReadCounter - ------------------------------------------------------------
16:32:59.652 INFO  ASEReadCounter - HTSJDK Version: 2.21.0
16:32:59.652 INFO  ASEReadCounter - Picard Version: 2.21.2
16:32:59.652 INFO  ASEReadCounter - HTSJDK Defaults.COMPRESSION_LEVEL : 2
16:32:59.653 INFO  ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
16:32:59.653 INFO  ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
16:32:59.654 INFO  ASEReadCounter - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
16:32:59.654 INFO  ASEReadCounter - Deflater: IntelDeflater
16:32:59.654 INFO  ASEReadCounter - Inflater: IntelInflater
16:32:59.654 INFO  ASEReadCounter - GCS max retries/reopens: 20
16:32:59.654 INFO  ASEReadCounter - Requester pays: disabled
16:32:59.655 INFO  ASEReadCounter - Initializing engine
16:33:00.299 INFO  FeatureManager - Using codec VCFCodec to read file file:///Synology/002-Projects/320-Adamo_Escape/chrom_hist_nopar/S3_1_mutect.snp.X.noPAR.header.vcf
16:33:00.379 INFO  ASEReadCounter - Done initializing engine
16:33:00.445 INFO  ProgressMeter - Starting traversal
16:33:00.446 INFO  ProgressMeter -        Current Locus  Elapsed Minutes        Loci Processed      Loci/Minute
16:33:00.534 INFO  ASEReadCounter - Shutting down engine
[18 de diciembre de 2019, 16:33:00 UTC] org.broadinstitute.hellbender.tools.walkers.rnaseq.ASEReadCounter done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=2155872256
java.lang.ArrayIndexOutOfBoundsException: Index 0 out of bounds for length 0
    at org.broadinstitute.hellbender.engine.ReferenceContext.getBase(ReferenceContext.java:406)
    at org.broadinstitute.hellbender.tools.walkers.rnaseq.ASEReadCounter.apply(ASEReadCounter.java:183)
    at org.broadinstitute.hellbender.engine.LocusWalker.lambda$traverse$0(LocusWalker.java:180)
    at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
    at org.broadinstitute.hellbender.engine.LocusWalker.traverse(LocusWalker.java:178)
    at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1048)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:139)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:191)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:210)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:163)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:206)
    at org.broadinstitute.hellbender.Main.main(Main.java:292)

There are no errors, as I've seen in other posts, and the java exceptions are not very informative (at least for me). The csv only contains the header, meaning no results.

Following gatk Got a problem? (grey box on the left,) suggestions I did:

1.- Checking of the vcf with gatk ValidateVariants as recommended here, but seems fine.

gatk ValidateVariants -R Homo_sapiens.GRCh37.dna.primary_assembly.fa -V mutect.snp.X.chrom.vcf
Using GATK jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar ValidateVariants -R Homo_sapiens.GRCh37.dna.primary_assembly.fa -V mutect.snp.X.chrom.vcf
17:50:22.728 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/gatk-4.1.4.1/gatk-package-4.1.4.1-local.jar!/com/intel/gkl/native/libgkl_compression.so
Dec 18, 2019 5:50:23 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
17:50:23.105 INFO  ValidateVariants - ------------------------------------------------------------
17:50:23.106 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.4.1
17:50:23.106 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
17:50:23.106 INFO  ValidateVariants - Executing as asaera@sequentia on Linux v4.15.0-72-generic amd64
17:50:23.107 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v11.0.5+10-post-Ubuntu-0ubuntu1.118.04
17:50:23.107 INFO  ValidateVariants - Start Date/Time: 18 de diciembre de 2019, 17:50:22 UTC
17:50:23.107 INFO  ValidateVariants - ------------------------------------------------------------
17:50:23.107 INFO  ValidateVariants - ------------------------------------------------------------
17:50:23.108 INFO  ValidateVariants - HTSJDK Version: 2.21.0
17:50:23.109 INFO  ValidateVariants - Picard Version: 2.21.2
17:50:23.109 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
17:50:23.109 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:50:23.109 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:50:23.110 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:50:23.110 INFO  ValidateVariants - Deflater: IntelDeflater
17:50:23.110 INFO  ValidateVariants - Inflater: IntelInflater
17:50:23.110 INFO  ValidateVariants - GCS max retries/reopens: 20
17:50:23.110 INFO  ValidateVariants - Requester pays: disabled
17:50:23.111 INFO  ValidateVariants - Initializing engine
17:50:23.702 INFO  FeatureManager - Using codec VCFCodec to read file file://mutect.snp.X.chrom.vcf
17:50:23.765 INFO  ValidateVariants - Done initializing engine
17:50:23.770 WARN  ValidateVariants - IDS validation cannot be done because no DBSNP file was provided
17:50:23.770 WARN  ValidateVariants - Other possible validations will still be performed
17:50:23.770 INFO  ProgressMeter - Starting traversal
17:50:23.770 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
17:50:26.430 INFO  ProgressMeter -          X:152780582              0.0                 25690         581880.0
17:50:26.431 INFO  ProgressMeter - Traversal complete. Processed 25690 total variants in 0.0 minutes.
17:50:26.431 INFO  ValidateVariants - Shutting down engine
[18 de diciembre de 2019, 17:50:26 UTC] org.broadinstitute.hellbender.tools.walkers.variantutils.ValidateVariants done. Elapsed time: 0.06 minutes.
Runtime.totalMemory()=3183476736

2.- Checking of the BAM file with picard.jar ValidateSamFile following this and this, but again seems fine.

java -jar /software/picard.jar ValidateSamFile I=sortedByCoord.groups.bam MODE=SUMMARY
INFO    2019-12-18 17:48:13 ValidateSamFile 

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    ValidateSamFile -I sortedByCoord.groups.bam -MODE SUMMARY
**********


17:48:14.441 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/software/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Wed Dec 18 17:48:14 UTC 2019] ValidateSamFile INPUT=sortedByCoord.groups.bam MODE=SUMMARY    MAX_OUTPUT=100 IGNORE_WARNINGS=false VALIDATE_INDEX=true INDEX_VALIDATION_STRINGENCY=EXHAUSTIVE IS_BISULFITE_SEQUENCED=false MAX_OPEN_TEMP_FILES=8000 SKIP_MATE_VALIDATION=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Wed Dec 18 17:48:14 UTC 2019] Executing as asaera@sequentia on Linux 4.15.0-72-generic amd64; OpenJDK 64-Bit Server VM 11.0.5+10-post-Ubuntu-0ubuntu1.118.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.21.1-SNAPSHOT
WARNING 2019-12-18 17:48:14 ValidateSamFile NM validation cannot be performed without the reference. All other validations will still occur.
INFO    2019-12-18 17:49:51 SamFileValidator    Validated Read    10.000.000 records.  Elapsed time: 00:01:36s.  Time for last 10.000.000:   96s.  Last read position: 10:33.208.922
INFO    2019-12-18 17:51:46 SamFileValidator    Validated Read    20.000.000 records.  Elapsed time: 00:03:31s.  Time for last 10.000.000:  115s.  Last read position: 12:49.578.991
INFO    2019-12-18 17:53:53 SamFileValidator    Validated Read    30.000.000 records.  Elapsed time: 00:05:38s.  Time for last 10.000.000:  126s.  Last read position: 15:69.075.317
INFO    2019-12-18 17:56:06 SamFileValidator    Validated Read    40.000.000 records.  Elapsed time: 00:07:51s.  Time for last 10.000.000:  133s.  Last read position: 17:61.670.652
INFO    2019-12-18 17:58:44 SamFileValidator    Validated Read    50.000.000 records.  Elapsed time: 00:10:29s.  Time for last 10.000.000:  157s.  Last read position: 2:47.387.777
INFO    2019-12-18 18:01:17 SamFileValidator    Validated Read    60.000.000 records.  Elapsed time: 00:13:02s.  Time for last 10.000.000:  153s.  Last read position: 22:45.598.970
INFO    2019-12-18 18:04:07 SamFileValidator    Validated Read    70.000.000 records.  Elapsed time: 00:15:53s.  Time for last 10.000.000:  170s.  Last read position: 5:137.848.541
INFO    2019-12-18 18:07:08 SamFileValidator    Validated Read    80.000.000 records.  Elapsed time: 00:18:53s.  Time for last 10.000.000:  180s.  Last read position: 7:56.124.013
INFO    2019-12-18 18:10:00 SamFileValidator    Validated Read    90.000.000 records.  Elapsed time: 00:21:45s.  Time for last 10.000.000:  171s.  Last read position: MT:4.489


## HISTOGRAM    java.lang.String
Error Type  Count
ERROR:MATE_NOT_FOUND    33070
WARNING:MISSING_TAG_NM  98426757

[Wed Dec 18 18:12:35 UTC 2019] picard.sam.ValidateSamFile done. Elapsed time: 24,36 minutes.
Runtime.totalMemory()=3170893824
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp

I tried to post on GATK forum but could not do it since I am a new user and new user's posts are not allowed for some issue with SPAM.

Any ideas? your help will be highly appreciated.

Thanks

GATK4 RNA-Seq • 3.8k views
ADD COMMENT
4
Entering edit mode
4.8 years ago
alfonsosaera ▴ 50

I made it work. Here is my solution just in case someone else has a similar problem.

I made 2 modifications to my code:

  1. Compress the VCF file with bgzip and then index it with tabix
  2. Add a reference genome although the help of ASEReadCounter says the reference (-R) is not mandatory.

My final code:

gatk ASEReadCounter -I sortedByCoord.groups.bam \
                    -V mutect.snp.X.chrom.vcf.gz \
                    -R reference.genome.fa \
                    --output-format CSV \
                    -O output.csv

Hope this helps.

Thank you all for reading and commenting.

ADD COMMENT
2
Entering edit mode

This worked for me, too! Many thanks!

A few additional notes to save future users some trial and error:

  1. the reference .fa must be unzipped
  2. the reference must be fully indexed, including both a .dict and a .fai file (see https://gatkforums.broadinstitute.org/gatk/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference for details on how to obtain both of these)
ADD REPLY
0
Entering edit mode

To save a few more minutes, just in case the link provided in the comment does not load: To make a Fasta dict file one can use this GATK sub-command: gatk CreateSequenceDictionary -R reference.genome.fa. I found this infor from this GATK documentation.

ADD REPLY
2
Entering edit mode

I meet the same preoblem , and I finally made it work by AddOrReplaceReadGroups to my bam file.
use this GATK sub-command: gatk AddOrReplaceReadGroups -I ./KC.bam -O addedgroup_KC.bam -SORT_ORDER coordinate -RGID 4 -RGLB DNA -RGPL ILLUMINA -RGPU RNAseq -RGSM KC

ADD REPLY
1
1
Entering edit mode

Thanks @Pierre Lindenbaum but my VCF has no overlapping variants. I already found the solution that I am adding as answer just in case someone else gets here with similar problem

ADD REPLY

Login before adding your answer.

Traffic: 2263 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6