Question: Picard alignment statistics error
3.1 years ago by
mia70 wrote:


I am running Picard to get alignment stats on my BAM file. However, I am getting the following error:

[Fri Mar 04 10:54:28 EST 2016] picard.analysis.CollectAlignmentSummaryMetrics ADAPTER_SEQUENCE=[] REFERENCE_SEQUENCE=/mnt/twin/Rust/puccinia_striiformis_pst_78_1_transcripts.fasta INPUT=/mnt/twin/Rust/SWS484SPF_PBJ_ScaffoldsF500_compare_to_PST_sorted.bam OUTPUT=output.txt    MAX_INSERT_SIZE=100000 EXPECTED_PAIR_ORIENTATIONS=[FR] METRIC_ACCUMULATION_LEVEL=[ALL_READS] IS_BISULFITE_SEQUENCED=false ASSUME_SORTED=true STOP_AFTER=0 VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json
[Fri Mar 04 10:54:28 EST 2016] Executing as surendraa@surendraa-HP-Z640-Workstation on Linux 3.19.0-25-generic amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_74-b02; Picard version: 2.1.0(25ebc07f7fbaa7c1a4a8e6c130c88c1d10681802_1454776546) IntelDeflater
[Fri Mar 04 10:54:28 EST 2016] picard.analysis.CollectAlignmentSummaryMetrics done. Elapsed time: 0.01 minutes.
To get help, see
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(
    at picard.metrics.MultiLevelCollector.acceptRecord(
    at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(
    at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(
    at picard.analysis.SinglePassSamProgram.makeItSo(
    at picard.analysis.SinglePassSamProgram.doWork(
    at picard.cmdline.CommandLineProgram.instanceMain(
    at picard.cmdline.PicardCommandLine.instanceMain(
    at picard.cmdline.PicardCommandLine.main(

The command I am running is

java -jar picard.jar CollectAlignmentSummaryMetrics R="/mnt/twin/Rust/puccinia_striiformis_pst_78_1_transcripts.fasta" I=/mnt/twin/Rust/SWS484SPF_PBJ_ScaffoldsF500_compare_to_PST_sorted.bam O=output.txt

Any help you can provide would be greatly appreciated, Ann

modified 17 months ago by Dan D6.7k • written 3.1 years ago by mia70

It looks like the quality scores in your BAM are out of the range of what the tool can handle. Can you please paste the output of the following command?

samtools view /mnt/twin/Rust/SWS484SPF_PBJ_ScaffoldsF500_compare_to_PST_sorted.bam | head

I want to take a look at the quality score strings in your BAM.

written 3.1 years ago by Dan D6.7k

I had the same error. Running

samtools view mysorted.bam | head | cut -f 5

gives the following quality result:

60 60 60 60 0 60 60 60 60 39

Any clue?

written 17 months ago by Medhat8.2k

Those are the MAPQ scores for the reads (column 5). I'm suggesting that the ASCII-encoded Phred per-base quality scores (column 11) are out of range. I'm basing that on the Picard source, specifically the collectQualityData method referenced in the stack trace.

Would you mind posting the output of your command, minus the cut?
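
To make the distinction concrete: in SAM text, MAPQ is column 5 and the per-base quality string is column 11. A minimal sketch that prints both side by side, assuming tab-separated SAM records on standard input (the helper name show_mapq_and_qual is made up for illustration and is not part of samtools):

```shell
# show_mapq_and_qual: hypothetical helper, not a samtools command.
# Reads SAM records on stdin and prints read name, MAPQ (column 5)
# and the per-base quality string QUAL (column 11).
show_mapq_and_qual() {
    # Lines starting with "@" are SAM header lines and are skipped.
    awk -F'\t' '!/^@/ { printf "%s\tMAPQ=%s\tQUAL=%s\n", $1, $5, $11 }'
}
```

It could be fed from the BAM with: samtools view mysorted.bam | head | show_mapq_and_qual. An asterisk in the QUAL position means no per-base qualities were stored for that record.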

written 17 months ago by Dan D6.7k
1       4       *       0       0       *       *       0       0       AAAAACCCGCCGAAGCGGGTTTTT        *       AS:i:0  XS:i:0  
3       4       *       0       0       *       *       0       0       AAAAATTGCCTGATGCGCTACGCT        *       AS:i:0  XS:i:0  
1455    0       Chromosome      2295796 0       32M     *       0       0       CCAAGCCGGTTGCCTGATGCGACGCTGGCGCG        *       NM:i:0  MD:Z:32 AS:i:32 XS:i:32 XA:Z:Chromosome,+2295570,32M,0;Chromosome,+2295457,32M,1;Chromosome,+2295683,
1457    0       Chromosome      932384  0       32M     *       0       0       CAATATCAGCAGCCGCAACAACCGGTTGCGCC        *       NM:i:0  MD:Z:32 AS:i:32 XS:i:32 XA:Z:Chromosome,+932423,32M,0;  
1459    0       Chromosome      3473808 0       32M     *       0       0       CCCTAACCCTCTCCCCAAAGGGGCGAGGGGAC        *       NM:i:0  MD:Z:32 AS:i:32 XS:i:32 XA:Z:Chromosome,-4042103,32M,0;Chromosome,-621409,32M,0;Chromosome,+3030152,3
modified 17 months ago • written 17 months ago by Medhat8.2k

Hmmm, there are no ASCII-encoded per-base quality scores in the result. I wonder if that's the problem. Here's what a read with those present would look like:
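
As a made-up illustration (not taken from either poster's BAM): where the records above have an asterisk in column 11, a read with qualities carries one ASCII character per base, such as IIIIHHGG. A minimal sketch that decodes such a string, assuming the Phred+33 encoding defined by the SAM format (the function name decode_phred33 and the sample string are hypothetical):

```shell
# decode_phred33: hypothetical helper that prints one Phred score per character.
decode_phred33() {
    # od lists each byte as an unsigned decimal; the score is that value minus 33.
    printf '%s' "$1" | od -An -v -tu1 |
        awk '{ for (i = 1; i <= NF; i++) { printf "%s%d", sep, $i - 33; sep = " " } }
             END { print "" }'
}

# Hypothetical QUAL string for an 8-base read.
decode_phred33 'IIIIHHGG'
```

A lone asterisk in that column is the SAM convention for "qualities not stored", so there is nothing to decode for the records shown above.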


Two questions:

-Does your job fail immediately?

-What aligner are you using?

written 17 months ago by Dan D6.7k

The job fails immediately. The aligner is bwa-mem, and the aligned reads were FASTA sequences with no quality scores associated with them (they are actually contigs).

written 17 months ago by Medhat8.2k
17 months ago by
Dan D6.7k wrote:

I think the most likely explanation for the failure is the asterisk in place of the quality scores, which is a result of feeding FASTA data into the upstream alignment process. This is based on @Medhat's information, the name of the BAM in @mia's command (which contains the word "Scaffolds"), and the stack trace combined with an examination of the source.

I did some digging around and found some more examples to support this notion. Based on the discussions in the link, it seems that one way to make this tool work with your data is to generate a copy of the BAM using GATK's "PrintReads" tool with the --defaultBaseQualities parameter. This will insert dummy per-base quality scores and allow the tool to execute. Of course, you shouldn't trust any metric that relates to per-base quality, but everything else should be legit.
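
Before making the copy, it may be worth confirming how many records actually lack per-base qualities. A minimal sketch, assuming tab-separated SAM records on standard input; the helper name count_missing_qual is made up for illustration:

```shell
# count_missing_qual: hypothetical helper, not part of samtools, Picard or GATK.
# Counts SAM records on stdin whose QUAL field (column 11) is a lone "*".
count_missing_qual() {
    # Header lines start with "@" and are ignored; n + 0 prints 0 when nothing matched.
    awk -F'\t' '!/^@/ && $11 == "*" { n++ } END { print n + 0 }'
}
```

For example: samtools view your.bam | count_missing_qual. A count equal to the total number of records would be consistent with FASTA input to the aligner, and the same check on the rewritten BAM would be expected to print 0.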

written 17 months ago by Dan D6.7k

Based on this answer, I ran:

java -jar ~/source/GenomeAnalysisTK.jar -T PrintReads -R genome.fa -I contig_sorted_RG.bam -o contig_sorted_rg_fake_quality.bam --defaultBaseQualities 1

Then I continued with the other steps as normal, and it succeeded.

So I would say this is the correct answer, at least in my case (I cannot accept it myself, of course).

modified 17 months ago • written 17 months ago by Medhat8.2k