Question: Corrupted bam file
Asked 6.0 years ago by dariober (WCIP | Glasgow | UK):

I have a BAM file that appears to be corrupted, but I can't work out where or why.

The BAM file and its index were generated with this script, submitted to LSF via bsub; the job completed successfully:

bwa sampe $ref <(bwa aln $ref $fq1) <(bwa aln $ref $fq2) $fq1 $fq2 \
| samtools view -Su - \
| samtools sort - bam/$bname &&
samtools index bam/${bname}.bam

I used the same script for other files, which seem to be OK. I also repeated the alignment in case some corruption had occurred on disk, but it made no difference.

The symptom is that the number of alignments counted on the first chromosome is inconsistent with the number reported by the index:

samtools view -c $bam LmjF.01

However, the index reports 96372 alignments (95350 mapped + 1022 unmapped):

samtools idxstats $bam
LmjF.01    268988    95350    1022
LmjF.02    355712    139731    1441
LmjF.03    384502    105515    1303
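The per-chromosome check above can be run over every reference at once. Below is a minimal bash sketch; check_idxstats is a hypothetical helper, not a samtools subcommand. It assumes, as the question does, that samtools view -c on a whole-chromosome region counts both the mapped reads and the unmapped reads placed on that chromosome, so the result should equal the sum of the last two idxstats columns. It also assumes reference names contain no whitespace.

```bash
# Sketch (hypothetical helper): for every reference in the index, compare the
# count stored in the index (idxstats columns 3 + 4: mapped + placed unmapped)
# with the number of records samtools view -c actually reads for that reference.
# Assumes a coordinate-sorted, indexed BAM and reference names without spaces.
check_idxstats() {
    local bam="$1"
    # idxstats columns: reference name, length, mapped reads, unmapped reads
    samtools idxstats "$bam" | while read -r ref len mapped unmapped; do
        # "*" holds unplaced unmapped reads, which a region query cannot return
        [ "$ref" = "*" ] && continue
        # < /dev/null keeps samtools view from consuming the idxstats stream
        counted=$(samtools view -c "$bam" "$ref" < /dev/null)
        indexed=$((mapped + unmapped))
        if [ "$counted" -ne "$indexed" ]; then
            echo "MISMATCH $ref: view -c=$counted idxstats=$indexed"
        fi
    done
}
```

For example, check_idxstats bam/sample.bam (a placeholder path) prints nothing when the index and the records agree, and one MISMATCH line per disagreeing reference otherwise.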

I noticed something was wrong because running Picard gave me the following error:

java -jar ~/bin/CollectMultipleMetrics.jar I=$bam R=$ref O=$outname VALIDATION_STRINGENCY=SILENT

[Wed Aug 20 10:45:07 BST 2014] picard.analysis.CollectMultipleMetrics INPUT=fk041_F5_10_DIP1.bam REFERENCE_SEQUENCE=/lustre/sblab/berald01/reference_data/genomes/leishmania_major/LmjF_v6.1_spike.fa OUTPUT=/lustre/sblab/berald01/projects/20140818_fumi_hmu_pull_down/20140812_miseq/alnSummaryMetrics//fk041_F5_10_DIP1 VALIDATION_STRINGENCY=SILENT    ASSUME_SORTED=true STOP_AFTER=0 PROGRAM=[CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Aug 20 10:45:07 BST 2014] Executing as on Linux 2.6.18-274.3.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_35-b10; Picard version: 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010) JdkDeflater
WARNING    2014-08-20 10:45:08    SinglePassSamProgram    File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
[Wed Aug 20 10:45:09 BST 2014] picard.analysis.CollectMultipleMetrics done. Elapsed time: 0.04 minutes.
To get help, see
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(
    at picard.metrics.MultiLevelCollector.acceptRecord(
    at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(
    at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(
    at picard.analysis.SinglePassSamProgram.makeItSo(
    at picard.analysis.CollectMultipleMetrics.doWork(
    at picard.cmdline.CommandLineProgram.instanceMain(
    at picard.cmdline.CommandLineProgram.instanceMainWithExit(
    at picard.analysis.CollectMultipleMetrics.main(

EDIT: Version info:

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.18 (r982:295)

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.8-r455

java -jar ~/bin/CollectMultipleMetrics.jar --version




Does anybody know where the issue could be?

Tags: bwa samtools bam

Which version of samtools are you using? I've seen a report of something odd like this on SEQanswers but couldn't reproduce it. If you're using the new htslib-dependent version, did you download it directly from GitHub and, if so, did you check out the 1.0 release tags before compiling htslib and samtools?

(comment by Devon Ryan, 6.0 years ago)

Thanks for replying, Devon. I've edited my post to add version info. No, I'm not using htslib or the 1.x release (in fact, I've only just discovered it exists!)

(comment by dariober, 6.0 years ago)

I wonder if this was a bug that got fixed in a more recent version. 0.1.18 is pretty old, so try either 0.1.19 or the newer 1.0 release.

(comment by Devon Ryan, 6.0 years ago)

Yes, I'll update to 1.x and see what happens. Thanks again.

(comment by dariober, 6.0 years ago)
Answer by dariober (WCIP | Glasgow | UK), 6.0 years ago:

To answer my own question: the problem disappears after upgrading to samtools 1.0.

Bad news: it seems that indexes built by samtools 1.0 index give an incorrect number of unmapped reads in idxstats, as reported in this thread on SEQanswers.


For anyone coming to this post and wondering: a bug report has been filed.

(comment by Devon Ryan, 6.0 years ago)

Just to note (I started that thread): the problem is not with samtools 1.0 idxstats but with samtools 1.0 index. Running samtools 1.0 idxstats on a samtools 0.1.19 index works fine, whereas running samtools 0.1.19 idxstats on a samtools 1.0 index shows the error.

(comment by swbarnes2, 6.0 years ago)
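The cross-version test described in the comment above can be scripted so that one idxstats binary reads two indexes built by different samtools versions; any difference then points at the indexer rather than at idxstats. A minimal bash sketch, where compare_indexers is a hypothetical helper and the binary paths in the usage line are placeholders for your own installations. It copies the BAM twice, so it needs that much free disk space.

```bash
# Sketch (hypothetical helper): index two copies of the same BAM with two
# samtools binaries, then read both indexes with a single idxstats binary.
# Arguments: reader binary, indexer A binary, indexer B binary, input BAM.
compare_indexers() {
    local reader="$1" indexer_a="$2" indexer_b="$3" bam="$4"
    local tmp
    tmp=$(mktemp -d)
    # Separate copies so the two .bai files do not overwrite each other
    cp "$bam" "$tmp/a.bam"
    cp "$bam" "$tmp/b.bam"
    "$indexer_a" index "$tmp/a.bam"
    "$indexer_b" index "$tmp/b.bam"
    # Same reader on both indexes: any difference comes from the indexer
    "$reader" idxstats "$tmp/a.bam" > "$tmp/a.txt"
    "$reader" idxstats "$tmp/b.bam" > "$tmp/b.txt"
    if cmp -s "$tmp/a.txt" "$tmp/b.txt"; then
        echo "indexes agree"
    else
        diff "$tmp/a.txt" "$tmp/b.txt" || true   # show the differing lines
        echo "indexes differ"
    fi
    rm -rf "$tmp"
}
```

For example, compare_indexers /path/to/samtools-0.1.19/samtools /path/to/samtools-0.1.19/samtools /path/to/samtools-1.0/samtools sample.bam reads both indexes with 0.1.19 idxstats, mirroring the second case in the comment.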

Thanks for pointing that out; I've updated my post. And thanks to @Devon Ryan for reporting the bug.

(comment by dariober, 6.0 years ago)

Ah, I'd missed that in the original thread. I've updated the bug report.

(comment by Devon Ryan, 6.0 years ago)