Question: Corrupted bam file
1
gravatar for dariober
5.1 years ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

I have a bam file which appears to be corrupted but I can't understand where and why.

The bam and index was generated using this script. It was submitted to LSF via bsub and
it completed successfully:

bwa sampe $ref <(bwa aln $ref $fq1) <(bwa aln $ref $fq2) $fq1 $fq2 \
| samtools view -Su - \
| samtools sort - bam/$bname &&
samtools index bam/${bname}.bam

I used the same script for other files which seem to be ok. Also, I repeated the alignment
in case some corruption occurred on the disks but it made no difference.

The problem or symptom is that the number of alignments on the first chromosome is inconsistent with what reported in the index:

samtools view -c $bam LmjF.01
96370

However the index file reports 96372 alignments (95350 + 1022):

samtools idxstats $bam
LmjF.01    268988    95350    1022
LmjF.02    355712    139731    1441
LmjF.03    384502    105515    1303
...

I found out something was weird because running picard gave me the follow error:

java -jar ~/bin/CollectMultipleMetrics.jar I=$bam R=$ref O=$outname VALIDATION_STRINGENCY=SILENT

[Wed Aug 20 10:45:07 BST 2014] picard.analysis.CollectMultipleMetrics INPUT=fk041_F5_10_DIP1.bam REFERENCE_SEQUENCE=/lustre/sblab/berald01/reference_data/genomes/leishmania_major/LmjF_v6.1_spike.fa OUTPUT=/lustre/sblab/berald01/projects/20140818_fumi_hmu_pull_down/20140812_miseq/alnSummaryMetrics//fk041_F5_10_DIP1 VALIDATION_STRINGENCY=SILENT    ASSUME_SORTED=true STOP_AFTER=0 PROGRAM=[CollectAlignmentSummaryMetrics, CollectInsertSizeMetrics, QualityScoreDistribution, MeanQualityByCycle] VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Aug 20 10:45:07 BST 2014] Executing as berald01@uk-cri-lcst01.crnet.org on Linux 2.6.18-274.3.1.el5 amd64; Java HotSpot(TM) 64-Bit Server VM 1.6.0_35-b10; Picard version: 1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010) JdkDeflater
WARNING    2014-08-20 10:45:08    SinglePassSamProgram    File reports sort order 'unsorted', assuming it's coordinate sorted anyway.
[Wed Aug 20 10:45:09 BST 2014] picard.analysis.CollectMultipleMetrics done. Elapsed time: 0.04 minutes.
Runtime.totalMemory()=252116992
To get help, see http://picard.sourceforge.net/index.shtml#GettingHelp
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.collectQualityData(AlignmentSummaryMetricsCollector.java:331)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector$IndividualAlignmentSummaryMetricsCollector.addRecord(AlignmentSummaryMetricsCollector.java:229)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:155)
    at picard.analysis.AlignmentSummaryMetricsCollector$GroupAlignmentSummaryMetricsPerUnitMetricCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:127)
    at picard.metrics.MultiLevelCollector$AllReadsDistributor.acceptRecord(MultiLevelCollector.java:192)
    at picard.metrics.MultiLevelCollector.acceptRecord(MultiLevelCollector.java:315)
    at picard.analysis.AlignmentSummaryMetricsCollector.acceptRecord(AlignmentSummaryMetricsCollector.java:71)
    at picard.analysis.CollectAlignmentSummaryMetrics.acceptRead(CollectAlignmentSummaryMetrics.java:114)
    at picard.analysis.SinglePassSamProgram.makeItSo(SinglePassSamProgram.java:117)
    at picard.analysis.CollectMultipleMetrics.doWork(CollectMultipleMetrics.java:136)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:183)
    at picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:124)
    at picard.analysis.CollectMultipleMetrics.main(CollectMultipleMetrics.java:100)

EDIT: Version info:

Program: samtools (Tools for alignments in the SAM format)
Version: 0.1.18 (r982:295)

Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.8-r455

java -jar ~/bin/CollectMultipleMetrics.jar --version
1.115(30b1e546cc4dd80c918e151dbfe46b061e63f315_1402927010)

 

 

 

Does anybody know where the issue could be?

bwa samtools bam • 3.1k views
ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by dariober10k

Which version of samtools are you using? I've seen a report of something weird like this on SEQanswers but couldn't reproduce it. If you're using the new htslib-dependent version, did you download directly from github and, if so, did you switch to the specific 1.0 tag releases before compiling htslib and samtools?

ADD REPLYlink written 5.1 years ago by Devon Ryan91k

Thanks for replying Devon. I edited my post to add versions info. No, I'm not using htslib and the 1.x release (in fact, I discovered its existence now only!)

ADD REPLYlink written 5.1 years ago by dariober10k
1

I wonder if this was a bug that got fixed in a more recent version. 0.1.18 is pretty old, so try either 0.1.19 or the newer 1.0 release.

ADD REPLYlink written 5.1 years ago by Devon Ryan91k

Yes, I'll update to 1.x and see what happens..., thanks again.

ADD REPLYlink written 5.1 years ago by dariober10k
2
gravatar for dariober
5.1 years ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

To answer my own question... The problem disappears after upgrading to samtools-1.0

Bad news: It seems samtools-1.0 idxstats index incorrectly reports the number of unmapped reads as reported in this thread on SEQanswers

ADD COMMENTlink modified 5.1 years ago • written 5.1 years ago by dariober10k
2

For those coming to this post and wondering, there's been a bug report filed.

ADD REPLYlink written 5.1 years ago by Devon Ryan91k
1

Just to note (I started that thread), the problem is not with samtools 1.0 idxstats, but with samtools 1.0 index.  Running samtools 1.0 idxstats on a samtools 0.1.19 index works fine.  Running samtools 0.1.19 idxstats on a samtools 1.0 index shows the error.

ADD REPLYlink written 5.1 years ago by swbarnes26.5k
1

Thank you for pointing it out. I updated my post. And thanks to @Devon Ryan for reporting the bug.

ADD REPLYlink written 5.1 years ago by dariober10k

Ah, I'd missed that in the original thread. I've updated the bug report.

ADD REPLYlink written 5.1 years ago by Devon Ryan91k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1211 users visited in the last hour