BWA mem giving malformed SAM headers.
2
0
Entering edit mode
4.0 years ago
Ismaelrymy • 0

I'm running BWA to align my reads to a big reference. I started using samtools view to obtain directly the BAM file, but when I tried to sort the BAM, it gave me the error:

[E::sam_hdr_sanitise] Malformed SAM header at line 2

So I tried to use Picard CleanSam to try and solve any possible problem with unmapped reads, but it gave me the next error:

[Tue Apr 14 19:04:47 CEST 2020] picard.sam.CleanSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2039414784
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2083544369
        at java.lang.String.checkBounds(String.java:381)
        at java.lang.String.<init>(String.java:324)
        at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
        at htsjdk.samtools.util.BinaryCodec.readString(BinaryCodec.java:478)
        at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:701)
        at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
        at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
        at picard.sam.CleanSam.doWork(CleanSam.java:78)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

So I tried to directly use Samtools sort instead of Samtools view to sort the SAM as it came from BWA, but to no avail, same error:

[E::sam_hdr_sanitise] Malformed SAM header at line 2

I'm getting this error in approximatedly half of the samples. The header of the BAM files looks like this:

    @SQ     SN:abyss_v4.2_1 LN:73
    @SQ     SN:abyss_v4.2_2 LN:86
    @SQ     SN:abyss_v4.2_3 LN:86
    @SQ     SN:abyss_v4.2_4 LN:133
    @SQ     SN:abyss_v4.2_5 LN:154
    @SQ     SN:abyss_v4.2_6 LN:213
    @SQ     SN:abyss_v4.2_7 LN:74
    @SQ     SN:abyss_v4.2_8 LN:71
    @SQ     SN:abyss_v4.2_9 LN:71

And the code used is:

bwa mem -t 12 -M REFERENCEPATH READFILEPATH | samtools sort -@12 -o BAMFILEPATH

Any ideas about what can cause this problem?

Cheers!

alignment software error • 3.0k views
ADD COMMENT
0
Entering edit mode

What is your samtools version and how much memory is available at the node you use? How many jobs run in parallel?

ADD REPLY
0
Entering edit mode

Hi there, I realised I was not using the latest Samtools, it was Samtools 1.09 (it was listed as 1.9, that's where my mistake comes from), latest samtools gives me the next error:

[E::bam_hdr_write] Header too long for BAM format

But as I asked in the reply below, are they really that long? Should I trim the names of the reference or just cut the redundant part of the headers in the BAM file?

The node has 128 GB, This process was specified to use 100GB.

ADD REPLY
0
Entering edit mode

Ok, then I suggest you upgrade samtools as many things have improved since that ancient version. Yes, there is a maximum length for the header, see https://github.com/samtools/samtools/issues/1105

It is suspicious though that the error only ocurs with some samples. If it was a header problem then it should happen with every file that uses the same reference, right? Maybe the problem goes away with the most recent version? There were a changes in samtools to better handle long reads, maybe they made changes for the header too, Just try it. 1.09 is ancient, so this can indeed be a problem. Just align a few reads from the problematic files to test it.

ADD REPLY
0
Entering edit mode
4.0 years ago
brunobsouzaa ▴ 830

What happens if you try:

bwa mem ref.fa read1.fq read2.fq | samtools view -o output.bam

?

If you wanna try a different approach, I like to use:

$BWA mem -M -t $t -R "@RG\tID:$ID\tLB:$LB\tSM:${i}\tPL:$PL" $REF ${i}_R1.fastq.gz ${i}_R2.fastq.gz > $SAM/${i}.sam

$PICARD SortSam I=$SAM/${i}.sam O=$BAM/${i}.bam SO=coordinate CREATE_INDEX=true
ADD COMMENT
0
Entering edit mode
4.0 years ago
Ismaelrymy • 0

What happens if you try:

bwa mem ref.fa read1.fq read2.fq | samtools view -o output.bam

This is the first thing I tried when, the first error appeared, sorry I didn't specify. So the problem persists, but it appears that I was not using the latest Samtools, now I'm getting a more specific error that says:

[E::bam_hdr_write] Header too long for BAM format

Are they really so long? (you can find them in the first post) Can I just trim the headers from the BAM file or should I trim the reference sequence names and align them again? Sorry if it's a silly question, but it's my first time using references.

ADD COMMENT
0
Entering edit mode

Personally I donĀ“t like using samtools, really prefer Picard for this task. I suggest you update your samtools version or using Picard!

ADD REPLY

Login before adding your answer.

Traffic: 2559 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6