Question: BWA mem giving malformed SAM headers.
0
gravatar for Ismaelrymy
7 months ago by
Ismaelrymy0
Ismaelrymy0 wrote:

I'm running BWA to align my reads to a big reference. I started using samtools view to obtain directly the BAM file, but when I tried to sort the BAM, it gave me the error:

[E::sam_hdr_sanitise] Malformed SAM header at line 2

So I tried to use Picard CleanSam to try and solve any possible problem with unmapped reads, but it gave me the next error:

[Tue Apr 14 19:04:47 CEST 2020] picard.sam.CleanSam done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=2039414784
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -2083544369
        at java.lang.String.checkBounds(String.java:381)
        at java.lang.String.<init>(String.java:324)
        at htsjdk.samtools.util.StringUtil.bytesToString(StringUtil.java:301)
        at htsjdk.samtools.util.BinaryCodec.readString(BinaryCodec.java:478)
        at htsjdk.samtools.BAMFileReader.readHeader(BAMFileReader.java:701)
        at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:298)
        at htsjdk.samtools.BAMFileReader.<init>(BAMFileReader.java:176)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:396)
        at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:208)
        at picard.sam.CleanSam.doWork(CleanSam.java:78)
        at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:305)
        at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
        at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

So I tried to directly use Samtools sort instead of Samtools view to sort the SAM as it came from BWA, but to no avail, same error:

[E::sam_hdr_sanitise] Malformed SAM header at line 2

I'm getting this error in approximatedly half of the samples. The header of the BAM files looks like this:

    @SQ     SN:abyss_v4.2_1 LN:73
    @SQ     SN:abyss_v4.2_2 LN:86
    @SQ     SN:abyss_v4.2_3 LN:86
    @SQ     SN:abyss_v4.2_4 LN:133
    @SQ     SN:abyss_v4.2_5 LN:154
    @SQ     SN:abyss_v4.2_6 LN:213
    @SQ     SN:abyss_v4.2_7 LN:74
    @SQ     SN:abyss_v4.2_8 LN:71
    @SQ     SN:abyss_v4.2_9 LN:71

And the code used is:

bwa mem -t 12 -M REFERENCEPATH READFILEPATH | samtools sort -@12 -o BAMFILEPATH

Any ideas about what can cause this problem?

Cheers!

alignment software error • 346 views
ADD COMMENTlink modified 7 months ago • written 7 months ago by Ismaelrymy0

What is your samtools version and how much memory is available at the node you use? How many jobs run in parallel?

ADD REPLYlink written 7 months ago by ATpoint41k

Hi there, I realised I was not using the latest Samtools, it was Samtools 1.09 (it was listed as 1.9, that's where my mistake comes from), latest samtools gives me the next error:

[E::bam_hdr_write] Header too long for BAM format

But as I asked in the reply below, are they really that long? Should I trim the names of the reference or just cut the redundant part of the headers in the BAM file?

The node has 128 GB, This process was specified to use 100GB.

ADD REPLYlink written 7 months ago by Ismaelrymy0

Ok, then I suggest you upgrade samtools as many things have improved since that ancient version. Yes, there is a maximum length for the header, see https://github.com/samtools/samtools/issues/1105

It is suspicious though that the error only ocurs with some samples. If it was a header problem then it should happen with every file that uses the same reference, right? Maybe the problem goes away with the most recent version? There were a changes in samtools to better handle long reads, maybe they made changes for the header too, Just try it. 1.09 is ancient, so this can indeed be a problem. Just align a few reads from the problematic files to test it.

ADD REPLYlink modified 7 months ago • written 7 months ago by ATpoint41k
0
gravatar for brunobsouzaa
7 months ago by
brunobsouzaa400
Brazil
brunobsouzaa400 wrote:

What happens if you try:

bwa mem ref.fa read1.fq read2.fq | samtools view -o output.bam

?

If you wanna try a different approach, I like to use:

$BWA mem -M -t $t -R "@RG\tID:$ID\tLB:$LB\tSM:${i}\tPL:$PL" $REF ${i}_R1.fastq.gz ${i}_R2.fastq.gz > $SAM/${i}.sam

$PICARD SortSam I=$SAM/${i}.sam O=$BAM/${i}.bam SO=coordinate CREATE_INDEX=true
ADD COMMENTlink modified 7 months ago • written 7 months ago by brunobsouzaa400
0
gravatar for Ismaelrymy
7 months ago by
Ismaelrymy0
Ismaelrymy0 wrote:

What happens if you try:

bwa mem ref.fa read1.fq read2.fq | samtools view -o output.bam

This is the first thing I tried when, the first error appeared, sorry I didn't specify. So the problem persists, but it appears that I was not using the latest Samtools, now I'm getting a more specific error that says:

[E::bam_hdr_write] Header too long for BAM format

Are they really so long? (you can find them in the first post) Can I just trim the headers from the BAM file or should I trim the reference sequence names and align them again? Sorry if it's a silly question, but it's my first time using references.

ADD COMMENTlink written 7 months ago by Ismaelrymy0

Personally I don´t like using samtools, really prefer Picard for this task. I suggest you update your samtools version or using Picard!

ADD REPLYlink written 7 months ago by brunobsouzaa400
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1663 users visited in the last hour