Why Is My BAM File Split Into Separate Files By Samtools
7.4 years ago
Chen ★ 1.0k

I ran the following commands using BWA version 0.7.5:

# transform the SAM file to BAM 
~/bin/samtools view -Sb pe.sam > temp.bam
# sort the BAM file
~/bin/samtools sort -f temp.bam pe.bam

but the result I get is

pe.bam.0000.bam
pe.bam.0001.bam
pe.bam.0002.bam
pe.bam.0003.bam
pe.bam.0004.bam
pe.bam.0005.bam

but no pe.bam itself, which is the file I expected.

The error message from BWA is:

 [bam_merge_core]  fail to open file pe.bam.0000

But it should open pe.bam.0000.bam, right? Is this a bug in samtools, or is there something else I should do?

----------Update-----------

When I drop the -f parameter and run ~/bin/samtools sort temp.bam pe instead, the error disappears.

So I think this is a bug in samtools.

bam samtools • 4.3k views

"The error message from BWA is": how do you know it's an error from BWA?


Actually, the error message starts with "[bam_merge_core]", which I did not paste last time, so I am quite sure.


That's from samtools then (bam_merge_core is a samtools internal function). This may be a samtools bug. Which version are you using?

Edit: See my answer below. This is definitely a bug.

7.4 years ago

After looking through a bit of the code, this seems to be a bug in the most recent (0.1.19) version of samtools. It arises as follows:

  1. The `out.prefix` parameter used in samtools sort ends up being passed to the bam_sort_core_ext function. There, the output filename is created with sprintf(fnout, "%s%s", prefix, suffix);, unless the output is being sent to stdout. Here, suffix is a pointer to ".bam" and is incremented by 4 when the -f option is given on the command line, so that it points at the string's terminating NUL character, i.e. at an empty string (not at a NULL pointer).

  2. Similarly, the temporary filenames (fns[]) that are to be merged (but not written to!) are generated in the same function using a nearly identical method:

    for(i = 0; i < n_files; ++i) {
       fns[i] = (char*)calloc(strlen(prefix) + 20, 1);
       sprintf(fns[i], "%s.%.4d%s", prefix, i, suffix);
    }

  3. The problem is in the worker threads that perform the actual sorting. There, the temporary filenames are created with sprintf(name, "%s.%.4d.bam", w->prefix, w->index);.

    You can see, then, that the temp files will always end in XXXX.bam, regardless of the fact that the merge function is being told that they should just end with XXXX. The simplest fix would be to just change how fns is set:

    for(i = 0; i < n_files; ++i) {
       fns[i] = (char*)calloc(strlen(prefix) + 20, 1);
       sprintf(fns[i], "%s.%.4d.bam", prefix, i);
    }

The temp files get deleted anyway and the -f option is still honored this way, but things continue to work. I'll double check that this is correct and file a bug report.
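
To make the mismatch concrete, here is a minimal standalone sketch in C that builds both names using the same format strings quoted above. The helper names merge_name, worker_name and names_agree are hypothetical and do not exist in samtools; full_path stands in for the -f flag, and snprintf replaces sprintf only to keep the sketch safe.

```c
#include <stdio.h>
#include <string.h>

/* Name the merge step looks for, mirroring the fns[] loop quoted above.
 * full_path is a stand-in for the -f command-line flag. */
static void merge_name(char *out, size_t n, const char *prefix, int i, int full_path)
{
    const char *suffix = ".bam";
    if (full_path)
        suffix += 4;  /* now points at the terminating '\0', i.e. "" */
    snprintf(out, n, "%s.%.4d%s", prefix, i, suffix);
}

/* Name the worker thread actually writes: always ends in ".bam". */
static void worker_name(char *out, size_t n, const char *prefix, int i)
{
    snprintf(out, n, "%s.%.4d.bam", prefix, i);
}

/* Returns 1 when the merge step would find the file the worker wrote. */
static int names_agree(const char *prefix, int i, int full_path)
{
    char expected[256], written[256];
    merge_name(expected, sizeof expected, prefix, i, full_path);
    worker_name(written, sizeof written, prefix, i);
    return strcmp(expected, written) == 0;
}
```

With the prefix pe.bam and -f set, merge_name produces pe.bam.0000 while worker_name produces pe.bam.0000.bam, which matches both the error message and the leftover files in the question. With the prefix pe and no -f, both produce pe.0000.bam, which is why dropping -f works.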

Edit: I've confirmed this and filed a bug report. The code fix is very simple.

Edit2: The earlier formatting issues in this answer seem to have been fixed by directly using html rather than relying on the forum's formatting. It's good to know that it can't deal with code blocks inside ordered lists.

Edit3: Apparently this was fixed 6 months ago in the github repository. Grrr, it would have been nice had they just released a new version with bug fixes. I know they're working on a big overhaul to switch to using htslib, but still. I'll update my bug report and see if I can just help get an intermediate bug-fix release pushed out (I don't know how amenable the developers are to that).


I know it's frustrating to be tripped up by bugs that turn out to be long since fixed. It might have been better if we had been making releases from the 0.1.x branch over the last few months, but with the htslib-based samtools now imminent, we're reluctant to ask people to update to an untested-by-them 0.1.20 and then to immediately update again to a release from the new branch.


Fair enough. This is an easy enough bug to simply avoid anyway :) Good luck getting the htslib version finished. Is the "develop" branch the one being readied for release or is it a different one? I'd be happy to help since I use the samtools code a good bit.

7.4 years ago

The pe.bam.xxxx.bam files are the temporary files written by samtools sort.

Your samtools sort crashed for some reason before it could merge them.


See my update to the question. I get an error that should not occur: it tries to open a file that does not exist.


Is this a bug in BWA?
