Question: Why Is My BAM File Split into Several Files by samtools sort?
Chen wrote (5.3 years ago):

I ran the following commands (I am using BWA version 0.7.5):

# convert the SAM file to BAM
~/bin/samtools view -Sb pe.sam > temp.bam
# sort the BAM file; -f means "pe.bam" is used as the full output file name rather than a prefix
~/bin/samtools sort -f temp.bam pe.bam

but the result I get is:

pe.bam.0000.bam
pe.bam.0001.bam
pe.bam.0002.bam
pe.bam.0003.bam
pe.bam.0004.bam
pe.bam.0005.bam

but not pe.bam itself, which is what I expected.

The error message from BWA is:

 [bam_merge_core]  fail to open file pe.bam.0000

But it should open pe.bam.0000.bam, right? Is this a bug in samtools, or is there something else I should do?

----------Update-----------

I removed the -f parameter and ran ~/bin/samtools sort temp.bam pe instead, and the error disappeared.

So I think this is a bug in samtools.

Tags: bam, samtools

"The error message from BWA is": how do you know it's an error from BWA?

written 5.3 years ago by Pierre Lindenbaum

Actually, the error message starts with "[bam_merge_core]", which I did not paste last time, so I am quite sure.

written 5.3 years ago by Chen

That's from samtools then (bam_merge_core is a samtools internal function). This may be a samtools bug. Which version are you using?

Edit: See my answer below. This is definitely a bug.

written 5.3 years ago by Devon Ryan

Devon Ryan (Freiburg, Germany) wrote (5.3 years ago):

After looking through the code, this appears to be a bug in the most recent version of samtools (0.1.19). It arises as follows:

  1. The `out.prefix` parameter used in samtools sort ends up being passed to the `bam_sort_core_ext` function. There, unless the output is going to stdout, the output filename is created with `sprintf(fnout, "%s%s", prefix, suffix);`. Here, `suffix` points to ".bam" and is incremented by 4 when the -f option is given on the command line, so it then points at the string's terminating NUL character (i.e., it becomes the empty string "", not a NULL pointer).

  2. Similarly, the temporary filenames (fns[]) that are to be merged (but not written to!) are generated in the same function using a nearly identical method:

    for(i = 0; i < n_files; ++i) {
       fns[i] = (char*)calloc(strlen(prefix) + 20, 1);
       sprintf(fns[i], "%s.%.4d%s", prefix, i, suffix);
    }

  3. The problem is in the worker threads that perform the actual sorting. There, the temporary filenames are created with `sprintf(name, "%s.%.4d.bam", w->prefix, w->index);`.

    The temporary files therefore always end in `.XXXX.bam`, even though, when -f is given, the merge function is told that they end in just `.XXXX`. The simplest fix would be to change how `fns` is set:

    for(i = 0; i < n_files; ++i) {
       fns[i] = (char*)calloc(strlen(prefix) + 20, 1);
       sprintf(fns[i], "%s.%.4d.bam", prefix, i);
    }

The temporary files are deleted anyway, so with this change the -f option is still honored and the merge finds the files the workers actually wrote. I'll double-check that this is correct and file a bug report.

Edit: I've confirmed this and filed a bug report. The code fix is very simple.

Edit 2: I fixed the earlier formatting problems in this answer by writing HTML directly instead of relying on the forum's formatting. It's good to know that it can't handle code blocks inside ordered lists.

Edit 3: Apparently this was fixed six months ago in the GitHub repository. Grrr, it would have been nice if they had simply released a new version with bug fixes. I know they're working on a big overhaul to switch to htslib, but still. I'll update my bug report and see whether I can help get an intermediate bug-fix release out (I don't know how amenable the developers are to that).


I know it's frustrating to be tripped up by bugs that turn out to be long since fixed. It might have been better if we had been making releases from the 0.1.x branch over the last few months, but with the htslib-based samtools now imminent, we're reluctant to ask people to update to an untested-by-them 0.1.20 and then to immediately update again to a release from the new branch.

written 5.3 years ago by John Marshall

Fair enough. This is an easy enough bug to simply avoid anyway :) Good luck getting the htslib version finished. Is the "develop" branch the one being readied for release or is it a different one? I'd be happy to help since I use the samtools code a good bit.

written 5.3 years ago by Devon Ryan

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (5.3 years ago):

The files pe.bam.xxx.bam are the temporary files of samtools sort.

Your samtools sort crashed for some reason.


See my update to the question: I get an error that should not happen; it tries to open a file that does not exist.

written 5.3 years ago by Chen

Is this a bug in BWA?

written 5.3 years ago by Chen