Question: Samtools: append output
0
gravatar for bongbang
4.1 years ago by
bongbang70
United States
bongbang70 wrote:

I'm trying to combine outputs from various view queries each with its own criteria into one bam file. From my bash script:

samtools view -b -F 12 -f 50 $bam_file > $bam_inconsistent # Wrong orientation -/-
samtools view -b -F 60 -f 2 $bam_file >> $bam_inconsistent # Wrong orientation +/+
samtools view -b -F 14 $bam_file >> $bam_inconsistent # Wrong insert size

Unfortunately only the first command seems to take effect.How can I fix this? Thank you. 

(BTW, if there's an easier way to accomplish what I want, I'm open to any suggestion, although I would still like to understand why the above doesn't work.) Thank you.
 

Edit: I've found out why this doesn't work. A binary cannot be appended. Looks like I'll have to use sam output and convert back to bam.

 

samtools • 1.4k views
ADD COMMENTlink modified 4.1 years ago by Jorge Amigo11k • written 4.1 years ago by bongbang70
2
gravatar for Jorge Amigo
4.1 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

aside from the proper result you want to obtain, the rationale of the problem is wrong as you are trying to concatenate binary bam files as if they were plain text files. what you should do is to perform the 3 filters independently, each one generating its own bam file, and then perform a final step that would merge those 3 files into a single one:

samtools view -b -F 12 -f 50 $bam_file > $bam_inconsistent_1 # Wrong orientation -/-
samtools view -b -F 60 -f 2 $bam_file > $bam_inconsistent_2 # Wrong orientation +/+
samtools view -b -F 14 $bam_file > $bam_inconsistent_3 # Wrong insert size
samtools merge $bam_inconsistent_merged $bam_inconsistent_1 $bam_inconsistent_2 $bam_inconsistent_3

Edit: I just saw Pierre's comment above describing a oneliner that would do the work which looks much more elegant and efficient than this answer. I would definitely go for it. I'll leave this answer as I think that it highlights the underlying problem, although the best solution would be Pierre's (samtools view -f2 -h 1.bam ; samtools view -F4 1.bam; samtools -f8 1.bam) | samtools view -Sb -o merged.bam -

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Jorge Amigo11k
1
gravatar for Pierre Lindenbaum
4.1 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum115k wrote:

option -F and -f use a logical OR, not a logical AND.

So  -F 60  is exclude :     read unmapped OR mate unmapped OR  read reverse strand OR mate reverse strand . With those last two flags, you're going to remove most of your reads even if they're properly mapped.

if you need a fine filtering for your reads you can use my tool : SamJS https://github.com/lindenb/jvarkit/wiki/SamJS

 

 

 

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by Pierre Lindenbaum115k

I understand that. I'm actually trying to exclude the properly mapped reads. The last two commands are meant to add things that are not covered by the first, and they do according to the results I got from view -c (12, 35, and 64219 respectively for the three queries). The problem is that they are not adding anything when I try to output to file with the >> syntax. So I got only 12 reads in my file instead of 12+35+64219 expected. The problem here, I believe, is my unix syntax, not samtools. 

ADD REPLYlink written 4.1 years ago by bongbang70
1

ho, I see. you need to use a simple redirection '>' and merge the  new files using samtools merge

samtools merge merged.bam in.1.bam in.2.bam ...

ADD REPLYlink written 4.1 years ago by Pierre Lindenbaum115k

Do you think I can use piping to obviate the need for temporary files?

ADD REPLYlink written 4.1 years ago by bongbang70
1

yes but without using a binary format and the only first BAM needs to print the header (option -h).  Something like:

(samtools view -f2 -h 1.bam ; samtools view -F4 1.bam; samtools -f8 1.bam) | samtools view -Sb -o merged.bam -

 

ADD REPLYlink written 4.1 years ago by Pierre Lindenbaum115k

Many thanks. If you edit your answer to include this, I'll accept it. It'll make it easier for future readers with the same question.

ADD REPLYlink written 4.1 years ago by bongbang70
1

FYI: each BAM file is ended with a special EOF block, so tools won't continue processing file once it's encountered, that's why tools report only 12 reads.

ADD REPLYlink written 4.1 years ago by lomereiter430

Actually correctly-written software should continue over embedded EOF blocks, exactly so that more BAM records could be appended to a BAM file (see §4.1.2 of the SAM spec).  This didn't work for the O.P. because adding more BAM headers in the middle of the file won't work, and because some software has bugs in this admittedly esoteric area (including samtools at present).

ADD REPLYlink written 4.1 years ago by John Marshall1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 823 users visited in the last hour