Question: Filtering Stream Bam Output Idealy From Pysam/Python
gravatar for User 9996
8.9 years ago by
User 9996810
User 9996810 wrote:

I'm making a call to a program that outputs a BAM and I want to filter it on a field.

I use Popen() to call the program and then I want to only keep reads that have a certain value for one of the optional SAM/BAM fields. How can this be done from within samtools? Ideally, I'd like to do it from Pysam but it looks like Pysam does not support streams... e.g.

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout

Pysam can read from standard input, using:

s = pysam.Samfile("-", "rb")

But I want to pipe the output in "p" into Pysam so that I can iterate through and get AlignedRead objects, filtering them as they come. Is there a way to do this?

I tried tricking samfile into thinking the output of "my_cmd" came from stdin, as in:

p = subprocess.Popen(my_cmd, stdin=subprocess.PIPE, shell=True).stdout

But this does not work... if I have the above, and try:

s = pysam.Samfile("-", "rb")
for read in s:
    print read

then I get binary garbage printed to the screen, so pysam is not parsing it for some reason. How can this be fixed?

If not then pysam will force me to write the output to a file first, which is quite prohibitive given the size of BAM files.


python bam samtools • 5.4k views
ADD COMMENTlink modified 5.5 years ago by Dan D7.1k • written 8.9 years ago by User 9996810

If you're getting binary stuff printed to the screen, it sounds like the output is still BAM format. In my_cmd, are you passing the -S arg to samtools for SAM output?

ADD REPLYlink written 8.8 years ago by Ryan Dale4.9k

try p.stdout instead of "-" in your pysam command.

ADD REPLYlink written 8.9 years ago by Michael Schubert7.0k

That will throw 'TypeError: Argument must be string or unicode'

ADD REPLYlink written 6.3 years ago by Andreas2.5k
gravatar for Dan D
5.5 years ago by
Dan D7.1k
Dan D7.1k wrote:

Use a named pipe:

import os
import pysam
import subprocess

#let's say we want only mapped reads from a BAM
command = 'samtools view -hb -F 4 test_data.bam > bampipe'
p = subprocess.Popen(command, shell=True)
s = pysam.AlignmentFile('bampipe', 'rb')


ADD COMMENTlink modified 5.4 years ago • written 5.5 years ago by Dan D7.1k
gravatar for Michael Schubert
8.9 years ago by
Cambridge, UK
Michael Schubert7.0k wrote:

Ok, a couple of possible solutions:

  • if pysam can read process.stdout's, use them.

cf. the Python docs: Replacing shell pipeline:

output=`dmesg | grep hda`
# becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

in your case:

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True)
s = pysam.Samfile(p.stdout, "rb") # maybe
  • If "-" in pysam is not implemented in a standard way: you can pipe them together either in (a) the shell or (b) in python by calling your script with a python script command
ADD COMMENTlink modified 10 months ago by RamRS30k • written 8.9 years ago by Michael Schubert7.0k

This will not work since Samfile() needs the first argument to be a filename, and "-" is just a special holder for stdin. So I don't think I can pass it p.stdout

ADD REPLYlink written 8.9 years ago by User 9996810

Thanks, but what would I put in the command call to p2 however? I want STDIN to be read from Python, using the Samfile("-", ...) call, so I don't want to capture STDIN into a Python stream (since Pysam unfortunately cannot use that...)

ADD REPLYlink written 8.9 years ago by User 9996810

In your case, you do not have a p2 if you are the pysam API directly and not the script as a command-line tool.

ADD REPLYlink written 8.9 years ago by Michael Schubert7.0k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1329 users visited in the last hour