Filtering Stream Bam Output Idealy From Pysam/Python
2
1
Entering edit mode
12.4 years ago
User 9996 ▴ 840

I'm making a call to a program that outputs a BAM and I want to filter it on a field.

I use Popen() to call the program and then I want to only keep reads that have a certain value for one of the optional SAM/BAM fields. How can this be done from within samtools? Ideally, I'd like to do it from Pysam but it looks like Pysam does not support streams... e.g.

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout

Pysam can read from standard input, using:

s = pysam.Samfile("-", "rb")

But I want to pipe the output in "p" into Pysam so that I can iterate through and get AlignedRead objects, filtering them as they come. Is there a way to do this?

I tried tricking samfile into thinking the output of "my_cmd" came from stdin, as in:

p = subprocess.Popen(my_cmd, stdin=subprocess.PIPE, shell=True).stdout

But this does not work... if I have the above, and try:

s = pysam.Samfile("-", "rb")
for read in s:
    print read

then I get binary garbage printed to the screen, so pysam is not parsing it for some reason. How can this be fixed?

If not then pysam will force me to write the output to a file first, which is quite prohibitive given the size of BAM files.

thanks.

samtools bam python • 7.8k views
ADD COMMENT
1
Entering edit mode

If you're getting binary stuff printed to the screen, it sounds like the output is still BAM format. In my_cmd, are you passing the -S arg to samtools for SAM output?

ADD REPLY
0
Entering edit mode

try p.stdout instead of "-" in your pysam command.

ADD REPLY
0
Entering edit mode

That will throw 'TypeError: Argument must be string or unicode'

ADD REPLY
4
Entering edit mode
9.0 years ago
Dan D 7.4k

Use a named pipe:

import os
import pysam
import subprocess

os.mkfifo('bampipe')
#let's say we want only mapped reads from a BAM
command = 'samtools view -hb -F 4 test_data.bam > bampipe'
p = subprocess.Popen(command, shell=True)
s = pysam.AlignmentFile('bampipe', 'rb')
ADD COMMENT
2
Entering edit mode
12.4 years ago

Ok, a couple of possible solutions:

  • if pysam can read process.stdout's, use them.

cf. the Python docs: 17.1.4.2. Replacing shell pipeline:

output=`dmesg | grep hda`
# becomes
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

in your case:

p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True)
s = pysam.Samfile(p.stdout, "rb") # maybe p.stdout.read()
  • If "-" in pysam is not implemented in a standard way: you can pipe them together either in (a) the shell or (b) in python by calling your script with a python script command
ADD COMMENT
1
Entering edit mode

This will not work since Samfile() needs the first argument to be a filename, and "-" is just a special holder for stdin. So I don't think I can pass it p.stdout

ADD REPLY
0
Entering edit mode

Thanks, but what would I put in the command call to p2 however? I want STDIN to be read from Python, using the Samfile("-", ...) call, so I don't want to capture STDIN into a Python stream (since Pysam unfortunately cannot use that...)

ADD REPLY
0
Entering edit mode

In your case, you do not have a p2 if you are the pysam API directly and not the script as a command-line tool.

ADD REPLY

Login before adding your answer.

Traffic: 1342 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6