Emule "Samtools View" Using Pysam
2
3
Entering edit mode
10.5 years ago

Hi!!

I want to read and process BAM files using my own python scripts. Pysam is a python library designed to do this, but I don't understand how to use it.

For example, I would like get a human-readable print of a BAM file using pysam. The same output I get when I use the following command:

samtools view file.bam


I tried to do this using this code:

import sys
import pysam

def main(BAM):

samfile = pysam.Samfile( BAM, "rb" )

for row in samfile:
print row

if __name__ == '__main__':
main(sys.argv[1])


But I get an output that is similar than a SAM file, but doesn't have the CIGAR line and have strange fields that I don't understand:

ERR030860.11710_HWI-BRUNOP16X_0001:5:1:10517:2607#0    0    23    43534648    0    [(0, 89), (3, 285), (0, 11)]    -1    -1    100    ATGAAGCAATTGCTGAATTGGATACGCTGAATGAAGAGTCTTATAAAGACAGCACTCTGATCCTGCAGTTACTTAGGGACAATCTCCCTCTGTGGACATC    5565555555@GGAGGGGGC5=><554455DDD.D>?A@?A<?>>444444452*4344555*55DFDDDFDDD<55445C??=8,+14444444<A?@@    [('NM', 2), ('IH', 1), ('HI', 1), ('XS', '+')]
ERR030860.11718_HWI-BRUNOP16X_0001:5:1:11798:2613#0    0    8    30267395    102    [(0, 91)]    -1    -1    91    CAAAGGAAGCTCTTGGTATACGGAAAAAGTATTCAACAATAAATTAGGCATGGTTGCTTCCATTTTCTGCCTCACATACTTTTTTTTTTCG    GHFGHHHGHHHHHHEHHHHDDFBDD44445HHHHHHHHHHGHHHHDC=GGHHHHHHHHHHHEH=HHHHHHHHG>HHHHEHHHHHHCCCBC@    [('NM', 3), ('IH', 1), ('HI', 1)]
ERR030860.11720_HWI-BRUNOP16X_0001:5:1:11974:2612#0    0    27    133472526    0    [(0, 21), (3, 791), (0, 38)]    -1    -1    59    AGTTCTATGGGTCAAAAGAGGATCCACAGACTTTCTATTATGCTGTTGCTGTGGTGACG    HHHHHHHGHH<AD@D444(4445553444/45544EEGGGHHGHHHHHHDGGGGGD:,D    [('NM', 1), ('IH', 1), ('HI', 1), ('XS', '+')]
ERR030860.11722_HWI-BRUNOP16X_0001:5:1:12095:2607#0    0    9    10608405    0    [(0, 95)]    -1    -1    95    GCCATTGAAGACTGGAATAATGAAAGCAGCATGCCCTGTTGTGTCCTTCAGCTTGGCGATATCATCGATGGATATAATGCACAGTATAATGCATC    HHHHHHHHHHHHHHHBD?DD45556@@A?AEEEHHA?DDA4446/BDDADH=HHH>+=<:/4655F?BAFC/5A@4345*1424531*+2@D<5?    [('NM', 1), ('IH', 1), ('HI', 1)]
ERR030860.11724_HWI-BRUNOP16X_0001:5:1:12418:2610#0    0    2    4114112    0    [(0, 64)]    -1    -1    64    TTTGGGGAAGGGGTGGCTAAGGAAGATGGTATAGGTGAAGGCGGCTGTGTGACCACTTTACCCC    GGHHHHGHHHHHHHHBCGHH55555425515444514455C4CCCCE?E244553A=@>09?=A    [('NM', 1), ('IH', 1), ('HI', 1)]
ERR030860.11725_HWI-BRUNOP16X_0001:5:1:13271:2613#0    0    27    9482322    0    [(0, 60), (3, 894), (0, 27)]    -1    -1    87    GACACTATTAATAAGACTGAATTGGCCTGTAATAACACAGTTATTGGTTCCCAAATGCAGTTACAGCTGGGACGAGTCACTCGTGTT    HBHHHHHHHFHHHHHHHGHHEHHHHEHHHEF@BFFFHHEHHGHHHHH:HHHHHHEHHHHHHHHHHHEHHHB@*D743150414536-    [('NM', 1), ('IH', 1), ('HI', 1), ('XS', '+')]


Can anybody tell me how to get a SAM output using pysam??

I'm also interested in other ways to read BAM files using python scripts.

sam bam python • 9.8k views
1
Entering edit mode
10.5 years ago
Andreas ★ 2.5k

Hi,

reads stored in a BAM file are processed by Pysam and returned as pysam.AlignedReads. Their string representation (which you printed) is not the same as the original line in the SAM file. See 'class pysam.AlignedRead' in the manual for a full description.

I don't think Pysam keeps a copy of the original line. But you should be able to recreate it, if you really needed. For example, the CIGAR string of a read can be accessed through its cigar property (row.cigar in your case). This will not, however, return the original string but a "a list of tuples (operation,length). For example, the tuple [ (0,3), (1,5), (0,2) ] refers to an alignment with 3 matches, 5 insertions and another 2 matches." (from Pysam glossary). That you would have to process to get back to the CIGAR string.

Andreas

0
Entering edit mode

But when I print row.cigar I get the tuples, not the CIGAR. Well... I at least need to understand the meaning of this tuples... for example I have tuples like this: [(0, 25), (3, 1642), (0, 44)] .... It's means that I have 25 matches, one intron of 1642 and 44 matches ??

0
Entering edit mode

Yes, you don't get the original string, but a processed version, which is what people usually want. If you need a Python script to simply access the original string representation of the aligned reads in the samefile, then it's probably easier to use Python's subprocess module and read the output from samtools view.

0
Entering edit mode

I'm used to directly analyse the SAM lines, but that I understand this processed version, it's even easier manipulate BAM/SAM files. THANKS!!

1
Entering edit mode
8.8 years ago
Fidel ★ 2.0k

You can use samtools view directly within pysam:

import pysam
rows = pysam.view("-S", "file.sam")
for r in rows:
print r