pysam header and body
2
0
Entering edit mode
7 weeks ago
Matteo Ungaro ▴ 130

Hi there, I started to work with pysam to get a BAM file into a Jupyther notebook. I mostly followed what indicated in the manual for importing the file:

file_bam = pysam.AlignmentFile("<file>.bam", "rb")

and to then visualize the header with:

print(file_bam.header)

Though, I wonder is there an easy command to print the body of the file as in samtools view with all associated info? I looked up a bit and found this pysam.view("-S", "<file>.bam") but it seems quite slow and couldn't understand the meaning of the -S flag (which I could not find in the manual). Apologies, for the question but I was looking for something fast&snappy like samtools; thanks in advance!

bam pysam view • 693 views
ADD COMMENT
2
Entering edit mode
7 weeks ago

Each AlignedSegment object has a to_string() method that converts a BAM record into the SAM line you see from samtools view so all you need to do is iterate over all the reads in the file and convert them to strings:

for read in file_bam.fetch(until_eof=True):
    print(read.to_string())

Here, until_eof=True specifies that you want to read every read in the file until the end of file, rather than just the reads aligned to a specific region. It is the only sort of fetch that works without an index, and it is the only fetch that will return unmapped reads.

However, this is almost always the wrong thing to do, and is generally only used for debugging.

If you wish to operate on the values in the columns of the body of the BAM file, then use the methods and attributes provided by the AlignedSegment object, such as get_reference_name, reference_start, cigar_tuples, is_reverse or get_tag. See https://pysam.readthedocs.io/en/latest/api.html#pysam.AlignedSegment for all the options.

If you wish to output reads from your code, then create a stream:

out_bam = pysam.AlignmentFile("out.bam", "w", template=bam_file) # output to a file
out_stream = pysam.AlignmentFile("-", "w", template=bam_file) # output to standard out.

and then write to it:

out_bam.write(read)
ADD COMMENT
0
Entering edit mode

@i.sudbery thanks a lot for the exhaustive explanation! I realized that the for causes indeed some issues with processing of the file in PyCharm. I will use the various method of the AlignedSegment object; it's just I wished for a comprehensive view of the file in just one command...

ADD REPLY
0
Entering edit mode
7 weeks ago

Also this option?

file_bam = pysam.AlignmentFile("<file>.bam", "rb")
for aln in file_bam:
    print(aln.to_string())

However, printing the whole bam file is most likely going to flood your screen.


pysam.view("-S", "<file>.bam") couldn't understand the meaning of the -S flag

I think the arguments you pass to pysam.view(...) are the same you would pass to samtools view .... So looking at samtools view help: -S Ignored (input format is auto-detected).

ADD COMMENT

Login before adding your answer.

Traffic: 1840 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6