Fetching Read By Its Id From A Bam File With Pysam In Python
1
1
Entering edit mode
13.1 years ago
User 9996 ▴ 840

Is there a way to fetch a read efficiently from a BAM file, using Pysam or a similar module (from Python), by its read ID?

For example, if I have a list of read IDs, "read_ids", I want to do something like:

bam_file = pysam.Samfile(bam_filename, "rb")

for read_id in read_ids:
  # fetch the read id?
  my_aligned_read = bam_file.fetch(read_id)

is there a way to do this? The indexed/sorted BAM format should have all this information I am just wondering how to retrieve it.

thanks.

samtools python next-gen sequencing • 9.9k views
ADD COMMENT
0
Entering edit mode

I am hitting the same issue ..just wondering what solution worked out for you ??

ADD REPLY
4
Entering edit mode
13.1 years ago
brentp 24k

the short answer is "no", not without some programming on your own. The BAM index is by location, not by read-id. You will have to create your own index to access by name.

I had written some code to do this using tokyo cabinet to save an index of sam header to file position (from which you can then read the SAM info). That code is here. (if you dont like the tokyo cabinet dependency, Istvan Albert pointed out that it's just as well to use the bsddb module that comes with python up to at least 2.6.)

I believe you can write your own index using screed as well--by default it supports fasta and fastq formats--but I have not tried that. It uses an sqlite backend.

ADD COMMENT
1
Entering edit mode

What about the -n option of samtools sort?

-n Sort by read names rather than by chromosomal coordinates"

(http://samtools.sourceforge.net/samtools.shtml)

ADD REPLY
0
Entering edit mode

can this code you wrote with tokyocabinet be adapted to BAM files, and ones that have headers? It seems that this code relies on text SAM files (non-binary formats).

ADD REPLY
0
Entering edit mode

no, it requires a text format.

ADD REPLY

Login before adding your answer.

Traffic: 2835 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6