Question: Fetching a Read by Its ID from a BAM File with Pysam in Python
User 9996 wrote (8.1 years ago):

Is there a way to fetch a read efficiently from a BAM file, using Pysam or a similar module (from Python), by its read ID?

For example, if I have a list of read IDs, "read_ids", I want to do something like:

import pysam

bam_file = pysam.Samfile(bam_filename, "rb")

for read_id in read_ids:
    # Desired (hypothetical) usage: look up a read by its ID
    my_aligned_read = bam_file.fetch(read_id)

Is there a way to do this? The indexed/sorted BAM format should contain all of this information; I am just wondering how to retrieve it.

Thanks.

I am hitting the same issue. Which solution worked for you?

Comment written 7.2 years ago by Abhi

brentp wrote (8.1 years ago):

The short answer is "no", not without some programming of your own. The BAM index is by location, not by read ID, so you will have to create your own index to access reads by name.
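
As a minimal sketch of such a name-based index (not the code referred to in this thread), a single pass over the file can group records by read name in a dictionary. The function name `build_name_index` and the `FakeRead` stand-in are hypothetical. With pysam the input would presumably be something like `bam_file.fetch(until_eof=True)`, and older pysam versions expose the name as `qname` rather than `query_name`.

```python
from collections import defaultdict, namedtuple


def build_name_index(reads):
    """Group alignment records by read name in a single pass.

    `reads` is any iterable of objects with a `query_name` attribute
    (an assumption matching recent pysam alignment objects).
    """
    index = defaultdict(list)
    for read in reads:
        # A read name can occur several times (mates, secondary hits),
        # so each name maps to a list of records.
        index[read.query_name].append(read)
    return index


# Stand-in records so the sketch runs without a BAM file.
FakeRead = namedtuple("FakeRead", ["query_name", "reference_start"])
demo_reads = [FakeRead("r1", 100), FakeRead("r2", 250), FakeRead("r1", 900)]

name_index = build_name_index(demo_reads)
for read_id in ["r1", "r2"]:
    print(read_id, [r.reference_start for r in name_index[read_id]])
```

This keeps every record in memory, so it only suits files that fit in RAM. pysam also ships an `IndexedReads` class that appears to serve the same purpose; check the documentation of your installed version before relying on it.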

I wrote some code to do this, using Tokyo Cabinet to save an index from each record's header (the read name) to its file position, from which you can then read the SAM info. That code is here. (If you don't like the Tokyo Cabinet dependency, Istvan Albert pointed out that the bsddb module, which comes with Python up to at least 2.6, works just as well.)
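
The idea of a name-to-file-position index can be sketched with the standard-library `dbm` module standing in for Tokyo Cabinet or bsddb. This is an illustration under stated assumptions, not the code mentioned above: the function names `build_offset_index` and `lookup`, the demo file, and the comma-separated offset encoding are all hypothetical, and it works on text SAM files only.

```python
import dbm
import os
import tempfile


def build_offset_index(sam_path, index_path):
    """Map each read name to the byte offsets of its lines in a text SAM file."""
    with open(sam_path, "rb") as sam, dbm.open(index_path, "c") as db:
        while True:
            offset = sam.tell()
            line = sam.readline()
            if not line:
                break
            if line.startswith(b"@"):  # header lines carry no reads
                continue
            name = line.split(b"\t", 1)[0]
            encoded = str(offset).encode()
            # A name may occur more than once, so offsets are appended.
            db[name] = db[name] + b"," + encoded if name in db else encoded


def lookup(sam_path, index_path, read_id):
    """Return the SAM lines stored for `read_id` (empty list if unknown)."""
    key = read_id.encode()
    with open(sam_path, "rb") as sam, dbm.open(index_path, "r") as db:
        if key not in db:
            return []
        records = []
        for offset in db[key].split(b","):
            sam.seek(int(offset))
            records.append(sam.readline().decode().rstrip("\r\n"))
        return records


# Small demo file so the sketch is runnable on its own.
demo_dir = tempfile.mkdtemp()
demo_sam = os.path.join(demo_dir, "demo.sam")
with open(demo_sam, "w", newline="\n") as handle:
    handle.write("@HD\tVN:1.0\tSO:coordinate\n")
    handle.write("r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n")
    handle.write("r2\t0\tchr1\t250\t60\t4M\t*\t0\t0\tTTGA\tIIII\n")
    handle.write("r1\t256\tchr2\t900\t60\t4M\t*\t0\t0\tACGT\tIIII\n")

demo_index = os.path.join(demo_dir, "demo.idx")
build_offset_index(demo_sam, demo_index)
hits = lookup(demo_sam, demo_index, "r1")
```

The index is built once and lives on disk, so later lookups only cost a key fetch plus one seek per alignment.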

I believe you can write your own index using screed as well (by default it supports FASTA and FASTQ formats), but I have not tried that. It uses an SQLite backend.
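
For comparison, here is a rough sketch of the same kind of name index on an SQLite backend, using the standard-library `sqlite3` module directly rather than screed (screed's own API is not shown or assumed here). The table layout and the function names `index_sam_names` and `offsets_for` are hypothetical, and the demo SAM lines are truncated to four fields for brevity.

```python
import io
import sqlite3


def index_sam_names(sam_handle, connection):
    """Store (read name, byte offset) rows for every alignment line."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS reads (name TEXT, offset INTEGER)"
    )
    offset = sam_handle.tell()
    for line in iter(sam_handle.readline, b""):
        if not line.startswith(b"@"):  # skip header lines
            name = line.split(b"\t", 1)[0].decode()
            connection.execute(
                "INSERT INTO reads VALUES (?, ?)", (name, offset)
            )
        offset = sam_handle.tell()
    # The SQL index is what makes lookups by name fast.
    connection.execute(
        "CREATE INDEX IF NOT EXISTS reads_by_name ON reads (name)"
    )
    connection.commit()


def offsets_for(connection, read_id):
    """Return all byte offsets recorded for `read_id`, in file order."""
    rows = connection.execute(
        "SELECT offset FROM reads WHERE name = ? ORDER BY offset", (read_id,)
    )
    return [row[0] for row in rows]


# In-memory stand-ins for a SAM file and a database file.
demo_sam = io.BytesIO(
    b"@HD\tVN:1.0\n"
    b"r1\t0\tchr1\t100\n"
    b"r2\t0\tchr1\t250\n"
    b"r1\t256\tchr2\t900\n"
)
demo_db = sqlite3.connect(":memory:")
index_sam_names(demo_sam, demo_db)
r1_offsets = offsets_for(demo_db, "r1")
```

A table with one row per alignment handles repeated read names without any special encoding, at the cost of a larger index file than a plain key-value store.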

What about the -n option of samtools sort?

"-n Sort by read names rather than by chromosomal coordinates"

(http://samtools.sourceforge.net/samtools.shtml)

Comment written 5.3 years ago by blaise.li
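
If the file has been name-sorted as suggested in the comment above, all records for one read sit next to each other, so a single streaming pass can pick out the wanted IDs without building an index. The sketch below works on text SAM lines and relies only on equal names being adjacent, not on any particular sort order. The function name `select_reads` and the demo lines are hypothetical.

```python
from itertools import groupby


def read_name(record):
    """Return the first SAM field, which is the read name."""
    return record.split("\t", 1)[0]


def select_reads(name_sorted_lines, wanted_ids):
    """Yield (name, records) for wanted reads from name-sorted SAM lines."""
    wanted = set(wanted_ids)
    records = (
        line.rstrip("\n")
        for line in name_sorted_lines
        if not line.startswith("@")  # skip header lines
    )
    # groupby only merges adjacent records, hence the sorting requirement.
    for name, group in groupby(records, key=read_name):
        if name in wanted:
            yield name, list(group)


# Demo lines, truncated to four fields for brevity.
demo_lines = [
    "@HD\tVN:1.0\tSO:queryname\n",
    "r1\t0\tchr1\t100\n",
    "r1\t256\tchr2\t900\n",
    "r2\t0\tchr1\t250\n",
    "r3\t16\tchr3\t40\n",
]
selected = dict(select_reads(demo_lines, ["r1", "r3"]))
```

This still reads the whole file once per query batch, so it pays off mainly when many read IDs are looked up together.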

Can the code you wrote with Tokyo Cabinet be adapted to BAM files, including ones that have headers? It seems to rely on text SAM files (a non-binary format).

Comment written 8.1 years ago by User 9996

No, it requires a text format.

Comment written 8.1 years ago by brentp