How to cache reads?
8.3 years ago
everrove • 0

I am using pysam, an interface to the samtools command-line utilities, to analyze sequencing data. I'd like to work with paired reads (called mates in pysam). The documentation for the mate method (http://pysam.readthedocs.org/en/latest/api.html#pysam.AlignmentFile.mate) says:

This method is too slow for high-throughput processing. If a read needs to be processed with its mate, work from a read name sorted file or, better, cache reads.

How do you 'cache reads'?

Thanks

samtools pysam genome sequence alignment • 2.5k views
8.3 years ago

Caching is a programming technique. In this context: when you encounter the first read of a pair you do not process it but just keep it in memory, i.e., the "cache". When you later encounter the mate, you process them together.

You have to implement the "cache" yourself. It is a data structure (e.g., an array or a hash) that holds the first reads you have seen but not yet processed. Whenever you read in a read, check whether it is the first or the second read of its pair. If it is the first read, add it to the cache. If it is the second read, look up the corresponding first read in the cache and process the two together. Once a pair has been processed, remove both reads from memory. Discarding them is important because otherwise the cache keeps growing, and if the program needs more memory than your computer has, it will crash.
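
The steps above could be sketched roughly as follows. This is only an illustration, not pysam's own method: the `Read` tuple is a hypothetical stand-in for pysam's `AlignedSegment` so that the example runs without a BAM file, and `iter_pairs` is a made-up helper name. It assumes reads are matched by `query_name` and that secondary and supplementary alignments should be skipped because they reuse the read name.

```python
from collections import namedtuple

# Hypothetical stand-in for pysam.AlignedSegment, so the sketch runs without a BAM file.
Read = namedtuple("Read", ["query_name", "is_secondary", "is_supplementary"])


def iter_pairs(reads):
    """Yield (earlier_read, later_read) for reads that share a query name."""
    cache = {}  # query_name -> read seen earlier whose mate has not appeared yet
    for read in reads:
        # Secondary/supplementary alignments reuse the read name; skip them
        # so they are not mistaken for the mate.
        if read.is_secondary or read.is_supplementary:
            continue
        # pop() removes the cached read, so memory is freed once a pair is complete.
        earlier = cache.pop(read.query_name, None)
        if earlier is None:
            cache[read.query_name] = read  # first read of the pair: remember it
        else:
            yield earlier, read  # second read of the pair: hand out both together
    # Reads still in the cache here never met their mate in this input.


demo = [
    Read("r1", False, False),
    Read("r2", False, False),
    Read("r1", False, False),  # mate of the first r1
    Read("r2", True, False),   # secondary alignment, ignored
]
pairs = list(iter_pairs(demo))
```

With pysam, the same function could presumably be fed an open file directly, e.g. `iter_pairs(pysam.AlignmentFile("example.bam", "rb"))` (the file name is a placeholder), since iterating over the file yields reads that carry these attributes. In a coordinate-sorted file, mates that map far apart stay in the cache longer, so memory use depends on the data.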


Thanks for your great description. If I understand you correctly, that means I shouldn't use pysam's built-in mate method. Instead, I should write my own mate lookup that works only with reads already stored in RAM. Right?


Exactly :)

Furthermore, I think pysam's mate method looks forward in the file: given the first read, it searches for the second read. In contrast, the caching described above looks backward: when you encounter the second read, you look up the first read you already stored.
