I am wondering whether any of the widely used read aligners provide an easy-to-use API.
More specifically, I am developing a variant-calling algorithm in which I need to split the input reads, pass each read segment to the aligner, and get its location back quickly. For example, instead of saving read segments to a ".fq" file and then passing that file to Bowtie or another aligner, I need the aligner to load its index once and stay resident, so that I can pass each read to it and get an answer for that read immediately. This way I can eliminate a lot of I/O and skip some queries as well.
My requirements for the aligner:
- My queries are substrings of length ~25 bp.
- I am looking for unique matches only, and the aligner should exploit this to be very fast.
- Minimal RAM usage if possible. My algorithm itself uses very little memory, so an aligner that needs less than 8 GB of RAM for the human genome would be a good candidate.
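To make the requirement concrete, here is a minimal sketch in C of the "load once, query many times" pattern I have in mind. It is a toy: it uses a plain, uncompressed suffix array over a single short string, searches the forward strand only, and allows no mismatches, whereas a real aligner would use a compressed index to stay within the memory budget. All names (toy_index, toy_index_build, toy_find_unique) are hypothetical and do not belong to any aligner's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy resident index: a plain suffix array over one reference string. */
typedef struct {
    const char *ref; /* reference sequence (not owned by the index) */
    int len;         /* reference length */
    int *sa;         /* suffix start positions in lexicographic order */
} toy_index;

static const char *g_ref; /* reference seen by the qsort comparator */

static int cmp_suffix(const void *a, const void *b) {
    return strcmp(g_ref + *(const int *)a, g_ref + *(const int *)b);
}

/* Build the index once at start-up; it then stays in memory. */
toy_index toy_index_build(const char *ref) {
    toy_index idx;
    idx.ref = ref;
    idx.len = (int)strlen(ref);
    assert(idx.len > 0);
    idx.sa = malloc(sizeof(int) * (size_t)idx.len);
    assert(idx.sa != NULL);
    for (int i = 0; i < idx.len; i++) idx.sa[i] = i;
    g_ref = ref;
    qsort(idx.sa, (size_t)idx.len, sizeof(int), cmp_suffix);
    return idx;
}

/* Binary search over suffix-array ranks.
   upper == 0: first rank whose suffix is >= q on its first strlen(q) chars.
   upper == 1: first rank whose suffix is >  q on its first strlen(q) chars. */
static int sa_bound(const toy_index *idx, const char *q, int upper) {
    size_t qlen = strlen(q);
    int lo = 0, hi = idx->len;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        int c = strncmp(idx->ref + idx->sa[mid], q, qlen);
        if (c < 0 || (upper && c == 0)) lo = mid + 1;
        else hi = mid;
    }
    return lo;
}

/* Returns the 0-based position of q if it occurs exactly once,
   -1 if it does not occur, -2 if it occurs more than once. */
int toy_find_unique(const toy_index *idx, const char *q) {
    int begin = sa_bound(idx, q, 0);
    int end = sa_bound(idx, q, 1);
    if (end == begin) return -1;
    if (end - begin > 1) return -2;
    return idx->sa[begin];
}

void toy_index_free(toy_index *idx) {
    free(idx->sa);
    idx->sa = NULL;
}
```

The point of the sketch is the calling pattern: toy_index_build() runs once, and every read segment is then answered by toy_find_unique() from memory, with no intermediate ".fq" file. The width of the suffix-array interval (end - begin) is what tells a unique hit from a repeat.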
[Update:] Based on my investigation, the best option seems to be bwa aln. The task can be done with it, but it is somewhat complicated to use. Here are brief instructions from one of the main contributors:
bwaidx_t *idx = bwa_idx_load(index_prefix, BWA_IDX_ALL);
to read the index. Then read
bwa_cal_sa_reg_gap() in bwtaln.c to see how to find the suffix array (SA) coordinates. Finally, use
bwa_sa2pos() in bwase.c to convert an SA coordinate to a chromosomal coordinate.
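To illustrate what the last step involves, here is a small self-contained C sketch of mapping a position in a concatenated reference back to a (sequence, offset) pair. This is not BWA's code and does not call bwa_sa2pos(). It assumes that the index stores all reference sequences end to end in one coordinate space, and the type and function names (toy_contig, toy_pos_to_contig, toy_hit_spans_boundary) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One reference sequence inside the concatenated coordinate space. */
typedef struct {
    const char *name; /* sequence name, e.g. a chromosome */
    long offset;      /* start of this sequence in the concatenation */
    long len;         /* length of this sequence */
} toy_contig;

/* Map a position in the concatenated reference to a contig.
   contigs must be sorted by offset. On success, returns the contig
   index and stores the 0-based offset inside that contig in *within.
   Returns -1 if pos lies outside every contig. */
int toy_pos_to_contig(const toy_contig *contigs, int n, long pos, long *within) {
    assert(contigs != NULL && within != NULL);
    if (n <= 0 || pos < 0) return -1;
    int lo = 0, hi = n - 1;
    while (lo < hi) { /* find the last contig whose offset is <= pos */
        int mid = lo + (hi - lo + 1) / 2;
        if (contigs[mid].offset <= pos) lo = mid;
        else hi = mid - 1;
    }
    if (pos < contigs[lo].offset || pos >= contigs[lo].offset + contigs[lo].len)
        return -1;
    *within = pos - contigs[lo].offset;
    return lo;
}

/* A hit of length qlen is only meaningful if it lies inside one contig.
   Returns 1 if the hit is out of range or runs past the contig end. */
int toy_hit_spans_boundary(const toy_contig *contigs, int n, long pos, long qlen) {
    long within;
    int id = toy_pos_to_contig(contigs, n, pos, &within);
    return id < 0 || within + qlen > contigs[id].len;
}
```

With short queries of ~25 bp, the boundary check matters: a match found in the concatenated text can start near the end of one sequence and continue into the next, and such a hit should be discarded rather than reported.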
I would be grateful if anybody could help me with this.
Thanks for your time.