Question

Is there a fast and accurate read mapper that provides an easy-to-use API?

0

7.4 years ago

Amirosein ▴ 70

Hello,

I am wondering whether any of the widely used read aligners provide an easy-to-use API.

More specifically, I am developing a variant-calling algorithm in which I need to split the input reads into segments, pass each segment to the aligner, and get its location back quickly. For example, instead of saving read segments in a ".fq" file and then passing that file to Bowtie or another aligner, I need the aligner to load once and stay resident, so that I can pass it each segment and get an answer immediately. This way I can eliminate a lot of I/O and some queries as well.

My requirements for the aligner:

My queries are substrings of length ~25 bp.
I am looking for unique matches, and the aligner should exploit this to be very fast.
Minimal RAM usage if possible. My algorithm itself uses very little memory, so an aligner that needs less than 8 GB of RAM for the human genome would be a good candidate.

[Update:] Based on my investigations, the best option is bwa-aln. The task can be done with it, but it is a bit complicated to use. Here are brief instructions from one of the main contributors:

bwaidx_t *idx = bwa_idx_load(index_prefix, BWA_IDX_ALL);

The call above reads the index. Then read bwa_cal_sa_reg_gap() in bwtaln.c to see how to find the suffix array (SA) coordinates. Finally, use bwa_sa2pos() in bwase.c to convert an SA coordinate to a chromosomal coordinate.
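To make the two steps (find the SA coordinates, then convert them to a position) concrete, here is a toy Python sketch on a tiny string. It does not use BWA or its index format: the naive suffix array, the binary search, and the names `build_suffix_array`, `sa_interval` and `unique_position` are all made up for this illustration, and a real index for the human genome would be far more compact.

```python
def build_suffix_array(text):
    """Return the suffix array of text (naive sort; only sensible for a toy reference)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, query):
    """Return the half-open SA interval [lo, hi) of suffixes that start with query."""
    m = len(query)
    # Lower bound: first suffix whose m-character prefix is >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def unique_position(text, sa, query):
    """Return the 0-based text position if query occurs exactly once, else None."""
    lo, hi = sa_interval(text, sa, query)
    if hi - lo != 1:
        return None  # no match, or more than one match
    return sa[lo]  # SA coordinate -> text coordinate


reference = "ACGTACGTTAGC"  # toy stand-in for a genome
sa = build_suffix_array(reference)
print(unique_position(reference, sa, "TAGC"))  # occurs once
print(unique_position(reference, sa, "ACGT"))  # occurs twice, so not unique
```

The uniqueness test is just the size of the SA interval, which is why an aligner that knows I only want unique matches could stop early.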

I would be glad if anybody could help me with this.

Thanks for your time.

alignment read • 3.0k views

ADD COMMENT • link 6.7 years ago by Amirosein ▴ 70

1

Have you considered just taking a pipe as input?

ADD REPLY • link 7.4 years ago by Devon Ryan 105k

0

What do you mean by taking a pipe as input? I want my program to contain both my own algorithm and a read aligner, working together on every single read.

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70

1

SparkBWA seems to have an API.

ADD REPLY • link 7.4 years ago by GenoMax 152k

0

Thanks a lot. It looks good. I'll check it in detail.

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70

0

I mean a Unix pipe: your program writes the reads into the aligner and reads the results back out, rather than using an API. The only difficulty is that aligners tend to buffer I/O, so you'd need to disable that (presumably by changing the code).
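As a rough illustration of the pipe approach, here is a Python sketch that keeps one child process alive and exchanges one line per query. The child is only a stand-in that echoes each sequence with its length and flushes after every line; it is not an aligner, and `PipedMapper` and `CHILD_CODE` are hypothetical names. A real aligner would have to flush its output after every read in the same way, which is the buffering problem mentioned above.

```python
import subprocess
import sys

# Stand-in for an aligner: reads one sequence per line and answers immediately.
# It flushes after every line, which is what a real aligner would also need to do.
CHILD_CODE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    seq = line.strip()\n"
    "    print(seq, len(seq), sep='\\t', flush=True)\n"
)


class PipedMapper:
    """Keep one child process alive and exchange one line per query."""

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on our side of the pipe
        )

    def query(self, seq):
        """Send one sequence and block until the child answers with one line."""
        self.proc.stdin.write(seq + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close the pipe so the child sees end-of-file, then wait for it to exit."""
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


mapper = PipedMapper([sys.executable, "-u", "-c", CHILD_CODE])
first = mapper.query("ACGTACGTACGTACGTACGTACGTA")  # a 25-base segment
second = mapper.query("TTTT")
mapper.close()
print(first)
print(second)
```

The process start-up cost (for an aligner, loading the index) is paid once, and every later query is just a write and a read on the pipe.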

ADD REPLY • link 7.4 years ago by Devon Ryan 105k

0

Aha, good idea, but it would be more useful to find a read mapper with an API. Thank you for the suggestion.

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70

0

minimap2 has a Python and C API, but I'm not sure it qualifies for the rest of your requirements. See https://github.com/lh3/minimap2#dguide
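For reference, a minimal sketch of what the Python side (the `mappy` package) could look like. The helper names `open_short_read_aligner` and `confident_hits` and the MAPQ threshold of 30 are assumptions for illustration, and using MAPQ as a proxy for "unique" is a simplification. Whether the short-read preset returns anything useful for very short queries is not something this sketch establishes.

```python
from types import SimpleNamespace


def open_short_read_aligner(reference_path):
    """Load or build a minimap2 index once; the returned object can be queried repeatedly."""
    import mappy as mp  # third-party package: pip install mappy

    aligner = mp.Aligner(reference_path, preset="sr")  # "sr" is the short-read preset
    if not aligner:
        raise RuntimeError(f"failed to load or build index for {reference_path}")
    return aligner


def confident_hits(hits, min_mapq=30):
    """Keep primary hits at or above min_mapq as (contig, start, end, strand) tuples."""
    return [
        (h.ctg, h.r_st, h.r_en, h.strand)
        for h in hits
        if h.is_primary and h.mapq >= min_mapq
    ]


# Fake hits standing in for mappy alignment objects, so the filter can be tried
# without an index. The attribute names match what mappy alignments expose.
demo_hits = [
    SimpleNamespace(ctg="chr1", r_st=100, r_en=125, strand=1, mapq=60, is_primary=True),
    SimpleNamespace(ctg="chr2", r_st=5, r_en=30, strand=-1, mapq=0, is_primary=True),
]
print(confident_hits(demo_hits))
```

With mappy installed and a real reference (the path is a placeholder), the same filter would be applied as `confident_hits(open_short_read_aligner("ref.fa").map(segment))`, with the aligner object created once and reused for every segment.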

ADD REPLY • link 7.4 years ago by WouterDeCoster 48k

0

Thank you for your reply, but it doesn't. It is meant for long reads from Nanopore or PacBio technology. The best option is bwa-aln, from the same author, but it does not provide an easy-to-use API. You can see the instructions from one of the contributors in my updated post.

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70

0

Well minimap2 can also be used for short high-quality reads, see https://github.com/lh3/minimap2#short-genomic

ADD REPLY • link 7.4 years ago by WouterDeCoster 48k

0

Yeah, but regardless of whether it is the best option, short reads are ~100-200 bp long. Here I want to align read substrings of length 25.

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70

0

So you are not limited to an NGS aligner then? I wonder if you could use BLAT? If you are not going to use the Q scores you could just use sequence.

ADD REPLY • link 7.4 years ago by GenoMax 152k

0

According to the BLAT documentation, "Blat of DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more."

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70

score 0 · Answer 1 · 2018-02-19

0

7.4 years ago

Pierre Lindenbaum 166k

bwa has a simple C interface; there is an 'example.c' with comments in the sources:

https://github.com/lh3/bwa/blob/Apache2/example.c

ADD COMMENT • link 7.4 years ago by Pierre Lindenbaum 166k

0

Yeah, but unfortunately it is not for bwa-aln, and in my case I need to use bwa-aln.

ADD REPLY • link 7.4 years ago by Amirosein ▴ 70