Question: Is there a fast and accurate read mapper that provides an easy-to-use API?
17 months ago, Amirosein (UBC, Vancouver, CA) wrote:

Hello,

I am wondering if the most famous read aligners provide easy-to-use APIs.

More specifically, I am developing a variant-calling algorithm that splits each input read into segments, passes each segment to the aligner, and needs its location back quickly. For example, instead of saving read segments to a ".fq" file and then passing that file to Bowtie or another aligner, I need the aligner to load once and stay resident, so that I can pass it one read at a time and get an answer for each read immediately. This would eliminate a lot of I/O and some queries as well.

My concerns and expectations for the aligner:

  • My queries are substrings of length ~25 bp.
  • I am looking for unique matches only, and the aligner should exploit this to be very fast.
  • Minimal RAM usage, if possible. My algorithm itself uses very little memory, so an aligner that needs less than 8 GB of RAM for the human genome would be a good candidate.
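
To make these requirements concrete, here is a minimal Python sketch of the behaviour I am after: an index that is built once, stays in memory, and answers each ~25 bp query with a position only when the match is unique. This is a toy exact-match k-mer dictionary, not how any real aligner works. The names build_kmer_index and unique_hit and the tiny reference are made up for illustration, the sketch ignores the reverse strand and mismatches, and a plain dictionary like this would need far more than 8 GB for a human genome.

```python
from collections import defaultdict

K = 25  # query length mentioned in the post


def build_kmer_index(reference, k=K):
    """Map every k-mer in the reference to the list of its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def unique_hit(index, query):
    """Return the start position if the query occurs exactly once, else None."""
    positions = index.get(query)  # .get() does not create new entries
    if positions is not None and len(positions) == 1:
        return positions[0]
    return None


# Made-up reference: a repetitive background with one distinctive 25-mer inside.
distinct = "TTGACCAGTAGGCATCGATTACGGA"
repeat = "ACGT" * 10
reference = repeat + distinct + repeat

# Built once, then queried as often as needed without touching the disk.
index = build_kmer_index(reference)
```

With this, unique_hit(index, distinct) gives the single position of the distinctive 25-mer, while a 25-mer taken from the repetitive background gives None because it occurs more than once.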

[Update:] Based on my investigations, the best option is bwa-aln. The task can be done with it, but it is somewhat complicated. Here are brief instructions from one of the main contributors:

bwaidx_t *idx = bwa_idx_load(index_prefix, BWA_IDX_ALL);

to read the index. Then read bwa_cal_sa_reg_gap() in bwtaln.c to see how to find the suffix array (SA) coordinates. Finally, use bwa_sa2pos() in bwase.c to convert an SA coordinate to a chromosomal coordinate.
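
To illustrate the two steps described above (find the SA coordinates of a query, then convert each SA coordinate to a chromosomal coordinate), here is a toy Python sketch. It does not use bwa's code or data structures: bwa works on a compressed BWT/FM-index, whereas this sketch builds a plain, naive suffix array over a tiny made-up reference. All names in it (sa_interval, sa_to_pos, locate, chr1, chr2) are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Toy reference: two "chromosomes" joined with a '$' separator so that no
# match can span a chromosome boundary. Names and sequences are made up.
chrom_names = ["chr1", "chr2"]
chrom_seqs = ["ACGTTGCAAC", "GGATCCATTG"]
text = "$".join(chrom_seqs)

# Start offset of each chromosome inside the concatenated text.
chrom_starts = []
offset = 0
for seq in chrom_seqs:
    chrom_starts.append(offset)
    offset += len(seq) + 1  # +1 for the separator

# Naive suffix array: suffix start offsets in lexicographic order.
suffix_array = sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(query):
    """Return the half-open SA interval [lo, hi) of suffixes starting with query."""
    def prefix(rank):
        start = suffix_array[rank]
        return text[start:start + len(query)]

    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose prefix is >= query
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(suffix_array)
    while lo < hi:  # first suffix whose prefix is > query
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid
    return first, lo


def sa_to_pos(rank):
    """Convert one SA coordinate to (chromosome name, 0-based position)."""
    text_offset = suffix_array[rank]
    i = bisect_right(chrom_starts, text_offset) - 1
    return chrom_names[i], text_offset - chrom_starts[i]


def locate(query):
    """All reference positions of an exact match, via the SA interval."""
    lo, hi = sa_interval(query)
    return [sa_to_pos(rank) for rank in range(lo, hi)]
```

A query is a unique match exactly when its SA interval has size 1, i.e. when len(locate(query)) == 1, which is the case I care about.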

I would be glad if anybody could help me with this.

Thanks for your time.

alignment read • 669 views
modified 8 months ago • written 17 months ago by Amirosein

Have you considered just taking a pipe as input?

written 17 months ago by Devon Ryan

What do you mean by taking a pipe as input? I want my program to combine my own algorithm and a read aligner, working together on every single read.

written 17 months ago by Amirosein

SparkBWA seems to have an API.

written 17 months ago by genomax

Thanks a lot. That looks good; I'll check it in detail.

written 17 months ago by Amirosein

I mean a Unix pipe: your program writes the reads into the aligner and reads the results back out of it, rather than using an API. The only difficulty is that aligners tend to buffer I/O, so you'd need to disable that (presumably by changing the code).

modified 17 months ago • written 17 months ago by Devon Ryan
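
The pipe approach described above could look roughly like the following Python sketch. It keeps one child process alive and exchanges one line per query over stdin/stdout. The child here is only a stand-in written for this example (it echoes the sequence and its length); it is not a real aligner, and a real aligner would only behave this way if its I/O buffering were disabled, as noted in the comment above. The names CHILD_CODE, query, and shutdown are made up for illustration.

```python
import subprocess
import sys

# Stand-in for an aligner (hypothetical): reads one sequence per line and
# immediately answers with the sequence, a tab, and its length. It flushes
# after every line, which a buffering aligner would not do by default.
CHILD_CODE = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    seq = line.strip()\n"
    "    print(seq, len(seq), sep='\\t', flush=True)\n"
)

# Start the child once and keep it running for all queries.
proc = subprocess.Popen(
    [sys.executable, "-u", "-c", CHILD_CODE],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
    bufsize=1,  # line-buffered on the parent's side
)


def query(seq):
    """Send one sequence down the pipe and wait for the one-line answer."""
    proc.stdin.write(seq + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().rstrip("\n")


def shutdown():
    """Close the pipe so the child sees end-of-file, then wait for it to exit."""
    proc.stdin.close()
    proc.wait()
    proc.stdout.close()
```

Each call to query() blocks until the child has answered, so if the child buffered its output the parent would hang; that is the practical reason the buffering has to be turned off on the aligner's side.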

Aha, good idea, but a read mapper with an API would be more useful. Thank you for your idea and consideration.

written 17 months ago by Amirosein

minimap2 has a Python and C API, but I'm not sure it meets the rest of your requirements. See https://github.com/lh3/minimap2#dguide

written 17 months ago by WouterDeCoster

Thank you for your reply, but it does not. It is good for long reads from Nanopore or PacBio technology. The best option is bwa-aln, from the same author, but it does not provide an easy-to-use API. You can see the instructions from one of the contributors in my updated post.

written 17 months ago by Amirosein

Well, minimap2 can also be used for short high-quality reads; see https://github.com/lh3/minimap2#short-genomic

written 17 months ago by WouterDeCoster

Yes, but regardless of whether that is the best option, short reads are reads of length ~100-200 bp. Here I want to align read substrings of length 25.

written 17 months ago by Amirosein

So you are not limited to an NGS aligner then? I wonder if you could use BLAT? If you are not going to use the Q scores, you could just use the sequences.

written 17 months ago by genomax

According to the BLAT documentation, BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more, which is longer than my 25 bp queries.

written 17 months ago by Amirosein
17 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

bwa has a simple C-based interface; there is an 'example.c' with comments in the sources:

https://github.com/lh3/bwa/blob/Apache2/example.c

written 17 months ago by Pierre Lindenbaum

Yes, but unfortunately it is not for bwa-aln, and in my case I need to use bwa-aln.

written 17 months ago by Amirosein