Question

To screen for a good aligner

Asked 6.7 years ago by deepti1rao

I want to screen some aligners to pick the most sensitive one.

What are good aligners for Illumina HiSeq paired-end data? My genome is about 390 Mb. I have 100 crore reads (paired-end, 502 crores). Is it a good idea to pick about 200 reads, first run a BLAST search to identify their positions, and then align those reads with each aligner? I know that 200 reads are very few for assessing an aligner's performance, but since I also have to check each of them manually by BLAST, I'm aiming for the minimum.

I am also looking for leads to shortlist some good aligners.
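If a small truth set like this is used anyway, scoring each aligner against it is simple bookkeeping. The sketch below assumes you already have a true position per read (from BLAST, as proposed) and each aligner's reported position; the dictionary layout and the 10 bp tolerance are hypothetical choices, not taken from any tool.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical representation of a read position: (chromosome, 1-based leftmost coordinate).
Position = Tuple[str, int]


@dataclass
class Benchmark:
    total: int    # reads with a known true position
    mapped: int   # truth-set reads the aligner reported any position for
    correct: int  # reported positions on the right chromosome within the tolerance

    @property
    def sensitivity(self) -> float:
        """Fraction of all truth-set reads that were placed correctly."""
        return self.correct / self.total if self.total else 0.0

    @property
    def precision(self) -> float:
        """Fraction of mapped reads that were placed correctly."""
        return self.correct / self.mapped if self.mapped else 0.0


def score_aligner(truth: Dict[str, Position],
                  reported: Dict[str, Optional[Position]],
                  tolerance: int = 10) -> Benchmark:
    """Compare one aligner's reported positions against the truth set.

    `tolerance` (in bp) is an assumed allowance for small offsets, e.g. from soft clipping.
    Reads present in `reported` but not in `truth` are ignored.
    """
    mapped = correct = 0
    for read_id, (true_chrom, true_pos) in truth.items():
        hit = reported.get(read_id)
        if hit is None:
            continue  # missing or explicitly None: not mapped by this aligner
        mapped += 1
        chrom, pos = hit
        if chrom == true_chrom and abs(pos - true_pos) <= tolerance:
            correct += 1
    return Benchmark(total=len(truth), mapped=mapped, correct=correct)
```

Run it once per aligner on the same truth set and compare sensitivity and precision. With 200 reads, each read moves sensitivity by 1/200 = 0.5 percentage points, which shows how coarse such an estimate is.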

Tags: blast, aligner

What kind of sequencing data are you working with? I am wondering why you want to do this.

Reply by IP, 6.7 years ago

I have Illumina reads from a plant that is a hybrid of two distantly related varieties. Reference genomes for both parents are available. I need a sensitive aligner that can tell me reliably whether a read comes from the maternal or the paternal parent.

Reply by deepti1rao, 6.7 years ago

I would need a sensitive aligner to tell me for sure that a read belongs to the maternal parent or the paternal one.

That is a tall order for any aligner. How good are your references?

You may want to look at BBSplit from BBMap to bin the reads by reference (see the thread "BBSplit syntax for generating builds for the reference genome and how to call different builds").

Reply by GenoMax, 6.7 years ago
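The general idea behind binning reads between two parental references can be sketched as a competitive comparison: align each read to both references and assign it only when one alignment is clearly better. This is not BBSplit's implementation, just an illustration. The scores are assumed to be alignment scores where higher is better (such as the AS:i tag written by bowtie2 or BWA), and `min_margin` is a hypothetical threshold, not a recommended value.

```python
from enum import Enum
from typing import Optional


class Origin(Enum):
    MATERNAL = "maternal"
    PATERNAL = "paternal"
    AMBIGUOUS = "ambiguous"
    UNMAPPED = "unmapped"


def assign_parent(maternal_score: Optional[int],
                  paternal_score: Optional[int],
                  min_margin: int = 10) -> Origin:
    """Assign a read to the parent whose reference it aligns to clearly better.

    Scores: alignment scores, higher is better (assumption: e.g. an AS:i tag).
    None means the read did not align to that reference at all.
    min_margin: hypothetical minimum score difference required to call a parent.
    """
    if maternal_score is None and paternal_score is None:
        return Origin.UNMAPPED
    if paternal_score is None:
        return Origin.MATERNAL   # aligned to the maternal reference only
    if maternal_score is None:
        return Origin.PATERNAL   # aligned to the paternal reference only
    diff = maternal_score - paternal_score
    if diff >= min_margin:
        return Origin.MATERNAL
    if -diff >= min_margin:
        return Origin.PATERNAL
    return Origin.AMBIGUOUS      # scores too close to decide
```

Reads that end up AMBIGUOUS are the ones no aligner can place "for sure"; how many there are depends largely on how divergent the two parental genomes are in the sequenced regions. For paired-end data, one option is to compare the summed scores of both mates.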

100 crore = 1 billion reads. Why does that equate to 502 crore PE reads?

Reply by GenoMax, 6.7 years ago
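For readers unfamiliar with the Indian numbering system, the arithmetic behind this comment: 1 crore is 10^7, so 100 crore is 10^9 (one billion), and because every pair contributes two reads, one billion paired-end reads corresponds to 50 crore pairs. A small sketch of the conversion; the function names are hypothetical:

```python
CRORE = 10**7  # 1 crore = 10 million in the Indian numbering system


def crore_to_count(crores: float) -> int:
    """Convert a count given in crores to a plain integer."""
    return round(crores * CRORE)


def reads_to_pairs(reads: int) -> int:
    """In paired-end data each pair (fragment) contributes two reads."""
    return reads // 2
```

Whether a quoted figure refers to individual reads or to pairs is exactly the distinction that needs to be stated explicitly.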

score 3 · Answer 1 · 2017-08-09

Answer by ATpoint

There are plenty of published reports that compare the commonly used NGS aligners. Spend some quality time reading them; they can all be found on PubMed. And please do not start any self-made, BLAST-based comparisons of alignment accuracy. Read the literature first.

I agree with you! I've done a bit of literature searching, but I should do more. It really does look impossible to set up any sort of experiment that would actually make sense. Thanks for responding :)

Reply by deepti1rao, 6.7 years ago

score 0 · Answer 2 · 2017-08-09

With most aligners you can get what you need, provided you set the parameters properly. There are some questions you have to ask yourself:

Is speed important?
How closely related are the two parental genomes?

Depending on the answers, you can design your experiment. If you use bowtie2, for example, or HISAT2 from the same developers, there are many options for accepting only alignments above a given alignment score; both are very fast and just require some tuning. If you use BLAT, you can set a minimum sequence identity for a match to be accepted, and set it very high if you don't want reads to map to the other parental genome. GMAP also lets you choose among many output formats, which can simplify your downstream analyses. You could also try BWA, for example; in the end it is largely a matter of choice.
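An identity cut-off like BLAT's can also be applied after the fact to the SAM output of any of these aligners, provided it writes the NM tag (edit distance to the reference). A minimal sketch, assuming identity is defined as matching bases divided by alignment columns (aligned bases plus inserted and deleted bases); other definitions exist and give slightly different numbers:

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def percent_identity(cigar: str, nm: int) -> float:
    """Estimate percent identity of one SAM alignment from its CIGAR and NM tag.

    NM = mismatches + inserted bases + deleted bases.
    Identity (assumed definition) = matches / (aligned + inserted + deleted bases).
    """
    aligned = ins = dels = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            aligned += n  # match or mismatch columns
        elif op == "I":
            ins += n
        elif op == "D":
            dels += n
        # S, H, N and P are not counted as alignment columns here
    mismatches = nm - ins - dels
    matches = aligned - mismatches
    columns = aligned + ins + dels
    return 100.0 * matches / columns if columns else 0.0
```

A read would then be kept for a given parent only if, for example, `percent_identity(cigar, nm) >= 98.0`, where the cut-off is a placeholder to be benchmarked.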

For your type of experiment, I would focus the benchmarking on the parameter set rather than on the aligner itself. As long as you can set thresholds for alignment score, sequence identity, minimum/maximum mismatches and gaps, seed mismatches, and maximum number of hits, you're fine.
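Benchmarking parameter sets rather than aligners could look like the sketch below: one set of thresholds per candidate configuration, applied to the same alignments, counting how many survive each. All field names and default values here are hypothetical placeholders, not recommendations and not any aligner's actual options.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)  # frozen makes it hashable, so it can be a dict key
class Thresholds:
    """A hypothetical parameter set to benchmark; values are placeholders."""
    min_alignment_score: int = -12
    min_identity: float = 97.0  # percent
    max_mismatches: int = 3
    max_gaps: int = 1
    max_hits: int = 1           # 1 = keep only uniquely placed reads


@dataclass
class Alignment:
    """Per-read summary, assumed to be extracted from SAM beforehand."""
    alignment_score: int
    identity: float
    mismatches: int
    gaps: int
    n_hits: int


def passes(aln: Alignment, t: Thresholds) -> bool:
    """True if the alignment satisfies every threshold in `t`."""
    return (aln.alignment_score >= t.min_alignment_score
            and aln.identity >= t.min_identity
            and aln.mismatches <= t.max_mismatches
            and aln.gaps <= t.max_gaps
            and aln.n_hits <= t.max_hits)


def parameter_sweep(alignments: List[Alignment],
                    grid: List[Thresholds]) -> Dict[Thresholds, int]:
    """Count how many alignments survive each parameter set in `grid`."""
    return {t: sum(passes(a, t) for a in alignments) for t in grid}
```

These counts only measure how permissive each set is; in practice you would pair them with an accuracy measure, since the loosest set will always keep the most reads.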