Question: To screen for a good aligner
13 months ago, deepti1rao wrote:

I want to screen some aligners to pick the most sensitive one.

Which aligners work well with Illumina HiSeq paired-end data? My genome is about 390 Mb. I have 100 crore reads (paired end 502 crores). Is it a good idea to pick about 200 reads, first run a BLAST search to identify their positions, and then align them with each aligner? I know that 200 reads are very few to assess the performance of an aligner, but since I also have to check each of them manually by BLAST, I'm aiming for the minimum.

Also, looking for leads to shortlist some good aligners.

modified 13 months ago by Macspider • written 13 months ago by deepti1rao

There are plenty of published reports that compare the commonly used NGS aligners. Spend some quality time reading them; they can all be found on PubMed. And please do not start doing any self-made BLAST-based comparisons of alignment accuracy. Read the literature first.

modified 13 months ago • written 13 months ago by ATpoint

I agree with you! I've done a bit of literature searching, but I should do more of it. It really looks impossible to set up any sort of experiment that would actually make sense. Thanks for responding :)

written 13 months ago by deepti1rao

What kind of sequencing data are you working with? I am wondering why you want to do this.

written 13 months ago by IP370

I have Illumina reads from a plant that is a hybrid of two distantly related varieties. References for the parents are available. I would need a sensitive aligner to tell me for sure that a read belongs to the maternal parent or the paternal one.

written 13 months ago by deepti1rao

"I would need a sensitive aligner to tell me for sure that a read belongs to the maternal parent or the paternal one."

That is a tall order for any aligner. How good are your references?

You may want to look at BBSplit from BBMap to bin/split the reads (see: BBSplit syntax for generating builds for the reference genome and how to call different builds).
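To make the binning idea concrete, here is a minimal Python sketch of how a read could be assigned to one parent once you have its best alignment score against each parental reference. This is not how BBSplit works internally; the function name `assign_parent`, the `min_margin` threshold, and the example scores are all hypothetical and only serve as illustration.

```python
def assign_parent(score_maternal, score_paternal, min_margin=5):
    """Assign a read to a parent from its best alignment score on each reference.

    Scores are assumed to be "higher is better"; None means the read did not
    align to that reference. min_margin is a hypothetical threshold, not a
    recommended value.
    """
    if score_maternal is None and score_paternal is None:
        return "unmapped"
    if score_paternal is None:
        return "maternal"
    if score_maternal is None:
        return "paternal"
    diff = score_maternal - score_paternal
    if diff >= min_margin:
        return "maternal"
    if -diff >= min_margin:
        return "paternal"
    # Scores too close to call: do not force an assignment.
    return "ambiguous"


# Made-up scores for illustration: read name -> (maternal score, paternal score)
example_scores = {
    "read1": (100, 80),
    "read2": (90, 92),
    "read3": (None, 75),
    "read4": (None, None),
}

assignments = {name: assign_parent(m, p) for name, (m, p) in example_scores.items()}
```

Keeping an explicit "ambiguous" bin is the point of the sketch: reads from regions where the two parents are nearly identical cannot be assigned "for sure", whatever the aligner.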

written 13 months ago by genomax

100 crore = 1 billion reads. Why does that equate to 502 crore PE reads?
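For readers unfamiliar with the unit, the conversion can be checked in a couple of lines of Python. The variable names are only illustrative, and the pair count assumes that the 100 crore figure counts both mates of each pair, which is just one possible reading of the question.

```python
CRORE = 10_000_000  # 1 crore = 10^7 (Indian numbering system)

total_reads = 100 * CRORE  # 100 crore reads

# Assumption: the total counts both mates, so each pair contributes two reads.
pairs_if_total_counts_both_mates = total_reads // 2
```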

modified 13 months ago • written 13 months ago by genomax
13 months ago, Macspider (Vienna - BOKU) wrote:

With most aligners you can get what you need, provided you set the parameters properly. There are some questions you have to ask yourself:

  1. Is speed important?
  2. How related are the parental genomes?

Depending on those, you can set up your experiment. If you use, for example, Bowtie 2, or HISAT2, the newer software from the same group, you have many options to tweak so that reads are mapped only above a particular alignment score. It's very fast; it just requires some adjusting. If you use BLAT, you can set a minimum accepted sequence identity to score a match, which could be very high if you don't want your reads to map to the other parental genome. If you use GMAP, you can also choose among many different output formats that can ease your downstream analyses. You can also try BWA, for example; it's just a matter of choice after all.

For your type of experiment, I would focus the benchmarking more on the parameter set than on the aligner itself. As long as you can choose thresholds for alignment score, sequence identity, minimum/maximum mismatches and gaps, seed mismatches, and maximum number of hits, you're fine.
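As a rough illustration of benchmarking the parameter set rather than the aligner, the Python sketch below sweeps a small grid of thresholds over a handful of alignment records and lists which reads survive each setting. The records, the threshold values, and the `passes` helper are all made up for this example; in practice the records would be parsed from your aligner's output.

```python
from itertools import product

# Made-up alignment records: (read name, percent identity, mismatches, number of hits)
alignments = [
    ("read1", 99.0, 1, 1),
    ("read2", 96.5, 3, 1),
    ("read3", 99.5, 0, 4),
    ("read4", 92.0, 8, 1),
]


def passes(record, min_identity, max_mismatches, max_hits):
    """Return True if an alignment record satisfies all three thresholds."""
    _, identity, mismatches, hits = record
    return (
        identity >= min_identity
        and mismatches <= max_mismatches
        and hits <= max_hits
    )


# Hypothetical threshold grid; the values are placeholders, not recommendations.
min_identities = [95.0, 98.0]
max_mismatch_values = [2, 5]
max_hit_values = [1]

kept_per_setting = {}
for params in product(min_identities, max_mismatch_values, max_hit_values):
    kept_per_setting[params] = [r[0] for r in alignments if passes(r, *params)]
```

Comparing the retained reads across settings shows how sensitive the final read assignment is to each threshold, which is often more informative than swapping one aligner for another.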

written 13 months ago by Macspider