Question: Benchmarking Read Alignment And Variant-Calling Algorithms (For Dummies)
Travis (USA) wrote, 7.8 years ago:

Hi all,

I am wondering if there is a good step-by-step guide on how to benchmark alignment and variant-calling software. I understand the premise, e.g.:

1. Generate reads with known mutations
2. Align them to the genome
3. Assess alignment accuracy
4. Perform variant calling
5. Assess variant-calling accuracy

However, I have some kind of intellectual disconnect when I try to think about how to actually do it. Too much time in industry and not enough in academia, I suspect!
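To show the shape of what I mean, here is a rough sketch of steps 1-3. This is only a sketch: it assumes wgsim (for simulation) and bwa (for alignment) are installed, and leans on wgsim's convention of encoding each read's true origin in its name.

```python
#!/usr/bin/env python
"""Sketch of steps 1-3: simulate -> align -> assess placement.
Assumes wgsim and bwa are on PATH and ref.fa is bwa-indexed.
Steps 4-5 (variant calling) would be assessed the same way, by
comparing calls against the mutation list wgsim writes to stdout."""
import subprocess

REF = "ref.fa"   # reference FASTA, indexed beforehand with `bwa index ref.fa`
WIGGLE = 20      # tolerance (bp) when comparing mapped vs. true position

# Step 1: simulate paired-end reads; wgsim names each read
# <refname>_<start>_<end>_... so the truth travels with the read.
subprocess.check_call(["wgsim", "-N", "100000", REF, "r1.fq", "r2.fq"])

# Step 2: align with the mapper under test (bwa mem here; swap in
# whichever aligner is being benchmarked).
with open("aln.sam", "w") as sam:
    subprocess.check_call(["bwa", "mem", REF, "r1.fq", "r2.fq"], stdout=sam)

# Step 3: compare each primary alignment against the encoded truth.
total = misplaced = 0
with open("aln.sam") as sam:
    for line in sam:
        if line.startswith("@"):
            continue
        name, flag, chrom, pos = line.split("\t")[:4]
        if int(flag) & 0x904:  # skip unmapped, secondary, supplementary
            continue
        # NB: this parse breaks if the reference name itself contains "_".
        true_chrom, true_start, true_end = name.split("_")[:3]
        total += 1
        ok = (chrom == true_chrom and
              int(true_start) - WIGGLE <= int(pos) <= int(true_end) + WIGGLE)
        if not ok:
            misplaced += 1

print("assessed %d reads, %d apparently misplaced (%.2f%%)"
      % (total, misplaced, 100.0 * misplaced / max(total, 1)))
```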

Can anyone point me in the right direction?

Thanks in advance!

Tags: indel, alignment, algorithm, snp
written 7.8 years ago by Travis
Torst (Australia) wrote, 7.5 years ago:

M. Ruffalo recently published "Seal," an evaluation suite for read aligners:

"With a view to comparing existing short read alignment software, we develop a simulation and evaluation suite, Seal, which simulates NGS runs for different configurations of various factors, including sequencing error, indels and coverage"

Reference:

Ruffalo M, Laframboise T, Koyutürk M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics. 2011 Oct 15;27(20):2790-6. Epub 2011 Aug 19.

http://www.ncbi.nlm.nih.gov/pubmed/21856737

written 7.5 years ago by Torst
Travis (USA) wrote, 7.8 years ago:

I think I have answered the aligner part:

http://www.massgenomics.org/short-read-aligners

written 7.8 years ago by Travis

This benchmark is flawed. All read mappers easily achieve a <1% error rate on simulated data (accurate mappers <0.1%), while the second plot implies something like 10%. There are also a couple of papers benchmarking mappers, but they all have problems. The best benchmark I have seen is the one done by the 1000 Genomes Project, but it is not publicly available.

written 7.8 years ago by lh3

I notice the simulated reads were trained on a human sample but were also used to generate the C. elegans reads. Not sure whether this could have affected anything, though.

written 7.8 years ago by Travis

Also, BFAST is shown as a fast aligner??

written 7.8 years ago by Travis

There is another flaw in your analysis. The premise is that, in essence, we have the "right" answer and can use it to determine whether a read is placed correctly. But you cannot conclude that an alignment is false (marked in red in your bar graphs) just because a read aligns to a location other than the one it was generated from. That by itself tells you nothing, unless you prove with Smith-Waterman that the optimal local alignment at the reported position does not pass the alignment criteria; it could in fact be a duplicated region.
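To illustrate the check I mean (a toy sketch only: a plain-Python Smith-Waterman with made-up scoring values; a real benchmark would reuse the aligner's own scoring scheme and acceptance threshold):

```python
"""Before counting a 'wrongly placed' read against the mapper, verify
that its optimal local alignment at the REPORTED position fails the
acceptance criteria. Scoring values here are arbitrary assumptions."""

MATCH, MISMATCH, GAP = 2, -3, -5

def smith_waterman_score(read, region):
    """Score of the optimal local alignment (linear gap penalty)."""
    prev = [0] * (len(region) + 1)
    best = 0
    for i in range(1, len(read) + 1):
        cur = [0]
        for j in range(1, len(region) + 1):
            sub = MATCH if read[i - 1] == region[j - 1] else MISMATCH
            cur.append(max(0, prev[j - 1] + sub, prev[j] + GAP,
                           cur[j - 1] + GAP))
            best = max(best, cur[j])
        prev = cur
    return best

def counts_as_false_positive(read, reported_region, min_score):
    """Only count the placement as a mapper error if the optimal local
    alignment at the reported position fails the acceptance threshold;
    otherwise the read may simply come from a duplicated region."""
    return smith_waterman_score(read, reported_region) < min_score

# Toy case: the read matches the "wrong" region perfectly (a duplicate),
# so it must NOT be counted as a false positive.
region = "ACGTACGTTAGCCGGAT"
read = "GTTAGCCG"
print(counts_as_false_positive(read, region, min_score=12))  # -> False
```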

written 7.8 years ago by Michael Dondrup

Remember that there actually is an authoritative solution, Smith-Waterman: an aligner that uses Smith-Waterman as a last step should in principle yield no false positives. The flaw of that evaluation is that this was never checked. Therefore the whole analysis is flawed, IMHO, and gives you nothing, even though it contains some nice ideas.

written 7.8 years ago by Michael Dondrup