Question: Is It Possible That Uniquely Mapped Read Was Mapped To The Wrong Place?
Chai_AF wrote (6.8 years ago):

Is it possible that a uniquely mapped read was mapped to the wrong place? If so, how is that possible, given that all bases in the read have high quality scores?


This is a very poorly worded question. What do you mean by wrong place? What do you mean by uniquely mapped read?

The quality score is really just the confidence in the base calls themselves, so if you are referring to sequencing errors, I think that would be a minor detail. Many factors affect where a read is mapped on a reference.

Please give us more information and a clear sense of what you are actually asking.

Comment written 6.8 years ago by Josh Herr
lh3 (United States) wrote (6.8 years ago):

Because:

  1. Nearly all mapping algorithms use heuristics. They cannot guarantee to find the best alignment in terms of the highest Smith-Waterman score.

  2. Even if you could afford SW, the best-scoring position may still be wrong, because the underlying scoring matrix used in SW is inaccurate and the appropriate matrix actually changes between regions (e.g. around long homopolymer runs). Sequence evolution may not always follow the SW model (e.g. in a microsatellite), either.

  3. Even if you knew the precise scoring system, your reference genome may be wrong and in that case nothing you do can fix the problem, unless you fix the reference genome first.

  4. Also, as you mentioned, read sequences can be wrong. Even if the base quality is high, there is still a tiny chance that a sequencing error will place the read at the wrong position. Given billions of reads, quite a few of them will be affected by sequencing errors.
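
To put point 4 in rough numbers, here is a back-of-the-envelope sketch. It assumes every base in a read has the same Phred quality and that errors are independent, which real data does not strictly satisfy. The read length, quality value, and read count are illustrative assumptions, not figures from any particular run.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def prob_read_has_error(read_length: int, q: float) -> float:
    """Probability that a read contains at least one miscalled base,
    assuming every base has quality q and errors are independent."""
    p = phred_to_error_prob(q)
    return 1 - (1 - p) ** read_length


if __name__ == "__main__":
    # Illustrative assumptions: 100 bp reads, Q30 at every base, one billion reads.
    read_length, q, n_reads = 100, 30, 1_000_000_000
    p_err = prob_read_has_error(read_length, q)
    print(f"P(at least one error per read) = {p_err:.3f}")
    print(f"Expected reads with an error   = {p_err * n_reads:.2e}")
```

Under these assumptions, roughly one read in ten carries at least one error. Only a small share of those errors would actually move a read to a different locus, but a small share of a very large number is still many reads.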

Obi Griffith (Washington University, St Louis, USA) wrote (6.8 years ago):

This is almost a philosophical question. With shorter reads, I can imagine a situation where a sequence maps uniquely and perfectly, with a high score and good-quality bases, but in fact contains a sequencing error (at a high-quality base) that prevented it from mapping where it "was supposed to". Back in the days of 17bp and 21bp SAGE we actually thought about this a fair bit. But now, with 100 or 150 bp reads, if you have a perfect, unique match you tend to trust it. Having said that, I can still think of hypothetical situations involving homologous genes, variant bases, and errors where strange things could happen. It also depends on how you define unique (i.e., unambiguous) mapping. What if a read maps with 100% identity to one place and 90% identity to another? What about 99.0%? What about 99.9%? At what point do you consider two alignments so similar that the read is no longer uniquely mapping? The answer is not simple: it has to take into account the complexity of your sequence library and of the region the read is mapping to, error rates, variation between the genome being sequenced and the reference genome being aligned to, etc.
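
The short-read scenario described above can be reproduced with a toy example. This is only a sketch: the reference, the two near-identical loci, and the exact-match "mapper" are all made up for illustration and are far simpler than a real aligner.

```python
def exact_hits(reference: str, read: str) -> list:
    """Return every 0-based position where read matches reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Two hypothetical 12 bp loci that differ at a single base (index 5: G vs T).
locus_a = "ACGTAGCCTGAA"
locus_b = "ACGTATCCTGAA"
reference = "TTTT" + locus_a + "CCCCCCCC" + locus_b + "GGGG"

# The read is sequenced from locus_a, but with a G->T error at index 5.
true_origin = reference.find(locus_a)
read = locus_a[:5] + "T" + locus_a[6:]

hits = exact_hits(reference, read)
print("true origin:", true_origin)
print("exact hits: ", hits)
```

The read has exactly one perfect hit, so by an exact-match definition it is "uniquely mapped", yet that hit is at the second locus rather than where the read came from. With longer reads, a single error is much less likely to turn one locus into a perfect copy of another.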

Arun (Germany) wrote (6.8 years ago):

I generated test data with biological replicates using flux-simulator to test the mapping efficiency of tophat (2.0.0 and 2.0.4) and GSNAP. The data had between 10 and 15 million reads. I filtered out reads that were not uniquely mapped. About 90-95% of the reads mapped uniquely (with GSNAP > tophat 2.0.4 > tophat 2.0.0), and the difference between GSNAP and tophat 2.0.4 was not that substantial. Because it's simulated data, I was able to go back and check whether the simulated read location was the same as the mapped location. About 1-1.5% of the reads mapped "wrongly" (with tophat 2.0.4 having fewer wrong mappings than GSNAP). By "wrongly", I mean that there is no EXACT match between the simulated and mapped locations. A read might have the same starting position but a soft clip at the 3' end; I counted these as wrong. To sum up, wrong mappings are very few, in my opinion.
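
The comparison described above might look something like the following sketch. It is not the script used for the numbers in this answer: the `Placement` record, the two criteria, and the four example reads are hypothetical, and spliced RNA-seq alignments would need more than a start and end coordinate to compare properly.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    chrom: str
    start: int  # leftmost aligned reference position
    end: int    # rightmost aligned reference position (inclusive)


def is_exact(truth: Placement, mapped: Placement) -> bool:
    """Strict criterion: chromosome, start and end must all agree."""
    return truth == mapped


def is_same_start(truth: Placement, mapped: Placement) -> bool:
    """Lenient criterion: same chromosome and start; the end may differ
    (for example because the 3' end was soft-clipped)."""
    return truth.chrom == mapped.chrom and truth.start == mapped.start


def wrong_fraction(pairs, criterion) -> float:
    """Fraction of (truth, mapped) pairs that fail the given criterion."""
    pairs = list(pairs)
    wrong = sum(1 for truth, mapped in pairs if not criterion(truth, mapped))
    return wrong / len(pairs)


# Made-up placements for four reads, for illustration only.
pairs = [
    (Placement("chr1", 100, 199), Placement("chr1", 100, 199)),  # exact
    (Placement("chr1", 500, 599), Placement("chr1", 500, 594)),  # 3' soft clip
    (Placement("chr2", 300, 399), Placement("chr2", 300, 399)),  # exact
    (Placement("chr2", 700, 799), Placement("chr5", 120, 219)),  # wrong locus
]

strict = wrong_fraction(pairs, is_exact)
lenient = wrong_fraction(pairs, is_same_start)
print(f"wrong (strict):  {strict:.2f}")
print(f"wrong (lenient): {lenient:.2f}")
```

The strict criterion counts the soft-clipped read as wrong, as was done above, so it gives an upper bound on the error rate; the lenient criterion only counts reads that landed at a different start or chromosome.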
