Question

Which Short Read Mappers Can Handle Overlapping Paired-End Reads?

1

13.2 years ago

Ryan Thompson ★ 3.6k

Suppose I have a paired-end data set from Illumina with 100 base pairs on each end. If any fragment is shorter than 200 base pairs, the ends of the two sequences will overlap when mapped to the genome. For example, if a particular fragment is 150 base pairs long, then the last 50 base pairs of read 1 will be the reverse complement of the last 50 base pairs of read 2.
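To make the arithmetic above concrete, here is a minimal Python sketch. The function names `overlap_length` and `reverse_complement` are made up for illustration, and it assumes error-free reads with no adapter read-through, so treat it as a toy model rather than what any mapper actually does.

```python
def overlap_length(read_len_1: int, read_len_2: int, fragment_len: int) -> int:
    """Number of fragment bases covered by both mates (0 if they do not overlap)."""
    # Read 1 covers fragment positions [0, read_len_1); read 2 covers
    # [fragment_len - read_len_2, fragment_len). The overlap is their intersection.
    start = max(0, fragment_len - read_len_2)
    end = min(read_len_1, fragment_len)
    return max(0, end - start)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGTN."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# Toy 150 bp fragment sequenced with 100 bp reads from each end.
fragment = ("ACGTTGCAGGATCCTA" * 10)[:150]
read1 = fragment[:100]                      # forward strand, from the 5' end
read2 = reverse_complement(fragment)[:100]  # reverse strand, from the other end

print(overlap_length(100, 100, 150))                   # 50
print(read1[-50:] == reverse_complement(read2[-50:]))  # True
```

With 100 bp reads, any fragment of 200 bp or longer gives an overlap of zero, which matches the threshold described above.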

So, which short-read mapping programs can handle such a case? And for ones that don't, how can I work around this problem?

paired short short aligner • 5.1k views

ADD COMMENT • link updated 10.5 years ago by Adrian Pelin ★ 2.6k • written 13.2 years ago by Ryan Thompson ★ 3.6k

score 3 · Answer 1 · 2011-02-17

13.2 years ago

lh3 33k

Most paired-end mappers handle this (maq, novoalign and bwa for sure). Overlapping ends were already around three years ago, so this is not a new problem at all.

ADD COMMENT • link 13.2 years ago by lh3 33k

Bowtie works too.

ADD REPLY • link 13.2 years ago by Aaron Statham ★ 1.1k

Just to clarify, these mappers will successfully map the reads, but in the overlapping part of the reads, do they get the coverage right for subsequent SNP calling? Meaning, the overlapping reads represent a single molecule, and should represent a single read at that position. If you map a single pair of overlapping reads, does it give you a coverage of two at the overlap location, and one at the non-overlapping portion of the reads?

ADD REPLY • link 10.5 years ago by evanmelstad • 0

No, but that's not really the aligner's job. The subsequent analysis tools for finding SNPs and such would have to explicitly consider the overlap. Many do not.
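As a rough illustration of what "explicitly consider the overlap" could mean, here is a small Python sketch that contrasts per-read depth with per-fragment depth. The function names and the half-open `(start, end)` interval representation are assumptions made for this example; it is not how any particular SNP caller works.

```python
from collections import Counter


def read_depth(pairs):
    """Depth per reference position when every mate is counted separately."""
    depth = Counter()
    for mates in pairs:
        for start, end in mates:
            for pos in range(start, end):
                depth[pos] += 1
    return depth


def fragment_depth(pairs):
    """Depth per reference position when each pair counts at most once."""
    depth = Counter()
    for mates in pairs:
        covered = set()
        for start, end in mates:
            covered.update(range(start, end))
        # Positions inside the overlap are in the set only once.
        for pos in covered:
            depth[pos] += 1
    return depth


# One pair from a 150 bp fragment: mates aligned to [0, 100) and [50, 150).
pairs = [((0, 100), (50, 150))]

print(read_depth(pairs)[75], fragment_depth(pairs)[75])  # 2 1
print(read_depth(pairs)[25], fragment_depth(pairs)[25])  # 1 1
```

The per-read count is what you get by naively piling up alignments; the per-fragment count treats the overlap as one observation of a single molecule.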

ADD REPLY • link 10.5 years ago by Chris Miller 22k

score 0 · Answer 2 · 2013-11-11

10.5 years ago

Adrian Pelin ★ 2.6k

I believe a completely different approach is necessary here.

The question is not which mapper can map overlapping reads. Rather, you have to merge your reads together before doing any mapping; see comments #2 and #3 on the first answer.

EDIT: Tools for this include FLASH and SeqPrep. I use the latter, but there are others as well.
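For anyone wondering what merging means in practice, here is a deliberately naive Python sketch. It is not the algorithm used by FLASH or SeqPrep; `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical names and thresholds chosen for illustration, and base qualities are ignored entirely.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGTN."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Merge two mates into one sequence, or return None if no overlap is found.

    Slides the reverse complement of read 2 along the 3' end of read 1 and
    accepts the longest overlap whose mismatch fraction is small enough.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for ov in range(longest, min_overlap - 1, -1):
        tail, head = read1[-ov:], mate[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * ov:
            # Keep read 1 as-is and append the part of the mate beyond the overlap.
            return read1 + mate[ov:]
    return None


# Mates that overlap by 12 bases merge back into the original 28 bp fragment.
fragment = "ACGGTCATTGCAAGCTTAGGCCATGATC"
read1 = fragment[:20]
read2 = reverse_complement(fragment)[:20]
print(merge_pair(read1, read2) == fragment)
```

Any overlap shorter than `min_overlap` is rejected because it cannot be told apart from a chance match, so such pairs come back as `None` and still have to be mapped as ordinary paired-end reads.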

EDIT2: Wow, this thread had been dead for 3 years. Nice bump :)

ADD COMMENT • link 10.5 years ago by Adrian Pelin ★ 2.6k

SeqPrep is nice, but it does not get around the problem. Your reads could overlap by only one nucleotide, in which case you cannot merge them, but your mapper still has to deal with the fact that they are overlapping.

ADD REPLY • link 10.5 years ago by Ryan Thompson ★ 3.6k