I just discovered that tophat2 sometimes reports the same alignment for the same read multiple times.
I have some paired end RNA-Seq data that I aligned using tophat version 2b using options -I 2500 -i 30 -r 150.
I'm building data files for testing and needed some examples of read pairs where either Read1 or Read2 maps to the genome at multiple locations.
I found some pairs where Read2 had two different alignments.
In each case, the corresponding Read1 was reported twice, on two different lines in the BAM file, but both alignments were at exactly the same location.
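To find more cases like this, here is a minimal sketch that scans SAM text (for example, the output of `samtools view` on the BAM file) and groups mapped records by read name, mate (Read1 or Read2, from the 0x40/0x80 FLAG bits), reference, position, and CIGAR. Any group with more than one record is a read reported more than once at the same place. The function name and the choice of grouping key are my own assumptions, not anything TopHat provides. For each duplicated group it also collects the mate coordinates (RNEXT, PNEXT), which makes it easy to see whether the copies point to different mate alignments.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (QNAME, mate number, RNAME, POS, CIGAR); mate is 1 for Read1, 2 for Read2, 0 if unpaired
Key = Tuple[str, int, str, int, str]


def find_duplicate_alignments(sam_lines: Iterable[str]) -> Dict[Key, List[Tuple[str, int]]]:
    """Hypothetical helper: return alignments of the same read at the same place seen more than once.

    Each value lists the (RNEXT, PNEXT) mate coordinates of the duplicated records.
    """
    groups: Dict[Key, List[Tuple[str, int]]] = defaultdict(list)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag_str, rname, pos, _mapq, cigar, rnext, pnext = fields[:8]
        flag = int(flag_str)
        if flag & 0x4:
            continue  # read is unmapped, so it has no alignment to compare
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        key = (qname, mate, rname, int(pos), cigar)
        groups[key].append((rnext, int(pnext)))
    # keep only placements that were reported more than once
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Fed the lines of `samtools view` output for the pair in the image, this should return one group for Read1 with two entries, one per Read2 alignment it is paired with.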
This image from IGB shows an example: http://transvar.org/~aloraine/MultiMapperPE.png . In the image, reads from the same pair have the same name, reads are color-coded by strand, and each read is labeled with its name. The selected read (outlined in red) has two identical alignments in the data. This second image shows the Selection Info panel with the tags and other attributes of the selected reads: http://transvar.org/~aloraine/MultiMapperPE-SelectionInfo.png
I would have thought that if one member of the pair aligned to just one location, then its alignment would be reported just once in the BAM file, not twice.
Why is tophat reporting the same alignment for the same read multiple times? Is there an option that will force tophat to not report redundant alignments?