There is a file called align_summary.txt in the TopHat output folder (generated by running tophat), which says:
Mapped: 98314933 (76.2% of input)
of these: 11898655 (12.1%) have multiple alignments (9004 have >20)
Mapped: 95536410 (74.1% of input)
of these: 10769172 (11.3%) have multiple alignments (2289 have >20)
75.1% overall read alignment rate.
Aligned pairs: 92923521
of these: 8913959 ( 9.6%) have multiple alignments
and: 1899417 ( 2.0%) are discordant alignments
70.6% concordant pair alignment rate.
Does the final line, “70.6% concordant pair alignment rate”, mean that 70.6% of paired-end reads mapped uniquely (a single match) as a pair? And are these 70.6% of read pairs what is included in accepted_hits.bam?
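For what it is worth, here is a rough sanity check of where the 70.6% might come from. It is only a sketch under two assumptions of mine: that the rate is (aligned pairs minus discordant pairs) divided by the number of input pairs, and that the input pair count, which is not shown in the summary above, can be estimated from the first "Mapped" line and its rounded percentage. The variable names are my own.

```python
# Numbers copied from the align_summary.txt lines quoted above.
first_mapped = 98314933        # first "Mapped" line
first_mapped_pct = 76.2        # "% of input", already rounded in the summary
aligned_pairs = 92923521
discordant_pairs = 1899417

# Assumption: the input pair count is not shown above, so estimate it
# from the rounded percentage (this is approximate by construction).
input_pairs_est = first_mapped / (first_mapped_pct / 100)

# Assumption: concordant pairs = aligned pairs - discordant pairs.
concordant_pairs = aligned_pairs - discordant_pairs
concordant_rate = 100 * concordant_pairs / input_pairs_est

print(f"concordant pairs (assumed): {concordant_pairs}")
print(f"estimated concordant rate: {concordant_rate:.2f}%")
```

The estimate lands within rounding distance of the reported 70.6%. If the assumption holds, the pairs with multiple alignments are not subtracted, so the figure would not be limited to uniquely mapped pairs, but I am not sure that this is the right reading.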
What about splice junctions that mapped uniquely to the transcriptome (rather than the genome): are they included in this 70.6%? In either case, does the junctions.bed file contain splice junctions that mapped uniquely to the transcriptome? Does the 70.6% refer to both reads uniquely mapped to the genome and splice junctions uniquely mapped to the transcriptome?
I would appreciate a clarification.