I've seen many questions about polyA reads asked in this forum, but I still can't form a concrete answer. It seems that polyA reads do exist, but some people say they are in low abundance. In my simulation (using FluxSimulator), ~3.3% of the reads are polyA, which is not a negligible number. (1) My first question: what is the proportion in real data?
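To make the ~3.3% figure concrete, here is a minimal sketch of how the polyA fraction of a FASTQ file could be counted. The definition of a "polyA read" used here (at least 90% A, or T for the reverse-complemented mate) and the function names are assumptions for illustration only; they are not how FluxSimulator labels its reads.

```python
def is_polya(seq, min_fraction=0.9):
    """Return True if A (or T, for the reverse-complement mate) makes up
    at least min_fraction of the read. The 0.9 threshold is an assumption."""
    seq = seq.upper()
    if not seq:
        return False
    dominant = max(seq.count("A"), seq.count("T"))
    return dominant / len(seq) >= min_fraction


def polya_fraction(fastq_lines, min_fraction=0.9):
    """Fraction of reads flagged as polyA in an iterable of FASTQ lines.
    Assumes the standard 4-line-per-record FASTQ layout."""
    total = 0
    polya = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            total += 1
            if is_polya(line.strip(), min_fraction):
                polya += 1
    return polya / total if total else 0.0
```

It can be called on an open file handle, for example `polya_fraction(open("reads_1.fastq"))`, where `reads_1.fastq` is a hypothetical file name.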
I assumed polyA reads could help alignment in the paired-end case, because the aligner would know that the other mate (the non-polyA mate) lies close to the 3' end of some transcript. I used my simulation to test this, but the result seems to be the contrary. Below are the TopHat2 align_summary.txt outputs.
Before removing polyA reads.
Left reads:
          Input     :   1273922
           Mapped   :    970916 (76.2% of input)
            of these:     23097 ( 2.4%) have multiple alignments (0 have >20)
Right reads:
          Input     :   1273922
           Mapped   :    970528 (76.2% of input)
            of these:     23191 ( 2.4%) have multiple alignments (0 have >20)
76.2% overall read mapping rate.

Aligned pairs:    745148
     of these:     20176 ( 2.7%) have multiple alignments
                     740 ( 0.1%) are discordant alignments
58.4% concordant pair alignment rate.
After removing polyA reads.
Left reads:
          Input     :   1232114
           Mapped   :    970894 (78.8% of input)
            of these:     24835 ( 2.6%) have multiple alignments (2 have >20)
Right reads:
          Input     :   1232114
           Mapped   :    970248 (78.7% of input)
            of these:     24866 ( 2.6%) have multiple alignments (2 have >20)
Unpaired reads:
          Input     :       343
           Mapped   :       274 (79.9% of input)
            of these:         1 ( 0.4%) have multiple alignments (0 have >20)
78.8% overall read mapping rate.

Aligned pairs:    764966
     of these:     21791 ( 2.8%) have multiple alignments
                  386715 (50.6%) are discordant alignments
30.7% concordant pair alignment rate.
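As a sanity check on the two summaries, the short sketch below recomputes the concordant pair rate and the fraction of reads removed from the counts reported above. The formula (aligned pairs minus discordant pairs, divided by input pairs) is an assumption on my part; it does reproduce both reported percentages, but I have not confirmed it against the TopHat2 source.

```python
def concordant_rate(input_pairs, aligned_pairs, discordant_pairs):
    """Percentage of input pairs that aligned concordantly.
    Assumed formula: (aligned - discordant) / input * 100."""
    return 100.0 * (aligned_pairs - discordant_pairs) / input_pairs


# Counts copied from the two align_summary.txt outputs above.
before = concordant_rate(1273922, 745148, 740)
after = concordant_rate(1232114, 764966, 386715)

# Fraction of reads per side dropped by the polyA filter.
removed = 100.0 * (1273922 - 1232114) / 1273922

print(f"concordant before: {before:.1f}%")
print(f"concordant after:  {after:.1f}%")
print(f"reads removed:     {removed:.1f}%")
```

The number of aligned pairs actually rises after filtering (745148 to 764966); the drop in the concordant rate comes entirely from the jump in discordant alignments (740 to 386715).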