Question

PolyA reads in RNA-seq: how do aligners handle them?

7.8 years ago

Ruolin Liu • 0

I've seen questions about polyA reads asked on this forum many times, but I still can't form a concrete answer. PolyA reads clearly do exist, but some people say they are in low abundance. In my simulation (using FluxSimulator), ~3.3% of the reads are polyA, which is not a negligible number. (1) My first question: what is the proportion in real data?

I assumed that polyA reads could help alignment in the paired-end case, because the aligner would know that the other mate (the non-polyA mate) lies close to the 3' end of some transcript. I used my simulation to test this, but the result seems to be the opposite. Below are the TopHat2 align_summary.txt outputs.

Before removing polyA reads.

Left reads:
    Input: 1273922
    Mapped: 970916 (76.2% of input)
    of these: 23097 (2.4%) have multiple alignments (0 have >20)

Right reads:
    Input: 1273922
    Mapped: 970528 (76.2% of input)
    of these: 23191 (2.4%) have multiple alignments (0 have >20)

76.2% overall read mapping rate.

Aligned pairs: 745148
    of these: 20176 (2.7%) have multiple alignments
    740 (0.1%) are discordant alignments

58.4% concordant pair alignment rate.

After removing polyA reads.

Left reads:
    Input: 1232114
    Mapped: 970894 (78.8% of input)
    of these: 24835 (2.6%) have multiple alignments (2 have >20)

Right reads:
    Input: 1232114
    Mapped: 970248 (78.7% of input)
    of these: 24866 (2.6%) have multiple alignments (2 have >20)

Unpaired reads:
    Input: 343
    Mapped: 274 (79.9% of input)
    of these: 1 (0.4%) have multiple alignments (0 have >20)

78.8% overall read mapping rate.

Aligned pairs: 764966
    of these: 21791 (2.8%) have multiple alignments
    386715 (50.6%) are discordant alignments

30.7% concordant pair alignment rate.
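The concordant rates in both summaries are consistent with (aligned pairs minus discordant pairs) divided by input pairs. That formula is inferred from the numbers posted here, not taken from the TopHat2 documentation, and the function name is only illustrative. A minimal sketch of the arithmetic:

```python
def concordant_rate(input_pairs, aligned_pairs, discordant_pairs):
    """Percent of input pairs that aligned concordantly (assumed formula)."""
    return 100.0 * (aligned_pairs - discordant_pairs) / input_pairs


# Counts copied from the two align_summary.txt outputs above.
before = concordant_rate(1273922, 745148, 740)
after = concordant_rate(1232114, 764966, 386715)

print(f"before removing polyA reads: {before:.1f}%")
print(f"after removing polyA reads:  {after:.1f}%")
```

Almost the entire drop comes from the jump in discordant pairs, not from fewer pairs being aligned.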

RNA-Seq alignment sequencing • 2.2k views

updated 7.8 years ago by Brian Bushnell 20k • written 7.8 years ago by Ruolin Liu • 0

score 1 · Answer 1 · 2016-06-30

There are many RNA-seq protocols, with various artifacts. Aligners do not try to model these artifacts (at least, none that I am aware of); rather, the better the read matches the reference, the better it will align. As such, poly-A will reduce your alignment rate because it is a non-genomic artifact.

Also, it looks like the reads were trimmed in such a way that the pairing order was broken, which is why the discordant pairing rate went from 0.1% to about 50%.

Sorry, this is just a partial answer... I'm not sure what real-life poly-A rates are.
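One way to avoid breaking the pairing is to filter both files in lockstep and drop a pair whenever either mate looks like poly-A, so the two outputs stay in the same order. The sketch below is only an illustration: the function names, the 0.9 threshold, and the choice to treat T-rich reads as poly-A (the tail can appear as poly-T depending on strand) are all assumptions, and it expects plain 4-line FASTQ records.

```python
import io


def read_fastq(handle):
    """Yield (header, seq, plus, qual) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the iteration
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def is_poly_a(seq, min_fraction=0.9):
    """True if the read is mostly A or mostly T (threshold is an assumption)."""
    if not seq:
        return False
    s = seq.upper()
    return max(s.count("A"), s.count("T")) / len(s) >= min_fraction


def filter_pairs(left, right, min_fraction=0.9):
    """Drop a pair if either mate is poly-A; surviving mates stay in sync."""
    for r1, r2 in zip(left, right):
        if is_poly_a(r1[1], min_fraction) or is_poly_a(r2[1], min_fraction):
            continue
        yield r1, r2


# Tiny in-memory example: the second pair has a poly-A left mate.
left_fq = (
    "@r1/1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2/1\nAAAAAAAAAA\n+\nIIIIIIIIII\n"
    "@r3/1\nGGCCGGCCAA\n+\nIIIIIIIIII\n"
)
right_fq = (
    "@r1/2\nTTGACCGTAA\n+\nIIIIIIIIII\n"
    "@r2/2\nCCGGTTAACC\n+\nIIIIIIIIII\n"
    "@r3/2\nTGCATGCATG\n+\nIIIIIIIIII\n"
)
kept = list(filter_pairs(read_fastq(io.StringIO(left_fq)),
                         read_fastq(io.StringIO(right_fq))))
for r1, r2 in kept:
    print(r1[0], r2[0])
```

Both mates of the second pair are dropped together, so the remaining records in the two files still line up one to one.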
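If you want a rough number for your own data, you can count the reads that end in a run of A's or start with a run of T's. This is only a sketch: the function names and the 10-base minimum run are arbitrary assumptions, and it ignores sequencing errors inside the tail, so treat the result as a rough lower bound.

```python
def poly_a_tail_length(seq):
    """Length of the longer of: trailing A run, leading T run."""
    s = seq.upper()
    trailing_a = len(s) - len(s.rstrip("A"))
    leading_t = len(s) - len(s.lstrip("T"))
    return max(trailing_a, leading_t)


def poly_a_rate(seqs, min_tail=10):
    """Fraction of reads with a tail of at least min_tail bases (assumed cutoff)."""
    total = 0
    hits = 0
    for s in seqs:
        total += 1
        if poly_a_tail_length(s) >= min_tail:
            hits += 1
    return hits / total if total else 0.0


# Made-up reads for illustration only.
reads = [
    "ACGTACGTACAAAAAAAAAAAA",  # 12 trailing A's
    "TTTTTTTTTTTTGCATGCAT",    # 12 leading T's
    "ACGTACGTACGTACGTACGT",    # no tail
    "GATTACAGATTACAAAA",       # 4 trailing A's, below the cutoff
]
print(poly_a_rate(reads))
```

Running this over the sequence lines of a real FASTQ file gives a per-library estimate that can be compared with the ~3.3% from the simulation.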