Question: PolyA reads in RNA-seq and how do aligners handle them?
Ruolin Liu wrote, 2.7 years ago:

I've seen many questions about polyA reads on this forum, but I still can't form a concrete answer. PolyA reads do seem to exist, but some people say they are in low abundance. In my simulation (using FluxSimulator), ~3.3% of the reads are polyA reads, which is not a negligible number. (1) My first question: what is the proportion in real data?
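One way to put a number on this for a given dataset is to count the reads that contain a long A or T homopolymer. The following is only a minimal sketch: the definition of a "polyA read" used here (a run of at least `min_run` A's or T's, with `min_run=20` chosen arbitrarily) is an assumption for illustration, not a standard and not the definition FluxSimulator uses.

```python
import re

def has_poly_a(seq, min_run=20):
    """Return True if seq contains a run of >= min_run A's or T's.

    min_run=20 is an arbitrary illustrative threshold. T runs are
    checked as well, because a mate sequenced from the opposite
    strand sees the tail as poly-T.
    """
    pattern = "A{%d,}|T{%d,}" % (min_run, min_run)
    return re.search(pattern, seq.upper()) is not None

def fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()

def poly_a_fraction(lines, min_run=20):
    """Fraction of reads flagged by has_poly_a (0.0 for empty input)."""
    total = 0
    flagged = 0
    for seq in fastq_sequences(lines):
        total += 1
        if has_poly_a(seq, min_run):
            flagged += 1
    return flagged / total if total else 0.0
```

Any iterable of FASTQ lines works, for example an open file handle: `poly_a_fraction(open("reads_1.fastq"))`, where the file name is hypothetical. The result depends strongly on the chosen threshold, so it is worth trying a few values of `min_run`.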

I assumed that polyA reads would help alignment in the paired-end case, because the aligner would know that the other mate (the non-polyA mate) lies close to the 3' end of some transcript. I used my simulation to test this, but the result seems to be the contrary. Below are the TopHat2 align_summary.txt reports.

Before removing polyA reads.

Left reads:
    Input     : 1273922
    Mapped    :  970916 (76.2% of input)
    of these  :   23097 ( 2.4%) have multiple alignments (0 have >20)
Right reads:
    Input     : 1273922
    Mapped    :  970528 (76.2% of input)
    of these  :   23191 ( 2.4%) have multiple alignments (0 have >20)
76.2% overall read mapping rate.

Aligned pairs :  745148
    of these  :   20176 ( 2.7%) have multiple alignments
                    740 ( 0.1%) are discordant alignments
58.4% concordant pair alignment rate.

After removing polyA reads.

Left reads:
    Input     : 1232114
    Mapped    :  970894 (78.8% of input)
    of these  :   24835 ( 2.6%) have multiple alignments (2 have >20)
Right reads:
    Input     : 1232114
    Mapped    :  970248 (78.7% of input)
    of these  :   24866 ( 2.6%) have multiple alignments (2 have >20)
Unpaired reads:
    Input     :     343
    Mapped    :     274 (79.9% of input)
    of these  :       1 ( 0.4%) have multiple alignments (0 have >20)
78.8% overall read mapping rate.

Aligned pairs :  764966
    of these  :   21791 ( 2.8%) have multiple alignments
                 386715 (50.6%) are discordant alignments
30.7% concordant pair alignment rate.
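The two reports can be compared with a little arithmetic. The sketch below uses only counts copied from the summaries above. The formula for the concordant rate, (aligned pairs minus discordant pairs) divided by input pairs, is inferred from the fact that it reproduces both reported percentages; it is not taken from the TopHat2 documentation.

```python
# Counts copied from the two align_summary.txt reports above.
before = {"input": 1273922, "left_mapped": 970916,
          "pairs": 745148, "discordant": 740}
after = {"input": 1232114, "left_mapped": 970894,
         "pairs": 764966, "discordant": 386715}

def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as in the reports."""
    return round(100.0 * numerator / denominator, 1)

def concordant_rate(summary):
    """Inferred formula: non-discordant aligned pairs over input pairs."""
    return pct(summary["pairs"] - summary["discordant"], summary["input"])

# Pairs removed by the polyA filtering step, and their share of the input.
removed_pairs = before["input"] - after["input"]
removed_pct = pct(removed_pairs, before["input"])

# Change in the number of mapped left reads caused by the filtering.
lost_left_mapped = before["left_mapped"] - after["left_mapped"]
```

Here `removed_pairs` is 41808, or 3.3% of the input, which matches the simulated polyA fraction, while `lost_left_mapped` is only 22. Taken at face value, almost all of the removed left reads were unmapped to begin with, so the higher mapping percentage after filtering comes from a smaller denominator rather than from more reads being aligned.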

Brian Bushnell (Walnut Creek, USA) wrote, 2.7 years ago:

There are many RNA-seq protocols, with various artifacts. Aligners do not try to model these artifacts (at least, none that I am aware of); rather, the better the read matches the reference, the better it will align. As such, poly-A will reduce your alignment rate because it is a non-genomic artifact.

Also, it looks like the reads were trimmed in such a way that the pairing order was broken, which is why the discordant pairing rate went from 0.1% to 50%.
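To keep mates in sync, a filter has to drop both reads of a pair together instead of filtering each file on its own. The sketch below illustrates that idea only; it is not the tool that was used in the question. The function names are hypothetical, the inputs are assumed to be plain 4-line FASTQ records, and the caller supplies the `is_unwanted` predicate (for instance a polyA test).

```python
import re

def read_name(header):
    """Strip '@', any comment, and a trailing /1 or /2 from a FASTQ header."""
    name = header.strip()[1:].split()[0]
    return re.sub(r"/[12]$", "", name)

def records(lines):
    """Group an iterable of FASTQ lines into 4-line records (tuples)."""
    it = iter(lines)
    return zip(it, it, it, it)

def filter_pairs(left_lines, right_lines, is_unwanted):
    """Yield (left, right) record pairs in which neither mate is unwanted.

    Both mates are dropped together, so the two outputs stay in the
    same order. Raises ValueError if the inputs are already out of
    sync. Note: zip stops at the shorter input, so a length mismatch
    at the very end of the files is not detected here.
    """
    for left, right in zip(records(left_lines), records(right_lines)):
        if read_name(left[0]) != read_name(right[0]):
            raise ValueError(
                "mates out of sync: %s vs %s"
                % (left[0].strip(), right[0].strip())
            )
        if is_unwanted(left[1].strip()) or is_unwanted(right[1].strip()):
            continue
        yield left, right
```

Running the same generator with a predicate that never rejects anything, such as `lambda seq: False`, doubles as a quick check that two existing FASTQ files are still in matching order. A pair whose other mate is perfectly alignable is discarded too; keeping such mates as singletons is an alternative design, at the cost of a third output file.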

Sorry, this is just a partial answer... I'm not sure what real-life poly-A rates are.
