In the FASTQ output of my poly A capture RNA sequencing run, I noticed that about 20% of the reads contain a poly A stretch in the middle of the read (or even close to the start). I would like to understand this better, because with DNA fragments mostly >300 bp and a read length of 100 bp, we were not expecting poly A to show up that frequently in reads. I would appreciate your thoughts.
To my understanding, in poly A capture sequencing, only the mRNA tail fragments (those with poly A) are selected during library preparation. In this selection, an adapter containing a ~15 bp poly T stretch can bind anywhere on the poly A tail, which can be >200 bp long. My theory is that if the adapter binds towards the 3' end of the poly A tail, most of the tail gets amplified in PCR and the fragment can still pass size selection, but the coding region at the 5' end of that fragment can be short.
Say the poly T adapter starts binding at the highlighted A: the last 2-3 As will be gone after the first round of PCR, but the rest of the As will remain through to sequencing.
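To put numbers on where the poly A starts in each read, a minimal Python sketch along these lines could be used. The function names, the 15-base run threshold, and the 10-base "front" cutoff are all assumptions chosen for illustration, not part of any library or of the protocol; it also only looks for runs of A, so reads sequenced from the opposite strand would need the same check for runs of T.

```python
import re
from collections import Counter

def first_polya_span(seq, min_run=15):
    """Return (start, end) of the first run of >= min_run A's, or None.

    min_run=15 is an assumed threshold, loosely matching the ~15 bp poly T.
    """
    match = re.search("A{%d,}" % min_run, seq.upper())
    return (match.start(), match.end()) if match else None

def first_polya_start(seq, min_run=15):
    """Return the 0-based start of the first poly A run, or None."""
    span = first_polya_span(seq, min_run)
    return span[0] if span else None

def polya_position_summary(fastq_lines, min_run=15, front=10):
    """Count reads by where their first poly A run sits.

    Categories (hypothetical labels):
      none        - no run of >= min_run A's
      runs_to_end - the run continues to the last base of the read
      front       - the run starts within the first `front` bases
      middle      - everything else
    """
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        seq = line.strip()
        span = first_polya_span(seq, min_run)
        if span is None:
            counts["none"] += 1
        elif span[1] == len(seq):
            counts["runs_to_end"] += 1
        elif span[0] < front:
            counts["front"] += 1
        else:
            counts["middle"] += 1
    return counts
```

To run it on a real file, pass an open file handle, for example `polya_position_summary(open("reads.fastq"))`, where `reads.fastq` is a placeholder name; a gzipped file would need `gzip.open(path, "rt")` instead. If the theory above holds, many of the affected reads should land in the `front` or `middle` categories with only a short non-A sequence before the run.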
Any advice is appreciated. Thank you.