Hello everyone, I am desperately looking for code to remove 3p-seq reads that were mapped because of internal poly(A) priming. 3p-seq reads from the poly(A) junction region, so the 5' ends of 3p-seq reads tend to map to cleavage and polyadenylation sites. In some cases, however, the 3p-seq RT primer anneals to an internal region of the mRNA that contains consecutive As, and reads from those events should be removed from our final BED file.
What I have done so far: I removed the UMI and 8 consecutive Ts from the 5' end of my reads and mapped the trimmed reads to the human genome. The 5' end of each read should represent the 3' terminus of the mRNA (the cleavage and polyadenylation site).
What I want to do is check whether there are 6 consecutive As in the 10-nucleotide region downstream of the site each 3p-seq read maps to. If there are, the read is considered to be derived from internal priming and should be removed from my BED files.
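To make the check concrete, here is a rough Python sketch of the logic I have in mind. It is only an illustration under several assumptions: the function names are made up, the genome is a plain dict of chromosome name to sequence, the BED file has the strand in column 6, and the reads are antisense to the mRNA (since the Ts were trimmed from the 5' end of the read), so the mRNA strand is the opposite of the read strand. If your library is stranded the other way, the strand flip would need to be changed.

```python
# Sketch only: all names here are hypothetical, not from an existing tool.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def downstream_window(genome, chrom, site, mrna_strand, width=10):
    """Return the `width` nt just downstream of `site` (0-based),
    written in mRNA orientation."""
    seq = genome[chrom]
    if mrna_strand == "+":
        return seq[site + 1:site + 1 + width].upper()
    # Minus-strand mRNA: downstream means lower genomic coordinates.
    start = max(0, site - width)
    return revcomp(seq[start:site]).upper()


def is_internal_priming(window, run=6):
    """True if the window contains `run` consecutive As."""
    return "A" * run in window


def filter_bed(lines, genome, width=10, run=6):
    """Keep BED lines whose downstream window lacks an A run.

    Assumes BED6 (strand in column 6) and reads antisense to the mRNA.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        read_strand = fields[5]
        # 5' end of the read in 0-based coordinates (BED is half-open).
        site = start if read_strand == "+" else end - 1
        # Assumption: read is antisense, so flip to get the mRNA strand.
        mrna_strand = "-" if read_strand == "+" else "+"
        window = downstream_window(genome, chrom, site, mrna_strand, width)
        if not is_internal_priming(window, run):
            kept.append(line)
    return kept


if __name__ == "__main__":
    toy_genome = {"chr1": "C" * 20 + "AAAAAAGGGG" + "C" * 20}
    toy_bed = [
        "chr1\t5\t20\tr1\t0\t-",   # followed by AAAAAA, should be dropped
        "chr1\t25\t40\tr2\t0\t-",  # followed by Cs, should be kept
    ]
    for bed_line in filter_bed(toy_bed, toy_genome):
        print(bed_line)
```

For the real human genome the dict would be replaced by an indexed FASTA reader, and the window width (10) and run length (6) are just the thresholds described above.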
Does anyone know of code that does this? Best,