This may seem like a weird question, but we need to filter our RNA-seq data for reads that contain polyA. The data is stranded RNA-seq with 50 bp reads. Would it be easier to find these reads before or after alignment? To be clear, the 50 bp read itself needs to contain a stretch of polyA, not just come from a transcript containing polyA. Has anyone done this type of analysis?
If the data is stranded, polyA tails will always be at the end of the reads. I have read a couple of papers where this information is important (mainly to define UTRs; if I remember the reference I will post it). They usually consider 8-10 nucleotides as the minimum length.
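If you want to try this before alignment, here is a minimal sketch of how the filter could look on raw FASTQ. It is not taken from any of the papers mentioned above. The function names (`has_polya`, `read_fastq`, `filter_polya`) are made up for illustration, the default of 8 comes from the 8-10 nt minimum above, and it assumes plain 4-line FASTQ records with reads in the sense orientation. If your library prep yields reads antisense to the transcript, the tail would show up as a run of T at the start of the read instead, so you would need to adapt the pattern.

```python
import re
from typing import Iterable, Iterator, Tuple


def has_polya(seq: str, min_len: int = 8, at_end: bool = True) -> bool:
    """Return True if seq contains a run of at least min_len A's.

    With at_end=True the run must extend to the last base of the read;
    with at_end=False a run anywhere in the read counts.
    """
    pattern = "A{%d,}" % min_len + ("$" if at_end else "")
    return re.search(pattern, seq.upper()) is not None


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator four times consumes one record per step;
    # an incomplete trailing record is silently dropped.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.strip(), seq.strip(), qual.strip()


def filter_polya(
    lines: Iterable[str], min_len: int = 8, at_end: bool = True
) -> Iterator[Tuple[str, str, str]]:
    """Keep only the records whose read sequence passes has_polya."""
    for header, seq, qual in read_fastq(lines):
        if has_polya(seq, min_len, at_end):
            yield header, seq, qual
```

To run it on a file, pass an open file handle, e.g. `with open("reads.fastq") as fh: kept = list(filter_polya(fh))` (the file name is a placeholder). Set `at_end=False` if a polyA stretch anywhere in the 50 bp read should count.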