Identify RNA-seq reads containing polyA sequence
2.7 years ago
goodez ▴ 510

This may seem like a weird question, but we need to filter our RNA-seq data for reads that contain polyA. The data is stranded RNA-seq with 50 bp reads. Would it be easier to find these reads before or after alignment? To be clear, the 50 bp read itself needs to contain a stretch of polyA, not just come from a transcript that has a polyA tail. Has anyone done this type of analysis?

RNA-Seq • 1.3k views
2.7 years ago
GenoMax 111k

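Use bbduk.sh from the BBMap suite in filter mode (do not specify the ktrim= or qtrim= options) with literal=AAAAA to filter these reads out (adjust the number of A's as needed). Note that BBDuk's default k is 27 and reference sequences shorter than k are not found, so set k= to no more than the length of the literal. Run it on the original, unaligned data.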
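If you want to see what this kind of filter does without BBDuk, here is a minimal Python sketch of the same idea: split FASTQ records by whether the read contains a run of A's. This is not BBDuk's k-mer matching, just an exact-run search with a regular expression. The function names and the default `min_len=10` are assumptions made for illustration.

```python
import re

def has_homopolymer(seq, base="A", min_len=10):
    """Return True if seq contains at least min_len consecutive copies of base."""
    pattern = f"{re.escape(base)}{{{min_len},}}"  # e.g. "A{10,}"
    return re.search(pattern, seq.upper()) is not None

def split_fastq(lines, base="A", min_len=10):
    """Split 4-line FASTQ records into (matched, unmatched) lists.

    `lines` is a list of FASTQ lines without trailing newlines; any
    incomplete record at the end is ignored.
    """
    matched, unmatched = [], []
    usable = len(lines) - len(lines) % 4
    for i in range(0, usable, 4):
        record = lines[i:i + 4]
        # record[1] is the sequence line of the FASTQ record
        if has_homopolymer(record[1], base, min_len):
            matched.append(record)
        else:
            unmatched.append(record)
    return matched, unmatched
```

To use it on a file, read the lines with `open(path).read().splitlines()` and pass `base="T"` if your library reports the polyA as polyT.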


Thanks! I have some additional questions then. Since this is stranded RNA-seq, the polyA will actually show up as stretches of TTTTTT, right?

Also, I used grep to look for reads containing this, and many of the TTTTTT stretches are in the middle of a read. It doesn't seem possible for a polyA tail to be flanked by other sequence on both sides.
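To put numbers on where the stretches fall instead of eyeballing grep output, something like the sketch below could tally whether the longest T run sits at the start, the end, or the interior of each read. The function names and the `min_len=6` threshold are made up for illustration and are not from any existing tool.

```python
import re

def longest_run(seq, base="T", min_len=6):
    """Return (start, end) of the longest run of base with length >= min_len, or None."""
    pattern = f"{re.escape(base)}{{{min_len},}}"
    spans = [m.span() for m in re.finditer(pattern, seq.upper())]
    if not spans:
        return None
    # On ties, max() keeps the first (leftmost) run
    return max(spans, key=lambda s: s[1] - s[0])

def run_position(seq, base="T", min_len=6):
    """Classify the longest run as 'none', 'whole', 'start', 'end' or 'internal'."""
    span = longest_run(seq, base, min_len)
    if span is None:
        return "none"
    start, end = span
    if start == 0 and end == len(seq):
        return "whole"
    if start == 0:
        return "start"
    if end == len(seq):
        return "end"
    return "internal"
```

Counting the labels over all reads (for example with `collections.Counter`) shows how common the internal case really is.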


If you are capturing the second strand, then yes. Past the TTTTT, the sequence may be running into adapters. You can easily check that by adapter-trimming the reads you selected with the filter.

2.7 years ago
Buffo ★ 1.9k

If the data is stranded, polyA tails will always be at the end of the sequences. I have read a couple of papers where this information is important, mainly for defining UTRs (if I remember the reference, I will post it). They usually consider 8-10 nucleotides to be the minimum length.
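As a rough sketch of that rule, the snippet below measures the run at the end of a read and applies a minimum length. It assumes the tail appears either as trailing A's or, when the read is the reverse complement, as leading T's. Which of the two applies depends on the library protocol, and the helper names and the default of 8 are only illustrative.

```python
def trailing_run(seq, base="A"):
    """Number of consecutive copies of base at the 3' end of seq."""
    seq = seq.upper()
    return len(seq) - len(seq.rstrip(base))

def leading_run(seq, base="T"):
    """Number of consecutive copies of base at the 5' end of seq."""
    seq = seq.upper()
    return len(seq) - len(seq.lstrip(base))

def has_terminal_polya(seq, min_len=8):
    """True if seq ends in >= min_len A's or starts with >= min_len T's.

    Assumption: a polyA tail is seen as trailing A's on sense reads and
    as leading T's on antisense reads.
    """
    return trailing_run(seq, "A") >= min_len or leading_run(seq, "T") >= min_len
```

Raising `min_len` to 10 gives the stricter end of the range mentioned above.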


Thanks, that is good to know. Please do share the reference if you find it again!

