Question

Identify RNA-seq reads containing polyA sequence

0

4.9 years ago

goodez ▴ 640

This may seem like a weird question, but we need to filter our RNA-seq data for reads that contain polyA. The data is stranded RNA-seq with 50 bp reads. Would it be easier to find these reads before or after alignment? To be clear, the 50 bp read itself needs to contain a stretch of polyA, not just come from a transcript containing polyA. Has anyone done this type of analysis?

RNA-Seq • 2.8k views

ADD COMMENT • link updated 4.9 years ago by Buffo ★ 2.4k • written 4.9 years ago by goodez ▴ 640

score 1 · Answer 1 · 2019-05-17

1

4.9 years ago

GenoMax 141k

Use bbduk.sh from the BBMap suite in filter mode (do not specify the ktrim= or qtrim= options) with literal=AAAAA to filter those reads out (adjust the number of A's as needed). Run it on the original reads, before alignment.
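For anyone who wants to see what such a literal filter amounts to, here is a minimal Python sketch. It is not BBDuk and does not reproduce its k-mer matching; it is a plain substring test that splits FASTQ records into those that contain a run of A's and those that do not. The function names (`parse_fastq`, `split_by_homopolymer`) and the two toy reads are made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from strict 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def split_by_homopolymer(
    records: Iterable[Record], base: str = "A", min_len: int = 5
) -> Tuple[List[Record], List[Record]]:
    """Split records into (with_run, without_run) by an exact run of `base`."""
    pattern = base * min_len
    with_run: List[Record] = []
    without_run: List[Record] = []
    for rec in records:
        if pattern in rec[1].upper():
            with_run.append(rec)
        else:
            without_run.append(rec)
    return with_run, without_run


# Toy input (hypothetical reads, not real data).
fastq = """@r1
ACGTACGAAAAAAGT
+
IIIIIIIIIIIIIII
@r2
ACGTACGTACGTACG
+
IIIIIIIIIIIIIII
""".splitlines()

hits, rest = split_by_homopolymer(parse_fastq(fastq), base="A", min_len=5)
print([h[0] for h in hits])  # headers of reads containing AAAAA
```

Because the test is an exact substring match, a single sequencing error inside the run will split it; raise or lower `min_len` to taste.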

ADD COMMENT • link 4.9 years ago by GenoMax 141k

0

Thanks! I have some additional questions then. Since it is stranded RNA-seq, the polyA will actually appear as stretches of TTTTTT, right?

Also I used grep to look for reads containing this, and many of the TTTTTT stretches are in the middle of a read. It doesn't seem possible that the polyA could be surrounded by other sequence on both ends.
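A small Python sketch of the same check that grep cannot do directly: it reports where in the read the first T stretch sits (start, middle, or end). The function name `locate_run`, the default of six T's, and the example sequences are assumptions for illustration only.

```python
import re
from typing import Optional, Tuple


def locate_run(
    seq: str, base: str = "T", min_len: int = 6
) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, label) for the first run of `base` of at least
    `min_len`, where label is 'start', 'end' or 'middle'; None if no run."""
    # Builds a pattern such as "T{6,}" (six or more consecutive T's).
    match = re.search(f"{re.escape(base)}{{{min_len},}}", seq.upper())
    if match is None:
        return None
    start, end = match.start(), match.end()
    if start == 0:
        label = "start"
    elif end == len(seq):
        label = "end"
    else:
        label = "middle"
    return start, end, label


# Hypothetical reads for illustration.
print(locate_run("T" * 8 + "ACGTACGT"))
print(locate_run("ACG" + "T" * 7 + "ACGT"))
print(locate_run("ACGTACGT"))
```

Tallying the labels over all matching reads gives a quick picture of how many T stretches are internal versus terminal.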

ADD REPLY • link 4.9 years ago by goodez ▴ 640

0

If you are capturing the second strand, then yes. Past the TTTTT, the sequence may be running into adapter. You can easily check that by adapter-trimming the reads you filter and select.

ADD REPLY • link 4.9 years ago by GenoMax 141k

score 1 · Answer 2 · 2019-05-17

1

4.9 years ago

Buffo ★ 2.4k

If the data is stranded, polyA tails will always be at the end of the sequences. I have read a couple of papers where this information is important (mainly to define UTRs; if I remember the reference I will post it). They usually consider 8-10 nucleotides as the minimum length.
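A rough Python sketch of that idea, assuming exact matching with no mismatches allowed: measure the A run at the 3' end of the read (or the T run at the 5' end when the read is antisense to the transcript) and compare it with a minimum length. The names `terminal_tail_length`, `has_tail`, and `read_is_antisense` are hypothetical, and the default of 8 is simply the lower bound mentioned above.

```python
def terminal_tail_length(seq: str, read_is_antisense: bool = False) -> int:
    """Length of the terminal homopolymer run: trailing A's for sense reads,
    leading T's for reads antisense to the transcript."""
    seq = seq.upper()
    if read_is_antisense:
        return len(seq) - len(seq.lstrip("T"))
    return len(seq) - len(seq.rstrip("A"))


def has_tail(seq: str, min_len: int = 8, read_is_antisense: bool = False) -> bool:
    """True if the terminal run is at least `min_len` bases long."""
    return terminal_tail_length(seq, read_is_antisense) >= min_len


# Hypothetical reads for illustration.
print(has_tail("ACGT" + "A" * 10))
print(has_tail("ACGT" + "A" * 5))
print(has_tail("T" * 9 + "ACGT", read_is_antisense=True))
```

Real tails often carry sequencing errors, so a tolerant matcher would be needed before using this on actual data.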

ADD COMMENT • link 4.9 years ago by Buffo ★ 2.4k

0

Thanks, that is good to know. Please do share the reference if you find it again!

ADD REPLY • link 4.9 years ago by goodez ▴ 640