Strand information from reverse stranded RNA-seq
4 months ago

Hello,

I have some RNA-seq data that was prepped using a reverse-stranded kit. It is not poly-A selected, as we are in part interested in transcription of repetitive regions, including endogenous retroviruses.

Despite knowing that the library is reverse stranded, I decided to run a tool (TEtranscripts, to be specific) with reverse, forward, and unstranded settings to compare the numbers. I consistently get much higher counts for HERV genes when I specify unstranded, and I reasoned that this is probably due to bidirectional transcription of many of these elements.

In the GTF file, each HERV is annotated as either + or -, but in reality, antisense transcripts may be common for some elements. So, when I specify reverse and I use that GTF, I believe I get a count of all the transcripts from the "sense" direction. When I specify unstranded, I get all the transcripts going in both directions combined. If this is false, please correct me!

My question is: is specifying forward an adequate way to get a count of antisense transcripts? Or would it be better to essentially duplicate the GTF, flip the strand signs, and run it as reverse again?
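To make the GTF-flipping idea concrete, here is a minimal Python sketch. It assumes a standard tab-separated GTF with the strand in the seventh column; `flip_strand` and the example record are hypothetical and not part of TEtranscripts. Whether a flipped annotation run as reverse gives the same counts as the original annotation run as forward depends on how the counting tool handles strand, so this only shows the file manipulation.

```python
def flip_strand(gtf_line: str) -> str:
    """Return a GTF line with its strand column (7th field) flipped.

    Comment lines, blank lines, and lines with fewer than 9 fields
    are returned unchanged. A strand of "." is left as-is.
    """
    if gtf_line.startswith("#") or not gtf_line.strip():
        return gtf_line
    newline = "\n" if gtf_line.endswith("\n") else ""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return gtf_line
    swap = {"+": "-", "-": "+"}
    fields[6] = swap.get(fields[6], fields[6])
    return "\t".join(fields) + newline


# Hypothetical record, for illustration only.
example = 'chr1\trmsk\texon\t100\t200\t.\t+\t.\tgene_id "HERV-A"; transcript_id "HERV-A_dup1";\n'
flipped = flip_strand(example)
```

Applying the function to every line of the annotation and writing the result to a new file would give the "antisense" GTF described above.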

In addition, when I compare the forward and reverse runs to the unstranded run, the "forward" and "reverse" counts for a given gene do not quite add up to the "unstranded" count. Why would this be?

For example, HERV-A will have 4000 reads mapped to it using forward, 2500 mapped using reverse, and 5100 mapped when unstranded. This makes me think my understanding is wrong in some capacity!

I know that running it as reverse stranded (as intended) will give me trustworthy, accurate information, but I'm wondering what the best way is to squeeze more information out of the data. Since we performed stranded RNA-seq, I'd love to get the stranded information out of it, and I'm convinced it's possible to do so accurately!

Thanks to all who may be able to help!

Mapping RNA-seq HERVs • 357 views
4 months ago

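If two features overlap in opposite orientations, knowing that the prep is stranded makes it clear which feature each read belongs to. When the data is treated as unstranded, reads in the overlap are ambiguous, so the software can't assign all of them to a gene. That is why the forward and reverse counts need not add up to the unstranded count.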
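A toy sketch of that effect, in Python. This is not how TEtranscripts counts reads; it is a deliberately simplified model that assumes each read is a single position with a known transcript strand, and that a read overlapping more than one compatible feature is discarded as ambiguous. The features and reads are made up.

```python
from collections import Counter

# Two hypothetical features that overlap on opposite strands:
# (name, start, end, strand)
FEATURES = [("A", 100, 200, "+"), ("B", 150, 250, "-")]

# Hypothetical reads: (position, strand of the originating transcript)
READS = [(110, "+"), (160, "+"), (170, "-"), (240, "-"), (120, "-")]


def count_reads(reads, features, mode):
    """Count reads per feature; reads with 0 or >1 compatible features are dropped.

    mode is "sense" (read strand matches feature strand),
    "antisense" (read strand opposes feature strand),
    or "unstranded" (strand ignored).
    """
    counts = Counter()
    for pos, read_strand in reads:
        hits = [
            name
            for name, start, end, strand in features
            if start <= pos <= end
            and (
                mode == "unstranded"
                or (mode == "sense" and read_strand == strand)
                or (mode == "antisense" and read_strand != strand)
            )
        ]
        if len(hits) == 1:  # unambiguous assignment only
            counts[hits[0]] += 1
    return counts


sense = count_reads(READS, FEATURES, "sense")
antisense = count_reads(READS, FEATURES, "antisense")
unstranded = count_reads(READS, FEATURES, "unstranded")
```

In this toy data, feature A gets 2 sense and 2 antisense reads but only 2 unstranded reads, because the two reads inside the A/B overlap become ambiguous once strand is ignored.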
