Question

Total RNA-Seq library prep - what do you do with the 'true' intronic reads? Would you include them in a differential gene expression analysis?

7.8 years ago

achamess ▴ 90

Hi all,

So I recently performed total RNA-seq using the Nugen SoLo library prep kit. My sample was a pool of sorted neuronal nuclei from mouse. Our informatics core has aligned the sequencing data using STAR and found that about 20% is exonic, and 60-70% is intronic. Now, that is to be expected, I suppose, since this is a total RNA-seq prep and I'm using nuclear RNA as input. But now my question is, how do you handle the intronic data?

One of my goals is differential gene expression. Would you combine the intronic reads with the exonic reads, collapsing to the gene level, would you analyze the two separately, or would you disregard the introns (which would be kind of a waste)? What do you think? And do you know of any published analysis methods that deal with this kind of situation? I see lots of total RNA-seq prep kits on the market now, so I can't be the first to encounter this issue.

Thanks!

RNA-Seq alignment • 3.6k views

What sort of relative abundances are you seeing between introns and their flanking exons?

ADD REPLY • link 7.8 years ago by spvensko ▴ 240

Not sure, but I will check. Good question.

ADD REPLY • link 7.8 years ago by achamess ▴ 90

score 2 · Answer 1 · 2016-07-08

7.8 years ago

Martombo ★ 3.1k

In a standard RNA-seq analysis you usually compare only mRNA levels, i.e. the exonic reads. That said, intronic reads can tell you about the abundance of pre-mRNAs. The ratio between exonic and intronic reads can then be used to estimate post-transcriptional effects, see here. Since you're working with nuclear extracts, I guess that post-transcriptional effects would be rather limited. You could still use the ratio of intronic to exonic reads as a measure of splicing efficiency, if you're interested...
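As a rough illustration of the intronic-to-exonic ratio mentioned above, here is a minimal Python sketch. It assumes you already have per-gene exonic and intronic read counts as two dictionaries; the gene names, the counts, and the function name `intron_exon_log_ratio` are made up for the example. It also ignores feature length, and since introns are usually much longer than exons, treat the values as relative rather than absolute.

```python
import math

def intron_exon_log_ratio(exon_counts, intron_counts, pseudocount=1.0):
    """Return log2((intron + p) / (exon + p)) for every gene seen in either input.

    The pseudocount p avoids division by zero for genes with no exonic reads.
    """
    genes = sorted(set(exon_counts) | set(intron_counts))
    ratios = {}
    for gene in genes:
        exon = exon_counts.get(gene, 0)
        intron = intron_counts.get(gene, 0)
        ratios[gene] = math.log2((intron + pseudocount) / (exon + pseudocount))
    return ratios

# Made-up counts, for illustration only.
exon_counts = {"GeneA": 99, "GeneB": 7}
intron_counts = {"GeneA": 399, "GeneB": 1}

ratios = intron_exon_log_ratio(exon_counts, intron_counts)
for gene, value in ratios.items():
    print(gene, value)
```

A positive value means relatively more intronic signal for that gene, a negative value relatively more exonic signal.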

Good points. What would the downside be to summarizing my reads at the gene level (exon + intron)? I realize there might be some bias because some intronic sequences may be degraded differentially, but the fact that those introns are there means that active transcription of those genes is occurring, no?

Have you seen anyone summarize by combining exon + intron from total-RNA-seq?

ADD REPLY • link 7.8 years ago by achamess ▴ 90

I've never seen exonic and intronic reads combined. Most of the time, RNA-seq is used to estimate a functional level for a gene. For a protein-coding gene, this is given only by the expression of its mRNA, which will produce a protein. If you have really low counts for the exons, you might try summing intronic and exonic reads, under the assumption that they are closely correlated. But I think the bias you introduce will be greater than the reduction in count noise. Nevertheless, you can perform both analyses and compare them, to get more insight into your data.
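To make the "perform both analyses and compare them" idea concrete, here is a small Python sketch for a single sample. It sums exonic and intronic counts per gene and puts the exon-only and exon + intron values side by side as counts per million. The numbers and the helper names `combine_counts` and `counts_per_million` are invented for illustration; a real comparison would be done per sample across all genes, and the differential expression tools would still be fed raw counts, not CPM.

```python
def combine_counts(exon_counts, intron_counts):
    """Sum exonic and intronic counts per gene (a gene-body style count)."""
    genes = set(exon_counts) | set(intron_counts)
    return {g: exon_counts.get(g, 0) + intron_counts.get(g, 0) for g in genes}

def counts_per_million(counts):
    """Scale raw counts so that they sum to one million within the sample."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: 1e6 * n / total for gene, n in counts.items()}

# Made-up counts for one sample, for illustration only.
exon_counts = {"GeneA": 100, "GeneB": 300, "GeneC": 600}
intron_counts = {"GeneA": 900, "GeneB": 100, "GeneC": 0}

gene_body = combine_counts(exon_counts, intron_counts)
exon_cpm = counts_per_million(exon_counts)
body_cpm = counts_per_million(gene_body)

for gene in sorted(gene_body):
    print(gene, round(exon_cpm[gene]), round(body_cpm[gene]))
```

In this toy case GeneA goes from the lowest exon-only value to the highest combined value, which is the kind of shift you would want to look at before trusting either summary.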

ADD REPLY • link 7.8 years ago by Martombo ★ 3.1k

score 0 · Answer 2 · 2016-07-14

So I'm thinking I will go ahead and sum the intronic and exonic reads and I'll compare to the exons alone.

I was looking at ways of doing this, and featureCounts looks like an appropriate tool. The feature I'd be counting is the gene body, yes? That spans everything from the TSS to the end of the transcript, so it includes the introns.
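For what it's worth, one way this could look is sketched below: a small Python helper that builds two featureCounts command lines, one counting over exons and one over whole gene records. It uses the featureCounts options `-a` (annotation), `-o` (output), `-t` (feature type) and `-g` (grouping attribute). The file names and the helper `featurecounts_command` are hypothetical, and it assumes a GTF annotation that contains rows with `gene` in the feature column. Strandedness (`-s`) and paired-end handling depend on the library and are deliberately left out.

```python
import shlex

def featurecounts_command(bam_files, annotation_gtf, output_path, feature_type):
    """Build a featureCounts argument list for one feature type.

    feature_type="exon" counts reads over exons only;
    feature_type="gene" counts reads over whole gene records, introns included
    (assumes the GTF has rows whose feature column is "gene").
    """
    if feature_type not in ("exon", "gene"):
        raise ValueError("feature_type must be 'exon' or 'gene'")
    return [
        "featureCounts",
        "-a", annotation_gtf,   # annotation file
        "-o", output_path,      # output count table
        "-t", feature_type,     # feature type to count over
        "-g", "gene_id",        # attribute used to group features per gene
        *bam_files,
    ]

# Hypothetical file names, for illustration only.
bams = ["sample1.bam", "sample2.bam"]
exon_cmd = featurecounts_command(bams, "annotation.gtf", "exon_counts.txt", "exon")
gene_cmd = featurecounts_command(bams, "annotation.gtf", "gene_body_counts.txt", "gene")

print(shlex.join(exon_cmd))
print(shlex.join(gene_cmd))
```

The printed commands can be checked by eye and then run in a shell, or passed to `subprocess.run(cmd, check=True)`. Intronic counts per gene could then be approximated as gene-body counts minus exonic counts.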

I'm really new to RNA-Seq analysis. What kind of annotation file should I use to count my reads against in featureCounts?

And then, the output from featureCounts could be used in something like edgeR or DESeq2, yes?
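On the hand-off to edgeR or DESeq2: both take a genes-by-samples matrix of raw integer counts. The featureCounts table starts with a comment line, followed by a header whose first six columns are Geneid, Chr, Start, End, Strand and Length, with one count column per BAM file after that. Below is a minimal Python sketch that pulls out just the counts. The example table is made up, the function name `read_featurecounts_table` is hypothetical, and it assumes integer counts (no fractional counting options were used).

```python
import csv
import io

def read_featurecounts_table(handle):
    """Return (sample_names, {gene_id: [counts]}) from a featureCounts table."""
    # Skip the leading comment line(s) that start with "#".
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    # Columns 0-5 are Geneid, Chr, Start, End, Strand, Length; the rest are samples.
    sample_names = header[6:]
    counts = {row[0]: [int(value) for value in row[6:]] for row in rows if row}
    return sample_names, counts

# Made-up table in the featureCounts layout, for illustration only.
example = io.StringIO(
    "# Program:featureCounts (comment line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n"
    "GeneA\tchr1\t100\t900\t+\t801\t12\t30\n"
    "GeneB\tchr2\t50\t450\t-\t401\t0\t7\n"
)

samples, counts = read_featurecounts_table(example)
print(samples)
for gene, values in counts.items():
    print(gene, values)
```

With a real file you would pass `open("gene_body_counts.txt")` instead of the `io.StringIO` object and write the resulting matrix out for the R side.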

Sorry for these really basic questions. In the future, I might just make my life easier and do a polyA enrichment and get around the issue of non-exonic reads. I thought the extra information from total RNA-seq would be an advantage, but I'm seeing that the analytic methods for total vs. mRNA are less mature.