Question

DGE analysis using stranded and unstranded RNA-seq libraries.

1

8.1 years ago

Sentinel156 ▴ 190

Hi all,

I am working with Illumina HiSeq 2000 100 bp single-end RNA-seq data. Some of my samples come from unstranded libraries and some from stranded libraries. I'm trying to understand the best way to do read summarisation for these libraries using featureCounts for eventual DGE analysis. To date I have treated all datasets as unstranded for both mapping (TopHat) and counting (featureCounts).

However, I am worried that read counts for my unstranded libraries will be biased for genes that have antisense transcripts, since reads originating from the antisense transcript will be merged into the counts for the sense-strand gene wherever the two features overlap. So what is the recommended course of action here? I'm not interested in antisense transcripts, so should I continue to treat everything as unstranded for the featureCounts run? I have seen other threads here that suggest incorporating strandedness into the DGE model as a factor in a multi-factor design, but I was hoping for a more thorough explanation of why this is the better workaround for this problem.
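To make the concern concrete, here is a minimal toy sketch of how unstranded counting can inflate a sense gene's count. Everything in it is invented for illustration (the gene `geneA`, its coordinates, and the reads), and it assumes that only the sense gene is being counted and that each read is labelled with the strand of the transcript it came from. Real protocols may report the opposite strand, and real counters handle overlaps in more detail.

```python
# Toy sense gene on the plus strand (hypothetical name and coordinates).
GENE = {"name": "geneA", "start": 100, "end": 500, "strand": "+"}


def overlaps(read_start, read_end, feature):
    """Return True if the read interval overlaps the feature interval."""
    return read_start <= feature["end"] and read_end >= feature["start"]


def count_gene(reads, feature, stranded):
    """Count reads overlapping the feature.

    reads: list of (start, end, strand) tuples, where strand is the strand
           of the originating transcript (a simplifying assumption).
    stranded: if True, only reads on the feature's strand are counted.
    """
    n = 0
    for start, end, strand in reads:
        if not overlaps(start, end, feature):
            continue
        if stranded and strand != feature["strand"]:
            continue
        n += 1
    return n


# Invented 100 bp reads: three from the sense gene, three from an antisense transcript.
reads = [
    (120, 219, "+"),
    (250, 349, "+"),
    (400, 499, "+"),
    (320, 419, "-"),  # antisense, inside the gene
    (450, 549, "-"),  # antisense, partly overlapping the gene
    (600, 699, "-"),  # antisense, outside the gene
]

unstranded_count = count_gene(reads, GENE, stranded=False)
stranded_count = count_gene(reads, GENE, stranded=True)
print(unstranded_count, stranded_count)
```

In this toy case the unstranded count includes the two antisense reads that overlap the gene, while the strand-aware count does not.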

Thank you in advance.

RNA-Seq • 3.9k views

ADD COMMENT • link updated 8.1 years ago by Devon Ryan 104k • written 8.1 years ago by Sentinel156 ▴ 190

score 9 · Answer 1 · 2016-03-22

9

8.1 years ago

Devon Ryan 104k

Do you really think mapping stranded libraries as if they're unstranded and then doing the counting in an unstranded fashion gets rid of all possible bias? I expect not. That's why you'll see everyone suggesting that you align each sample as appropriate (stranded or not, depending on the sample), do the counting as appropriate (stranded or not, depending on the sample), and then add the library type as a batch effect in the model (with an interaction term if you're really concerned; have a look at a PCA plot).
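As a rough sketch of "counting as appropriate", the snippet below builds one featureCounts command per sample, setting the `-s` option from the library type (0 = unstranded, 1 = stranded, 2 = reversely stranded). The sample sheet, BAM file names, and `genes.gtf` are hypothetical. Whether a stranded library needs `-s 1` or `-s 2` depends on the library prep protocol (many dUTP-based kits are reversely stranded), so treat the mapping here as an assumption to check against your own data.

```python
import os

# Hypothetical sample sheet: BAM path and library type for each sample.
SAMPLES = [
    {"bam": "sample1.bam", "library": "unstranded"},
    {"bam": "sample2.bam", "library": "reverse"},
]

# featureCounts -s values: 0 = unstranded, 1 = stranded, 2 = reversely stranded.
STRAND_FLAG = {"unstranded": 0, "forward": 1, "reverse": 2}


def featurecounts_cmd(sample, gtf="genes.gtf"):
    """Build the featureCounts argument list for one sample (not executed here)."""
    prefix, _ = os.path.splitext(sample["bam"])
    return [
        "featureCounts",
        "-s", str(STRAND_FLAG[sample["library"]]),  # per-sample strandedness
        "-a", gtf,                                  # annotation (hypothetical file)
        "-o", prefix + ".counts.txt",               # per-sample output table
        sample["bam"],
    ]


for sample in SAMPLES:
    # Print the command; it could be passed to subprocess.run once verified.
    print(" ".join(featurecounts_cmd(sample)))
```

Running each library type separately like this yields one count table per sample, which then need to be merged into a single matrix before the DGE step.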

ADD COMMENT • link 8.1 years ago by Devon Ryan 104k

4

+1 for your answer. By the way, I think the interaction term [batch:condition] is really needed here, since antisense transcripts usually have opposite expression dynamics than their sense counterparts. Meaning that, in a condition, if a gene is overexpressed, there is a good chance that its antisense will be underexpressed. So the batch effect is expected to vary across conditions, especially for the genes you are interested in, i.e., those that are differentially expressed across conditions.
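For readers unfamiliar with what the [batch:condition] term adds, here is a small sketch of the corresponding design matrix, written out by hand. The sample names, the `stranded`/`unstranded` batch labels, and the `control`/`treated` conditions are all hypothetical; in practice the DGE package builds this matrix from a model formula. The check at the end reflects the requirement that every batch and condition combination be present, otherwise the interaction cannot be estimated.

```python
# Hypothetical sample table: (sample, library type used as batch, condition).
samples = [
    ("s1", "unstranded", "control"),
    ("s2", "unstranded", "treated"),
    ("s3", "stranded", "control"),
    ("s4", "stranded", "treated"),
]


def design_row(library, condition):
    """Return [intercept, batch, condition, batch:condition] for one sample."""
    batch = 1 if library == "stranded" else 0
    cond = 1 if condition == "treated" else 0
    # The interaction column is 1 only for treated samples from stranded libraries.
    return [1, batch, cond, batch * cond]


def has_all_combinations(rows):
    """Check that all four batch x condition combinations occur at least once."""
    seen = {(row[1], row[2]) for row in rows}
    return seen == {(0, 0), (0, 1), (1, 0), (1, 1)}


design = [design_row(lib, cond) for _, lib, cond in samples]
for (name, _, _), row in zip(samples, design):
    print(name, row)
print("interaction estimable:", has_all_combinations(design))
```

The third column captures the condition effect in the reference batch, and the fourth lets that effect differ between library types, which is the situation described above.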

ADD REPLY • link 7.6 years ago by Carlo Yague 8.7k

0

Hi, I do not think these statements are true: "since antisense transcripts usually have opposite expression dynamics than their sense counterparts" and "in a condition, if a gene is overexpressed, there is a good chance that its antisense will be underexpressed". If you are talking about natural antisense transcripts (NATs) or non-coding antisense transcripts, anti-correlated expression is not a general phenomenon, because the expression concordance between sense and antisense is context dependent (tissue, cell type, etc.).

Examples:

The landscape of antisense gene expression in human cancers

A cautionary tale of sense-antisense gene pairs: independent regulation despite inverse correlation of expression

Genome-wide Identification and Characterization of Natural Antisense Transcripts

Genome-wide analysis of expression modes and DNA methylation status at sense–antisense transcript loci in mouse

Sense-Antisense lncRNA Pair Encoded by Locus 6p22.3 Determines Neuroblastoma Susceptibility

Conserved expression of natural antisense transcripts in mammals.

This is not an answer to the main question; rather, it is a reply to the statement made in this post.

ADD REPLY • link 5.9 years ago by EagleEye 7.5k

0

Well, the situation is perhaps more complex in higher eukaryotes, but I think that in simpler systems the anti-correlation between sense and antisense transcription is rather well established. There is, for instance, this recent paper:

Native elongating transcript sequencing reveals global anti-correlation between sense and antisense nascent transcription in fission yeast.

ADD REPLY • link 5.9 years ago by Carlo Yague 8.7k

0

Hi,

My point was that there is evidence for both positive and negative correlation, backed by good publications. So there is no general rule that sense and antisense are globally anti-correlated or positively correlated. Many factors contribute to that (sometimes it is species dependent too).

Sorry, one more reference: Antisense Transcription in the Mammalian Transcriptome

ADD REPLY • link 5.9 years ago by EagleEye 7.5k