Question: Comparing stranded reads to unstranded reads
piyushjo (110) wrote, 5 weeks ago:


I have two datasets from different sources. Unfortunately, one group did unstranded RNA-seq while the other did stranded RNA-seq. When I run PCA on the DESeq2-normalized reads, the two datasets cluster far from each other. Now I am unsure whether this is an artefact of the unstranded reads from the first group or a real difference. Could anyone enlighten me on whether it is appropriate to compare these two datasets for differential gene expression, or whether I will get wrong information for transcripts on the reverse strand?


swbarnes2 (4.8k, United States) wrote, 5 weeks ago:

If the two sets of samples were prepped at different places at different times, strandedness is likely just a part of the larger batch effect.

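To illustrate the batch-effect point, here is a minimal simulation sketch in Python (numpy only). The data are entirely hypothetical: a gene-specific shift applied to one "batch" (standing in for a different lab and library protocol, strandedness included) is enough to make PC1 split samples by batch rather than by condition, which mirrors the clustering described in the question. On real data one would typically run DESeq2's `vst()` and `plotPCA()` in R instead; this is only a toy model.

```python
import numpy as np


def simulate_log_expression(rng, n_genes=500, n_per_group=4):
    """Hypothetical log2-scale expression: 2 batches x 2 conditions."""
    base = rng.normal(8.0, 2.0, size=n_genes)            # per-gene baseline level
    condition_effect = np.zeros(n_genes)
    condition_effect[:25] = 1.0                          # 25 genes truly differ by condition
    batch_effect = rng.normal(0.0, 1.0, size=n_genes)    # gene-specific lab/protocol shift
    samples, labels = [], []
    for batch in (0, 1):
        for condition in (0, 1):
            for _ in range(n_per_group):
                x = base + condition * condition_effect + batch * batch_effect
                x = x + rng.normal(0.0, 0.2, size=n_genes)  # sample-level noise
                samples.append(x)
                labels.append((batch, condition))
    return np.array(samples), labels


def pca_scores(X, n_components=2):
    """Project samples (rows) onto the top principal components via SVD."""
    centered = X - X.mean(axis=0)                        # center each gene
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


rng = np.random.default_rng(0)
X, labels = simulate_log_expression(rng)
pc1 = pca_scores(X)[:, 0]
for (batch, condition), score in zip(labels, pc1):
    print(f"batch={batch} condition={condition} PC1={score:+.1f}")
```

In this toy setup the batch shift touches all 500 genes while the condition affects only 25, so the batch direction carries far more variance and dominates PC1.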

But doesn't DESeq2 take into account the difference in library depth? What else could be contributing to the variation?

Reply written 5 weeks ago by piyushjo (110)

A batch effect involves far more than library depth. The same samples prepped by different hands will have slightly different gene expression values. That's just life in experimental science.

Reply written 5 weeks ago by swbarnes2 (4.8k)
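The library-depth question can be made concrete. DESeq2's default normalization (median-of-ratios size factors) rescales whole samples, so it removes a uniform depth difference but cannot remove a shift that affects only some genes, which is what a lab or protocol batch effect usually looks like. Below is a simplified Python sketch of the median-of-ratios idea for illustration only, not DESeq2's own code; the count values are made up.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Keep genes counted in every sample so the geometric mean is positive.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    log_geo_mean = log_counts.mean(axis=1)               # pseudo-reference per gene
    log_ratios = log_counts - log_geo_mean[:, None]      # each sample vs reference
    return np.exp(np.median(log_ratios, axis=0))


def normalize(counts):
    """Divide each sample (column) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors(counts)


# Hypothetical counts for 7 genes.
a = np.array([100, 200, 50, 400, 80, 300, 150], dtype=float)

# Case 1: same library sequenced twice as deep -> fully corrected.
deeper = np.column_stack([a, 2 * a])
print(normalize(deeper))

# Case 2: same depth, but 2 genes captured 3x better by a different protocol.
shifted = a.copy()
shifted[:2] *= 3
protocol = np.column_stack([a, shifted])
print(normalize(protocol))   # genes 0 and 1 still differ 3-fold
```

Because the size factor is a median over many genes, a shift confined to a minority of genes barely moves it, so those genes keep their difference after normalization.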

But my main question is: just because one library is stranded and the other is unstranded, does that make them incomparable? I understand that differences due to people and machines are also involved.

Reply written 5 weeks ago by piyushjo (110)

It depends on how you want to do the analysis. If you are looking for DE genes between the two datasets, it will be difficult to tell whether a gene differs because of the library prep protocol or because of the biology of those datasets. If the two datasets can be mixed before doing the analysis, you are more likely to come up with a decent DE gene list. That is possible if both datasets address the same biological question, e.g., both sequenced lung cancer and normal lung. Mixing the samples that way would reduce the noise from the sample prep. Hope that helps.

Reply written 5 weeks ago by johnsonnathant (80), modified 5 weeks ago
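The confounding issue can be checked mechanically: condition and batch (lab/protocol) effects are only separately estimable if the design matrix has full column rank. A small numpy sketch follows; the two sample layouts are hypothetical (six samples, 0/1 coded). DESeq2 likewise checks that its model matrix is full rank before fitting a design such as `~ batch + condition`.

```python
import numpy as np


def design_matrix(condition, batch):
    """Intercept + condition + batch columns (0/1 indicators)."""
    condition = np.asarray(condition, dtype=float)
    batch = np.asarray(batch, dtype=float)
    return np.column_stack([np.ones_like(condition), condition, batch])


def is_estimable(X):
    """Condition and batch effects are separable only if X has full column rank."""
    return np.linalg.matrix_rank(X) == X.shape[1]


# Hypothetical layout 1: one dataset holds all normal tissue, the other all tumour.
# Condition and batch are identical, so their effects cannot be told apart.
confounded = design_matrix(condition=[0, 0, 0, 1, 1, 1], batch=[0, 0, 0, 1, 1, 1])

# Hypothetical layout 2: each dataset contains both conditions ("mixed").
mixed = design_matrix(condition=[0, 0, 1, 0, 1, 1], batch=[0, 0, 0, 1, 1, 1])

print(is_estimable(confounded))   # False
print(is_estimable(mixed))        # True
```

In the confounded layout the condition column equals the batch column, so the rank drops from 3 to 2 and any "DE" result is indistinguishable from a protocol difference.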

OK, thanks. I am comparing cerebellum to medulloblastoma (a cancer of the cerebellum). The only thing that bothers me is whether antisense transcripts that overlap an mRNA would be improperly quantified.

Reply written 5 weeks ago by piyushjo (110)
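The antisense concern can be illustrated with a toy read-counting rule. With unstranded reads, a read falling where a gene and an opposite-strand antisense transcript overlap cannot be attributed to either, so counters typically set it aside; with stranded reads, the strand resolves it. The gene names and coordinates below are hypothetical, the `__ambiguous`/`__no_feature` labels are borrowed from htseq-count's special counters, and the rule is a simplification (many stranded protocols, such as dUTP, report read 1 on the strand opposite the transcript), not any tool's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def assign_read(start, end, strand, genes, stranded):
    """Assign one aligned read to a gene, "__ambiguous", or "__no_feature".

    Simplification: with stranded=True the read strand is assumed to equal
    the transcript strand; with stranded=False the strand is ignored.
    """
    hits = [
        g for g in genes
        if g.start < end and start < g.end                 # intervals overlap
        and (not stranded or g.strand == strand)           # optional strand filter
    ]
    if not hits:
        return "__no_feature"
    if len(hits) > 1:
        return "__ambiguous"
    return hits[0].name


# Hypothetical locus: an mRNA on "+" with an antisense transcript on "-" inside it.
genes = [Gene("GENE_X", 1000, 5000, "+"), Gene("GENE_X-AS", 3000, 4000, "-")]

read = (3200, 3300, "-")     # read inside the overlap, from the antisense transcript
print(assign_read(*read, genes, stranded=True))    # GENE_X-AS
print(assign_read(*read, genes, stranded=False))   # __ambiguous
```

Only genes with an overlapping opposite-strand feature lose reads this way, so in this model the unstranded dataset is undercounted for a specific subset of genes, which is one way strandedness can feed into the batch effect discussed in this thread.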

Ideally you wouldn't mix the datasets at all, and everything would be done exactly the same way. However, there is also potential for insight if the analysis is done right, as it could help show whether important information is captured by the antisense transcripts.

Reply written 5 weeks ago by johnsonnathant (80)
johnsonnathant (80, United States) wrote, 5 weeks ago:

It is common for library prep to be a confounding "batch" factor in RNA-Seq analysis: because the protocol differs, the selection of RNA will differ too. Here is a good article ( that highlights some of the preparation differences. It is not surprising to me that this would show up in the expression data.

Powered by Biostar version 2.3.0