Question: Can We Compare Two Different RNA-Seq Experiments?
Asked by k.nirmalraman (6.9 years ago):


I am trying to analyze data from two different RNA-Seq experiments (separate sequencing runs, same platform). I would like to normalize the data from both experiments together to gain some insight into cell-type-specific expression profiles (as a preliminary evaluation).

In such a case, how should I do the normalization? Are there established methods? Any pointers on the likely challenges of this approach would be of great help.

Thanks in advance!

Tags: normalization, rna-seq

I am planning something similar in my own work. It would be helpful if you could share your experience.

(Arindam Ghosh, 10 months ago)
Answer by Damian Kao (6.9 years ago):

Here is a good paper that compares several different normalization methods:


Hi Damian,

Thanks for the link. It is a very informative paper. Still, I was wondering whether it would be possible to normalize two different RNA-Seq experiments together so that one can perform a DE-style analysis.

I understand this will inherit all the limitations of a poor experimental design :( but it is only meant to arrive at some candidate genes that can then be validated.

(k.nirmalraman, 6.4 years ago)
Answer by Mikael Huss (6.4 years ago):

I'm not sure I understand your question properly, but in case I do:

  • Download the FASTQ files for the two experiments
  • Map them in the same way (e.g. STAR)
  • Quantify them in the same way (e.g. HTSeq)
  • Merge all the counts into a single table
  • Apply a scaling normalization method (e.g. TMM) to all samples together
  • Use a DE package (e.g. limma) to call differentially expressed genes
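The "merge, then scale everything together" steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not edgeR's implementation: `merge_counts`, `tmm_factors` and `log_cpm` are hypothetical helper names, the per-sample dict layout is assumed (you would parse your htseq-count files into it first), and ties in the TMM trimming are broken by position rather than averaged as R's `rank()` does. For real work, edgeR's `calcNormFactors(method = "TMM")` is the reference.

```python
import numpy as np


def merge_counts(experiments):
    """Merge per-experiment count tables into one genes x samples matrix.

    `experiments` is a list of dicts, each mapping sample name -> {gene_id: count}
    (a hypothetical in-memory layout). Only genes present in every sample are kept.
    """
    samples = {}
    for exp in experiments:
        for name, gene_counts in exp.items():
            if name in samples:
                raise ValueError(f"duplicate sample name: {name}")
            samples[name] = gene_counts
    shared = set.intersection(*(set(gc) for gc in samples.values()))
    # htseq-count appends summary rows such as __no_feature; they are not genes.
    genes = sorted(g for g in shared if not g.startswith("__"))
    names = list(samples)
    matrix = np.array([[samples[s][g] for s in names] for g in genes], dtype=float)
    return genes, names, matrix


def _ordinal_ranks(x):
    """1-based ranks; ties are broken by position (R's rank() averages them)."""
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="stable")] = np.arange(1, len(x) + 1)
    return ranks


def _tmm_pair(obs, ref, logratio_trim=0.3, sum_trim=0.05):
    """TMM scaling factor of sample `obs` relative to sample `ref`."""
    n_o, n_r = obs.sum(), ref.sum()
    keep = (obs > 0) & (ref > 0)  # a zero count would give an infinite M value
    o, r = obs[keep], ref[keep]
    m = np.log2((o / n_o) / (r / n_r))          # log ratio (M value)
    a = 0.5 * np.log2((o / n_o) * (r / n_r))    # mean log expression (A value)
    v = (n_o - o) / (n_o * o) + (n_r - r) / (n_r * r)  # approximate variance of M
    n = len(m)
    lo_m = np.floor(n * logratio_trim) + 1
    lo_a = np.floor(n * sum_trim) + 1
    rank_m, rank_a = _ordinal_ranks(m), _ordinal_ranks(a)
    trimmed = ((rank_m >= lo_m) & (rank_m <= n + 1 - lo_m)
               & (rank_a >= lo_a) & (rank_a <= n + 1 - lo_a))
    # Precision-weighted mean of the M values that survive both trims.
    mean_m = np.sum(m[trimmed] / v[trimmed]) / np.sum(1.0 / v[trimmed])
    return 2.0 ** mean_m


def tmm_factors(counts):
    """TMM factors for a genes x samples count matrix, rescaled to geometric mean 1."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)
    # Reference sample: upper quartile of proportions closest to the mean upper quartile.
    uq = np.percentile(counts / lib_sizes, 75, axis=0)
    ref = counts[:, np.argmin(np.abs(uq - uq.mean()))]
    factors = np.array([_tmm_pair(counts[:, k], ref) for k in range(counts.shape[1])])
    return factors / np.exp(np.mean(np.log(factors)))


def log_cpm(counts, factors):
    """log2 counts per million on TMM-adjusted library sizes (voom-style offsets)."""
    counts = np.asarray(counts, dtype=float)
    eff_lib = counts.sum(axis=0) * factors
    return np.log2((counts + 0.5) / (eff_lib + 1.0) * 1e6)
```

The `log_cpm` values use the same offsets as limma's `voom`, but without its precision weights, so the DE step itself is still best done in R with limma or edgeR. Note that this only puts the samples from both experiments on a common scale; it does nothing about a run-to-run batch effect.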

Does that help?


The recent clarification that the experiments were done in different runs throws a kink into that. Different library prep or RNA extraction dates often produce a batch effect. Since the cell types (or some other factor) are presumably partitioned by this batch, any DE calls will be confounded by it. I don't know of any good way around that sort of thing without at least one additional batch of one of the cell types (or whatever the factor is), so that the batch effect can at least be estimated.
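The confounding can be made concrete with the design matrix a linear model (as in limma or edgeR) would use. A minimal numpy sketch, assuming two cell types, additive batch effects, and hypothetical sample labels and helper names: when every sample of one cell type comes from one run, the cell-type and batch columns are identical and the matrix loses rank, so the two effects cannot be separated. One extra batch of one cell type restores full rank.

```python
import numpy as np


def design_matrix(cell_type, batch):
    """Intercept plus treatment-coded indicator columns for cell type and batch."""
    cols = [np.ones(len(cell_type))]
    for factor in (cell_type, batch):
        levels = sorted(set(factor))
        for level in levels[1:]:  # first level (alphabetically) is the baseline
            cols.append(np.array([x == level for x in factor], dtype=float))
    return np.column_stack(cols)


def effects_separable(cell_type, batch):
    """True if cell-type and batch effects can both be estimated (full column rank)."""
    X = design_matrix(cell_type, batch)
    return np.linalg.matrix_rank(X) == X.shape[1]


# Every sample of cell type B was sequenced in run2: the two columns are identical.
confounded_ct = ["A", "A", "A", "B", "B", "B"]
confounded_batch = ["run1"] * 3 + ["run2"] * 3
print(effects_separable(confounded_ct, confounded_batch))  # False

# One extra batch of cell type A, sequenced in run2, breaks the tie.
extra_ct = confounded_ct + ["A", "A"]
extra_batch = confounded_batch + ["run2", "run2"]
print(effects_separable(extra_ct, extra_batch))  # True
```

Full rank only means the model can be fitted; how well the batch effect is estimated still depends on how many samples bridge the two runs.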

(Devon Ryan, 6.4 years ago)

In that case (if the cell types etc. are partitioned by batch), yes, it's hard. I didn't get the impression that this was necessarily the case, but if it is, it is of course hard to get around the confounding. Apart from that, this recent paper found that reproducibility between labs and runs was good as long as the exact same library prep protocol was used. Although there is some bias from the RNA extraction, it seems manageable. If the different labs had each done their own library preps, I think the results would have looked very different.

(Mikael Huss, 6.4 years ago)

I agree, but I'll add that, as I understand it, they only looked at the sequencing steps in that comparison. The cultures were grown at one lab and then frozen pellets were shipped. Growth also varies and causes batch effects, so you would expect poorer results if, e.g., the cell lines had been grown at each site for a while.

I would be very cautious about designing a study where one experiment is the test and the other is the control. You don't know how much slop you have. That said, I would probably do it, but only as a pilot or to support another, better-designed finding.

And only if you have the same library prep: the GC and length biases you would get from different protocols would be very hard to separate from biology. Doing it properly could take so long that it would cost more than redoing the experiment to get the data you really want.
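One rough diagnostic for such protocol bias (my assumption, not something proposed in this thread) is to check whether the per-gene log ratio between the two experiments trends with GC content or gene length. The sketch below assumes counts pooled within each experiment, a per-gene covariate you compute yourself from your annotation, and a hypothetical function name; the rank correlation ignores ties. A strong trend is a warning sign, but, as noted above, it cannot by itself tell you whether the trend is technical or biological.

```python
import numpy as np


def _ranks(x):
    """0-based ordinal ranks (ties broken by position, so approximate with ties)."""
    r = np.empty(len(x))
    r[np.argsort(x, kind="stable")] = np.arange(len(x))
    return r


def bias_trend(counts_exp1, counts_exp2, covariate, pseudocount=1.0):
    """Spearman-style rank correlation between per-gene log2 ratio and a covariate.

    counts_exp1, counts_exp2: per-gene counts pooled within each experiment.
    covariate: per-gene GC fraction or length, in the same gene order.
    """
    c1 = np.asarray(counts_exp1, dtype=float) + pseudocount
    c2 = np.asarray(counts_exp2, dtype=float) + pseudocount
    # Compare proportions, so overall sequencing depth does not matter.
    log_ratio = np.log2((c2 / c2.sum()) / (c1 / c1.sum()))
    return np.corrcoef(_ranks(log_ratio), _ranks(np.asarray(covariate)))[0, 1]
```

A value near 0 suggests no monotonic GC (or length) trend; values near +1 or -1 mean the ratio between experiments is largely ordered by the covariate.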

Doing a badly designed experiment is never cheaper, though the up-front costs make it appear so.

(Michele Busby, 6.4 years ago)