Question: Differential Expression Two References One Condition
5.0 years ago by
MiguelMorard20 wrote:


I have two new Saccharomyces cerevisiae genome assemblies and I would like to rerun some RNA-seq data using these new assemblies. I ran tophat-bowtie-cufflinks and have the FPKM for each gene in the strains. My problem is that, as I used two different reference genomes, I don't think I can run cuffdiff to analyse differential expression. Googling a bit, I found people working with two species, but they all have at least two conditions for each species, and I only have one.
My question is: can I compare the FPKMs directly? Is there any R package that I could use to work with FPKMs directly, or with some normalization of them (log2, for example)?

Thanks in advance for your suggestions.

rna-seq two assemblies R • 1.5k views
modified 5.0 years ago by Devon Ryan95k • written 5.0 years ago by MiguelMorard20

So, just to clarify, you are trying to do differential expression of two different species?  

written 5.0 years ago by Sean Davis26k

Yes. (I already know which are the orthologs)

modified 5.0 years ago • written 5.0 years ago by MiguelMorard20
5.0 years ago by
Devon Ryan95k
Freiburg, Germany
Devon Ryan95k wrote:

tldr: No, you can't directly compare FPKMs.

Trying to do differential expression between species is riddled with difficulties. I would strongly dissuade anyone from attempting such a comparison unless they are very familiar with analysing RNA-seq data and have thought long and hard about all the biases/batch effects that need to be compensated for. As a point of comparison, not doing this is what invalidated some of the mouse ENCODE paper (specifically, their false claim that samples cluster by species rather than by tissue).

Below is an incomplete list of things that need to be dealt with in such an analysis:

  1. Only orthologs can be compared.
  2. GC differences between orthologs make simple direct comparisons improper.
  3. Differences in transcript/gene/UTR length need to be accounted for.
  4. Are extraction efficiencies and biases the same between the species?

I'm sure I could think of other issues that would need to be dealt with if I thought about this a bit longer. Ensuring that results aren't biased by anything above (or the many things I likely didn't list) is going to be very difficult.
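To make the length issue (point 3) concrete, here is a minimal sketch using the standard FPKM definition. The counts, gene lengths, and library size are made-up numbers for illustration only, not values from the data in this thread.

```python
def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped)

# Hypothetical values: the same number of fragments and the same library
# size, but the gene model is 1500 bp in assembly A and 1200 bp in assembly B.
fpkm_a = fpkm(count=300, length_bp=1500, total_mapped=10_000_000)
fpkm_b = fpkm(count=300, length_bp=1200, total_mapped=10_000_000)

# The ratio reflects only the annotation length difference, not expression.
ratio = fpkm_b / fpkm_a
print(fpkm_a, fpkm_b, ratio)
```

With identical fragment counts, the two FPKM values differ purely because the annotated lengths differ, which is one reason FPKMs from two references are not directly comparable.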

Having written all of that, since you're at least dealing only with yeast you have a good shot at actually compensating for everything properly. Have a look at the sequence similarity and such. I suspect that if the various metrics are really close then you might be OK (though you'd need to demonstrate that in any publication).
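One way to "have a look" at how close the orthologs are is to tabulate length and GC differences per ortholog pair. The following is only a rough sketch: the function names, gene IDs, and sequences are hypothetical, and what counts as "really close" is a judgement call that this code does not make for you.

```python
def gc_fraction(seq):
    """GC fraction over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt

def compare_orthologs(seqs_a, seqs_b, orthologs):
    """orthologs is a list of (id_in_assembly_a, id_in_assembly_b) pairs."""
    rows = []
    for id_a, id_b in orthologs:
        a, b = seqs_a[id_a], seqs_b[id_b]
        rows.append({
            "pair": (id_a, id_b),
            "length_ratio": len(b) / len(a),
            "gc_diff": gc_fraction(b) - gc_fraction(a),
        })
    return rows

# Toy sequences with hypothetical IDs, just to show the shape of the output.
seqs_a = {"geneA1": "ATGGCGCTTAAGGCC"}
seqs_b = {"geneB1": "ATGGCACTTAAGGCCTTT"}
report = compare_orthologs(seqs_a, seqs_b, [("geneA1", "geneB1")])
for row in report:
    print(row)
```

Pairs with a length ratio far from 1 or a large GC difference are the ones most likely to need correction or exclusion before any comparison.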

written 5.0 years ago by Devon Ryan95k

Thanks Devon for your answer. Yes, I know those kinds of issues could be a problem, but my species are really close, and I will begin by working on strains of the same species, so differences in length, GC, etc. shouldn't be large. I'll have a look at all that, though. If they seem close enough, what programs or R/Bioconductor packages would you recommend for comparing the data?


written 5.0 years ago by MiguelMorard20

DESeq2, edgeR, or limma/voom. Don't use FPKMs for statistics.
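These packages work from raw counts rather than FPKMs. As a rough sketch of the preparation step, assuming you already have per-gene raw counts for each assembly (for example from a counting tool such as htseq-count or featureCounts, which is an assumption and not something stated above) and an ortholog map, the two tables could be joined into one count matrix like this. All names and numbers are hypothetical.

```python
import csv
import io

def ortholog_count_table(counts_a, counts_b, orthologs):
    """Return (label, count_a, count_b) rows for ortholog pairs found in both tables."""
    rows = []
    for id_a, id_b in orthologs:
        if id_a in counts_a and id_b in counts_b:
            rows.append((f"{id_a}|{id_b}", counts_a[id_a], counts_b[id_b]))
    return rows

def to_tsv(rows, sample_names=("strainA", "strainB")):
    """Serialize the rows as a tab-separated table with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(("ortholog",) + tuple(sample_names))
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical raw counts keyed by each assembly's own gene IDs.
counts_a = {"geneA1": 300, "geneA2": 12, "geneA3": 50}
counts_b = {"geneB1": 410, "geneB2": 9}
orthologs = [("geneA1", "geneB1"), ("geneA2", "geneB2"), ("geneA3", "geneB3")]

table = ortholog_count_table(counts_a, counts_b, orthologs)
tsv = to_tsv(table)
print(tsv)
```

Pairs missing from either table (geneA3 here) are dropped, so only genes with a counted ortholog in both assemblies reach the matrix. The resulting table can then be read into R for whichever of the three packages you choose.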

written 5.0 years ago by Devon Ryan95k

So I should use counts? Or z-scores, or something like that?

modified 5.0 years ago • written 5.0 years ago by MiguelMorard20