Question: Is it possible to analyze RNA-seq outputs from Cufflinks and RSEM together?
12 months ago by
Barcelona, Spain
SandraGarcia0 wrote:

Hello everyone! I want to analyze the gene expression profiles from two different datasets together and compare a number of genes between the two. One is in RSEM v2 output format, and the other is in FPKM (Cufflinks output). I have tried calculating FPKM from the RSEM raw counts, but the distributions do not correlate at all, so I am not sure whether gene expression can be compared between them, as the data do not seem to be on the same scale.
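For reference, the count-to-FPKM conversion mentioned above is usually written as count × 10^9 / (gene length in bp × total mapped fragments). The following is only a minimal sketch with made-up numbers. It assumes the library size is the sum of the counts in the table and that a single length per gene is available; the function name `fpkm_from_counts` and its arguments are hypothetical, and Cufflinks itself may use a different denominator and effective lengths, which is one reason the two scales can disagree.

```python
import numpy as np

def fpkm_from_counts(counts, lengths_bp):
    """Convert per-gene fragment counts to FPKM for a single sample.

    Assumption: the library size is taken as the sum of the counts given
    here, which may not match what Cufflinks or RSEM used internally.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    total_fragments = counts.sum()
    # Fragments per kilobase of gene per million mapped fragments
    return counts * 1e9 / (lengths_bp * total_fragments)

# Toy values for illustration only, not real data
counts = [100, 500, 400]
lengths_bp = [1000, 2000, 4000]
fpkm = fpkm_from_counts(counts, lengths_bp)
print(fpkm)
```

Note that the first and third toy genes end up with the same FPKM despite different counts, because the longer gene is scaled down by its length.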

Do you have any idea whether it is possible to make these two data types comparable?

Thank you in advance,


rsem rna-seq cufflinks fpkm • 449 views
ADD COMMENTlink modified 12 months ago by Kevin Blighe30k • written 12 months ago by SandraGarcia0
12 months ago by
Kevin Blighe30k
Republic of Ireland
Kevin Blighe30k wrote:

Are you implying that you want to merge the datasets? I think the best approach would be to convert both datasets independently to the Z-scale and then see how they line up in a histogram (and also do your correlation analyses). Another option would be to use one dataset as the training dataset and the other as the validation dataset.
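The Z-scale idea could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive recipe: it assumes each dataset is a genes x samples matrix, log2-transforms it with a pseudocount of 1, and then Z-scores each gene across the samples of its own dataset. The function name `to_z_scale` and the random toy matrices are hypothetical; whether to scale per gene or per sample depends on the comparison you have in mind.

```python
import numpy as np

def to_z_scale(expr, pseudocount=1.0):
    """Log2-transform a genes x samples matrix, then Z-score each gene
    across the samples of that dataset only (an assumed convention)."""
    logged = np.log2(np.asarray(expr, dtype=float) + pseudocount)
    mean = logged.mean(axis=1, keepdims=True)
    sd = logged.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = np.nan  # genes with no variation cannot be scaled
    return (logged - mean) / sd

# Made-up matrices standing in for the two datasets (5 genes each)
rng = np.random.default_rng(0)
rsem_like = rng.gamma(shape=2.0, scale=50.0, size=(5, 6))  # 6 samples
fpkm_like = rng.gamma(shape=2.0, scale=5.0, size=(5, 4))   # 4 samples

# Scale each dataset independently, never on the pooled matrix
z_a = to_z_scale(rsem_like)
z_b = to_z_scale(fpkm_like)

# Bin both sets of Z-scores on a shared grid to compare the histograms
bins = np.linspace(-3, 3, 13)
hist_a, _ = np.histogram(z_a[~np.isnan(z_a)], bins=bins)
hist_b, _ = np.histogram(z_b[~np.isnan(z_b)], bins=bins)
print(hist_a, hist_b)
```

Scaling each dataset on its own puts every gene at mean 0 and standard deviation 1 within that dataset, so the original units (RSEM values versus FPKM) no longer enter the comparison directly.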

If you go down the merging route, then you will always be criticised by reviewers when trying to publish.

Hope this helps!


ADD COMMENTlink written 12 months ago by Kevin Blighe30k

Thank you Kevin, I will go for it. Yes, I want to merge the two datasets. We are also thinking of asking for permission to obtain the BAM files of the second dataset, so that I can re-analyze them with the same software.

ADD REPLYlink written 12 months ago by SandraGarcia0


Well, the BAM files would help, but they would still have been produced very differently, i.e., a BAM produced by Bowtie/TopHat is very different from a BAM produced by some other aligner. You will just have to be very methodical when merging these datasets. I mentioned Z-scores because I had success with that in the past, but the differences between RSEM and FPKM are even greater than the differences that I encountered.

ADD REPLYlink written 12 months ago by Kevin Blighe30k

An update (12th August 2018):

You should abandon RPKM / FPKM normalisation. These methods are not ideal where cross-sample differential expression analysis is your aim; indeed, they render samples incomparable via differential expression analysis. Please read: A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis

In their key points:

The Total Count and RPKM normalization methods, both of which are still widely in use, are ineffective and should be definitively abandoned in the context of differential analysis.

Note - FPKM is essentially the same as RPKM

ADD REPLYlink written 9 weeks ago by Kevin Blighe30k