Question: Different results for differentially expressed genes in RNA-seq analysis with cuffdiff and R::genefilter
2.1 years ago, strangephone1988 wrote:

Hi everyone,

I have run into a problem with my RNA-seq analysis.

My data: 4 experimental conditions with 3 biological replicates each, 12 samples in total.

When I used cuffdiff, I got a p-value and a q-value for every gene (the underlying model seems to be a beta negative binomial distribution).

Then I used the R function genefilter::rowFtests to calculate a Welch t-test p-value for every gene from its FPKM values.

In the end, every gene had two different p-values, one from cuffdiff and one from genefilter.

The numbers of differentially expressed genes reported by cuffdiff and by genefilter are quite different.

My questions are:

  1. Which result should I trust?

  2. Is the Welch t-test suitable for identifying differentially expressed genes in RNA-seq data?

  3. When I use FPKM values from Cufflinks in other analyses, for example with the R package genefilter, should I normalize the FPKM matrix first?

Thank you very much

My Best.

Junfeng Shi

Tags: genefilter, rna-seq, cuffdiff
2.1 years ago, Amitm wrote:

Hi, are you sure you are working with FPKM values? If so, such data are not amenable to a t-test (or similar tests), because FPKM data cannot be approximated by a normal distribution, which the variants of the t-test assume.

That is why cuffdiff uses a beta negative binomial distribution. Count data and FPKM data are variously approximated by the Poisson or the negative binomial distribution.
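
To illustrate the point about count distributions: under a Poisson model the variance equals the mean, while the negative binomial adds a dispersion term, var = mu + alpha * mu^2. Below is a minimal Python sketch with made-up replicate counts and a simple method-of-moments estimate of alpha. The function name and the numbers are hypothetical, and this is not how cuffdiff, DESeq or edgeR estimate dispersion internally. It only shows how replicate counts can vary more than a Poisson model allows.

```python
from statistics import mean, variance


def dispersion_estimate(counts):
    """Method-of-moments estimate of the negative binomial dispersion alpha.

    Solves var = mu + alpha * mu**2 for alpha, clamped at 0
    (alpha == 0 corresponds to the Poisson case, var == mu).
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance (n - 1 in the denominator)
    return max((var - mu) / mu**2, 0.0)


# Hypothetical read counts for one gene across 3 biological replicates.
overdispersed = [120, 310, 95]
# Replicates with no spread at all: nothing beyond Poisson is needed.
flat = [100, 100, 100]

alpha_overdispersed = dispersion_estimate(overdispersed)
alpha_flat = dispersion_estimate(flat)

print(f"mean={mean(overdispersed)}, variance={variance(overdispersed)}")
print(f"estimated alpha (overdispersed): {alpha_overdispersed:.3f}")
print(f"estimated alpha (flat): {alpha_flat:.3f}")
```

With only 3 replicates per condition such a per-gene estimate is very noisy, which is one reason the dedicated packages share information across genes when estimating dispersion.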

If you have replicates and your goal is differential expression at the gene level, I would suggest getting count data from your RNA-seq BAM files and using one of the well-established packages such as DESeq, edgeR, or limma (voom method), all available in the Bioconductor repository.

There is no particular advantage in choosing FPKM values if only gene-level differential expression is desired. On the other hand, if you want transcript-level quantification, then tools like Cufflinks -> Cuffdiff or StringTie -> Ballgown can give you differential expression using FPKM values.
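
As a rough illustration of how FPKM relates to counts, here is a small Python sketch of the textbook definition, FPKM = fragments * 10^9 / (length in bp * total mapped fragments). The function name and all numbers are hypothetical. Cufflinks estimates transcript abundances with its own statistical model, so this is only the basic formula, not its implementation.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical library with 20 million mapped fragments.
total = 20_000_000

# The same raw count gives different FPKM values for genes of different length.
short_gene = fpkm(500, 2_000, total)
long_gene = fpkm(500, 8_000, total)

print(short_gene, long_gene)
```

Because the scaling removes the raw count, an FPKM value alone no longer tells you how many fragments supported it, which is why count-based packages expect the unnormalized count matrix as input.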

So, please check which methods are appropriate for the data type you have.


Thank you very much. I have learned a lot from your answer. I need to reconsider the methods I am currently using in my analysis. Thanks a lot.

(reply written 2.1 years ago by strangephone1988)