Question: Why Cuffdiff Output Not Significant?
gravatar for rodrigo.nicodemos82
4.7 years ago by
United States
rodrigo.nicodemos8210 wrote:




I am performing RNAseq analysis using TopHat and Cufflinks programs. I have a question for why a gene I am interested is being called Not Significant by Cuffdiff.

Here the output for my gene of interest Rh4:

gene    sample_1    sample_2    status    value_1    value_2    log2(fold_change)    test_stat    p_value    q_value    signific

Rh4     WT              MUT            OK        1158.87    0.856892    -10.4013                -4.12026    0.1178     0.404333    no

I am not familiar with the statistics Cuffdiff uses to call my gene of interest Not Significant. Can someone explain it to me?

Thank you very much.


rna-seq • 2.7k views
ADD COMMENTlink modified 4.6 years ago by Biostar ♦♦ 20 • written 4.7 years ago by rodrigo.nicodemos8210

what's the variance, do you have any replicates?

ADD REPLYlink written 4.7 years ago by Michael Dondrup47k

As Michael says, Variance or replicates will most likely be your issue. Have you thought about plotting the FPKM values of each replicate to see how variable it is? - You can use cummeRbund to read in a cuffdiff output and extract the FPKM values. 

ADD REPLYlink written 4.7 years ago by andrew.j.skelton735.9k

The FPKM values for the replicates were: WT_1: 1245.24, WT_2: 874.11 and MUT_1: 0.897, MUT_2: 0.764, MUT_3: 0.823. The third replicate for the WT condition gave very few reads, which is why I didn't include it. I only used two replicates from the MUT condition to compare to the two replicates from the WT condition. I am not convinced it is a replicate issue or a variance issue, since the trend it pretty obvious to me, but Cuffdiff it not picking it up statistically.

ADD REPLYlink written 4.7 years ago by rodrigo.nicodemos8210

Have you ran your results through DESeq2? - DESeq2 has a function (I think it's called a Cook's distance), to figure out if samples have too high a variance for a given condition, thus it excludes them. The other thing is to pull out a few of the "Top Hits" from your analysis and see what their FPKM's are like, see if the variance of your wild type fits with the difference your seeing in the case above. 

ADD REPLYlink written 4.7 years ago by andrew.j.skelton735.9k

this is actually an interesting situation - I agree with the OP that the gene should come up as differentially expressed - but the way this and probably all other tools work is a bit mysterious and frankly occasionally plain wrong. 

Even before multiple comparison testing the p value is already huge 0.1

I have a hard time believing that getting a 1245 and 874 versus 0.9 and 0.7 could possibly be that likely to obtain.

Caveat emptor before buying into any tool.

ADD REPLYlink written 4.6 years ago by Istvan Albert ♦♦ 83k

That seems pretty wacky, but we're only seeing 1 gene. In general the p-values you see will depend also on the rest of the population (though that gene *is* very extreme). The numbers above suggest that you have highly scattered data - that your wt and mutants are *very* different from each other overall (is this true?). The second thing is that Cuffdiff works by summing the values of all transcripts from the locus (which makes for odd numbers sometime). Is this a complicated locus? Do you have the individual transcript values? Have you looked at the read mappings in a genome browser? Any artifacts going on?

ADD REPLYlink written 4.6 years ago by seidel7.0k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1680 users visited in the last hour