Question

Why Cuffdiff Output Not Significant?

0

Entering edit mode

8.8 years ago

rodrigo.nicodemos82 ▴ 10

Hello,

I am performing RNAseq analysis using TopHat and Cufflinks programs. I have a question for why a gene I am interested is being called Not Significant by Cuffdiff.

Here the output for my gene of interest Rh4:

gene    sample_1    sample_2    status    value_1    value_2    log2(fold_change)    test_stat    p_value    q_value     signific
Rh4     WT          MUT         OK        1158.87    0.856892    -10.4013            -4.12026     0.1178     0.404333    no

I am not familiar with the statistics Cuffdiff uses to call my gene of interest Not Significant. Can someone explain it to me?

Thank you very much.

Rodrigo

RNA-Seq • 3.6k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.8 years ago by rodrigo.nicodemos82 ▴ 10

0

Entering edit mode

what's the variance, do you have any replicates?

ADD REPLY • link 8.8 years ago by Michael 54k

0

Entering edit mode

As Michael says, Variance or replicates will most likely be your issue. Have you thought about plotting the FPKM values of each replicate to see how variable it is? - You can use cummeRbund to read in a cuffdiff output and extract the FPKM values.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

The FPKM values for the replicates were: WT_1: 1245.24, WT_2: 874.11 and MUT_1: 0.897, MUT_2: 0.764, MUT_3: 0.823. The third replicate for the WT condition gave very few reads, which is why I didn't include it. I only used two replicates from the MUT condition to compare to the two replicates from the WT condition. I am not convinced it is a replicate issue or a variance issue, since the trend it pretty obvious to me, but Cuffdiff it not picking it up statistically.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by rodrigo.nicodemos82 ▴ 10

0

Entering edit mode

Have you ran your results through DESeq2? - DESeq2 has a function (I think it's called a Cook's distance), to figure out if samples have too high a variance for a given condition, thus it excludes them. The other thing is to pull out a few of the "Top Hits" from your analysis and see what their FPKM's are like, see if the variance of your wild type fits with the difference your seeing in the case above.

ADD REPLY • link updated 16 months ago by Ram 43k • written 8.8 years ago by andrew.j.skelton73 6.5k

0

Entering edit mode

This is actually an interesting situation - I agree with the OP that the gene should come up as differentially expressed - but the way this and probably all other tools work is a bit mysterious and frankly occasionally plain wrong.

Even before multiple comparison testing the p value is already huge 0.1

I have a hard time believing that getting a 1245 and 874 versus 0.9 and 0.7 could possibly be that likely to obtain.

Caveat emptor before buying into any tool.

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by Istvan Albert 100k

0

Entering edit mode

That seems pretty wacky, but we're only seeing 1 gene. In general the p-values you see will depend also on the rest of the population (though that gene *is* very extreme). The numbers above suggest that you have highly scattered data - that your wt and mutants are *very* different from each other overall (is this true?). The second thing is that Cuffdiff works by summing the values of all transcripts from the locus (which makes for odd numbers sometime). Is this a complicated locus? Do you have the individual transcript values? Have you looked at the read mappings in a genome browser? Any artifacts going on?

ADD REPLY • link 8.6 years ago by seidel 11k