Question: Questionable Cuffdiff Test Statistic/P-Value Output
gravatar for Jason
6.2 years ago by
United States
Jason60 wrote:

I'm analyzing Illumina RNA-Seq data from three FADS1 knockout mice and three wild types. After running cuffdiff on the six samples, I see in genes.read_group_tracking that:

$ egrep "tracking_id|Fads1" genes.read_group_tracking | column -t | less -S
tracking_id  condition  replicate  raw_frags  internal_scaled_frags  external_scaled_frags  FPKM       effective_length  status
Fads1        null       0          1          0.880439               0.880439               0.0142817  -                 OK
Fads1        null       1          1          1.09692                1.09692                0.017661   -                 OK
Fads1        null       2          0          0                      0                      0          -                 OK
Fads1        wt         0          347        331.474                331.474                5.38128    -                 OK
Fads1        wt         1          251        246.101                246.101                3.96235    -                 OK
Fads1        wt         2          156        167.516                167.516                2.69709    -                 OK

yet in genes.fpkm_tracking and gene_exp.diff I see:

$ egrep "tracking|Fads1" genes.fpkm_tracking | cut -f 1,10-20 | column -t | less -S
tracking_id  null_FPKM  null_conf_lo  null_conf_hi  null_status  wt_FPKM  wt_conf_lo  wt_conf_hi  wt_status
Fads1        0.0106476  0             0.0303051     OK           4.01357  2.80322     5.21247     OK

$ egrep "q_value|Fads1" gene_exp.diff | cut -f 1,5-15 | column -t | less -S     
test_id  sample_1  sample_2  status  value_1    value_2  log2(fold_change)  test_stat  p_value  q_value   significant
Fads1    null      wt        OK      0.0106476  4.01357  8.55822            4.1466     0.2495   0.999688  no

Where it seems like the confidence intervals are clearly separated, yet the P-value is much higher than my intuition would expect. Any ideas of things to check?

For completeness the cuffdiff command was: cuffdiff -p 12 -o ./analysis/FADS1/cuffdiff -b /mnt/data1/refs/igenomes/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa -L null,wt /mnt/data1/refs/igenomes/Mus_musculus/UCSC/mm10/Annotation/Genes/genes.gtf <bams,from,null> <bams,from,wt>

ADD COMMENTlink modified 6.2 years ago by Charles Warden7.7k • written 6.2 years ago by Jason60

I noticed a similar problem....did you solve this yet? Thanks!

ADD REPLYlink written 6.2 years ago by jie.wu.phd0

We used edgeR and it seemed to detect the gene as DE without any problems. I don't have the p-value off-hand though.

ADD REPLYlink written 6.1 years ago by Jason60
gravatar for Charles Warden
6.2 years ago by
Charles Warden7.7k
Duarte, CA
Charles Warden7.7k wrote:

On one hand, I think cuffdiff can often yield conservative estimates. It also can give weird results, but I tend to expect those more from non-ideal experimental designs (such as groups without replicates). However, I agree that what you are seeing looks like is probably not a reasonable result.

Either way, you can see some benchmarks for different tools here (for example, in the patient comparison, cuffdiff couldn't identify any differentially expressed genes):

I would recommend taking the FPKM values (or raw counts, depending upon the strategy) from cufflinks. You can use sRAP for that analysis, but there are lots of ways to use the cufflinks results without having to use cuffdiff (I prefer DESeq among the popular RNA-Seq tools, but I've heard limma is good as well).

ADD COMMENTlink modified 6.2 years ago • written 6.2 years ago by Charles Warden7.7k

Yes, trying something like DESeq or edgeR is probably next on my todo list. Thanks for the suggestion and links!

ADD REPLYlink written 6.2 years ago by Jason60
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 770 users visited in the last hour