Question: p-values very low in Salmon and edgeR
Tania wrote (17 months ago):

Hi Everyone

I used Salmon and edgeR, and some differentially expressed (DE) genes have p-values extremely close to zero. The same genes are also differentially expressed in cuffdiff, but with p-values around 0.001, not close to zero, so I am wondering why the significance values are so different. For example, geneA has a p-value of 0.001 in cuffdiff but a p-value very close to zero with Salmon and edgeR.

Is this weird, or is it just because of the number of genes in the background when aligning to a genome versus mapping to a transcriptome?

Thanks

Tags: rna-seq

Could you give more information about your input? Maybe even show the commands you're running?

Reply by Hussain Ather, 17 months ago

Hi @Tania, are you trying to compare "wicked-fast transcript quantification" of Salmon with cuffdiff?

Also, you can search the Trinity group for a similar situation, as they use Salmon and edgeR, too.

Reply by Farbod, 17 months ago

I am trying to compare some gene expression results I got from Salmon (using an FMD index) and edgeR with what I got from cuffdiff.

Reply by Tania, 17 months ago

Hi Tania,

First possibility:

Cuffdiff would have performed its differential expression comparisons on FPKM values; edgeR would have performed its comparisons on raw counts normalised with the trimmed mean of M-values (TMM) method. (I hope that you supplied raw counts, not FPKM values, to edgeR?)

Second possibility:

Low sample numbers will produce very low P values.

Reply by Kevin Blighe, 16 months ago
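
For readers unfamiliar with TMM, here is a minimal Python sketch of the idea, assuming two samples given as plain lists of raw counts for the same genes in the same order. `tmm_factor` is a hypothetical helper, a simplified re-implementation of the trimmed mean of M-values method, not edgeR's `calcNormFactors` itself: genes with a zero count in either sample are skipped, the most extreme 30% of log-ratios (M) and 5% of average log-expressions (A) are trimmed from each end, and the remaining M values are averaged with inverse-variance weights.

```python
import math


def tmm_factor(obs, ref, logratio_trim=0.30, sum_trim=0.05):
    """Simplified TMM scaling factor of sample `obs` relative to sample `ref`.

    Hypothetical helper illustrating the trimmed-mean-of-M-values idea;
    edgeR's calcNormFactors differs in details (reference choice, rescaling).
    """
    n_obs, n_ref = sum(obs), sum(ref)
    genes = []  # (M, A, variance) for each gene with non-zero counts in both samples
    for y_o, y_r in zip(obs, ref):
        if y_o == 0 or y_r == 0:
            continue  # log-ratio is undefined for a zero count
        p_o, p_r = y_o / n_obs, y_r / n_ref
        m = math.log2(p_o / p_r)        # M: log2 ratio of proportions
        a = 0.5 * math.log2(p_o * p_r)  # A: average log2 proportion
        var = (n_obs - y_o) / (n_obs * y_o) + (n_ref - y_r) / (n_ref * y_r)
        genes.append((m, a, var))

    n = len(genes)
    trim_m = int(n * logratio_trim)  # genes dropped from each end of the M ranking
    trim_a = int(n * sum_trim)       # genes dropped from each end of the A ranking
    by_m = sorted(range(n), key=lambda i: genes[i][0])
    by_a = sorted(range(n), key=lambda i: genes[i][1])
    keep = set(by_m[trim_m:n - trim_m]) & set(by_a[trim_a:n - trim_a])
    if not keep:
        return 1.0  # nothing left to average; fall back to no adjustment

    # Inverse-variance weighted mean of the retained M values, back on the linear scale.
    num = sum(genes[i][0] / genes[i][2] for i in keep)
    den = sum(1.0 / genes[i][2] for i in keep)
    return 2 ** (num / den)


# Toy data (made-up): one gene absorbs about half of the second library.
ref = [100] * 20
obs = [100] * 19 + [2000]
print(round(tmm_factor(obs, ref), 3))  # 0.513
```

The outlier gene is trimmed away, so the factor reflects the other 19 genes (about 2000/3900). As far as I understand edgeR's approach, such factors are used to scale library sizes into effective library sizes rather than to transform the counts themselves.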

Thank you. Yes, I supplied raw counts to edgeR, not FPKM values, so I am still not sure why.

Reply by Tania, 16 months ago
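
With Salmon, the counts to give edgeR are the estimated read counts in the NumReads column of each sample's quant.sf file (columns Name, Length, EffectiveLength, TPM, NumReads), not the TPM column. Below is a minimal Python sketch of summing them to gene level, assuming a hypothetical transcript-to-gene dictionary `tx2gene` built from your own annotation; in R, the tximport package is the usual and more complete route for bringing Salmon output into edgeR.

```python
import csv
import io
from collections import defaultdict


def gene_counts_from_quant_sf(quant_sf_text, tx2gene):
    """Sum Salmon's NumReads column to gene level.

    `tx2gene` (transcript ID -> gene ID) is a hypothetical mapping supplied by
    the user; transcripts missing from it are skipped.
    """
    counts = defaultdict(float)
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["Name"])
        if gene is not None:
            counts[gene] += float(row["NumReads"])  # estimated counts, not TPM
    return dict(counts)


# Toy quant.sf content (made-up values) using Salmon's column layout.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t800\t50.0\t120.5\n"
    "tx2\t1500\t1300\t10.0\t30.5\n"
    "tx3\t900\t700\t5.0\t7\n"
)
print(gene_counts_from_quant_sf(example, {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}))
# {'geneA': 151.0, 'geneB': 7.0}
```

For a real sample, read the file's text first and pass it in; the per-gene totals from all samples then form the count matrix.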

FPKM values, which are normalised and used by Cuffdiff, are fundamentally different from the normalised counts used by edgeR. Evidence accumulated over the years suggests that differential expression analysis on FPKM values produces many false-positive associations. This does not fully explain why edgeR is giving lower P values for your data, though.

What are your sample numbers?

Reply by Kevin Blighe, 16 months ago
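
One way to see why FPKM values and counts behave differently: FPKM divides out both transcript length and library size, so two genes with very different numbers of reads can end up with the same FPKM. The sketch below uses the standard FPKM formula with made-up numbers; treating counting noise as Poisson (relative standard deviation of 1/sqrt(count)) is an illustrative assumption, not the exact model either tool fits.

```python
import math


def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return count * 1e9 / (length_bp * total_mapped)


def poisson_relative_sd(count):
    """Relative standard deviation of a Poisson count (sd / mean = 1 / sqrt(count))."""
    return 1 / math.sqrt(count)


total = 1_000_000            # hypothetical library size
short_gene = (10, 100)       # (reads, length in bp), made-up
long_gene = (1000, 10_000)   # made-up

for reads, length in (short_gene, long_gene):
    print(reads, fpkm(reads, length, total), round(poisson_relative_sd(reads), 3))
# 10 100.0 0.316
# 1000 100.0 0.032
```

Both genes get an FPKM of 100, but the 10-read estimate is about ten times noisier. A count-based model such as edgeR's can use that information, whereas a value that has already been converted to FPKM no longer carries it.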

Thanks, Kevin. Do you mean how many samples I have? I have 12 control vs 12 tumor samples. Here are two example rows from the results:

"IGFBPL1",7.14991080788358,5.342277022899,4.68668105170228e-82,2.98151026239127e-78
"TMPRSS6",8.97546800618013,6.99003709974313,4.91094391964278e-65,1.33893378151975e-61

Reply by Tania, 16 months ago

Hi, that makes sense: low sample numbers can result in very low, unreliable P values and adjusted P values, like these. The number of false-positive associations is higher with fewer samples, even after multiple testing correction. However, you can be fairly confident that the most strongly differentially expressed genes are genuine results; it is the ones of lesser statistical significance about which you need to be careful.

Reply by Kevin Blighe, 16 months ago
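
For reference, the FDR column reported by edgeR's `topTags` is, by default, a Benjamini-Hochberg adjustment of the raw P values. A minimal Python sketch of that adjustment, following the same logic as R's `p.adjust(method = "BH")`, is below; `bh_adjust` is a hypothetical helper name and the inputs are made-up numbers.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest P value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw P values (and capped at 1).
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.005]))

# Same smallest raw P value, ten versus a hundred tests (made-up numbers).
print(bh_adjust([1e-5] + [0.5] * 9)[0])
print(bh_adjust([1e-5] + [0.5] * 99)[0])
```

The raw P value does not depend on how many genes are tested; only the adjusted value does, which is one place where the size of the background can matter.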

Got it, Kevin. So what do you think is a good cutoff?

Should I use a cutoff of 0.01 in edgeR, for example, or something even more stringent?

Reply by Tania, 16 months ago

I would go as low as FDR-adjusted P < 0.0001 and absolute log2 fold change > 2.

There is no real way to know the exact best cutoff. You may have to go back and forth with it for a while.

Reply by Kevin Blighe, 16 months ago
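
Applying the suggested thresholds (FDR < 0.0001 and absolute log2 fold change > 2) to a results table is a short filter. The sketch below reuses the two rows Tania posted; the header line is an assumption, based on the four numeric columns resembling edgeR's `topTags` output after `exactTest` (logFC, logCPM, PValue, FDR), so check it against your own file. `filter_de` is a hypothetical helper name.

```python
import csv
import io

# The two data rows are copied from the thread; the header line is an
# ASSUMPTION (edgeR topTags-style columns after exactTest).
RESULTS = '''\
"gene","logFC","logCPM","PValue","FDR"
"IGFBPL1",7.14991080788358,5.342277022899,4.68668105170228e-82,2.98151026239127e-78
"TMPRSS6",8.97546800618013,6.99003709974313,4.91094391964278e-65,1.33893378151975e-61
'''


def filter_de(csv_text, fdr_max=1e-4, abs_logfc_min=2.0):
    """Return gene names passing both an FDR and an absolute log2 fold-change cutoff."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["gene"]
        for row in reader
        if float(row["FDR"]) < fdr_max and abs(float(row["logFC"])) > abs_logfc_min
    ]


print(filter_de(RESULTS))                     # ['IGFBPL1', 'TMPRSS6']
print(filter_de(RESULTS, abs_logfc_min=8.0))  # ['TMPRSS6']
```

Keeping the thresholds as parameters makes it easy to try several values while settling on a cutoff.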

Thanks so much, Kevin.

Reply by Tania, 16 months ago
Powered by Biostar version 2.3.0