6.5 years ago

Tania
Hi Everyone

I used Salmon and edgeR. Some of my differentially expressed genes have extremely low P values, very close to zero. The same genes are also differentially expressed in Cuffdiff, but with P values around 0.001 rather than close to zero, so I am wondering why the significance values are so different. For example, geneA has a P value of 0.001 in Cuffdiff and a P value close to zero with Salmon and edgeR.

Is this weird? Or is it just because the number of genes in the background differs between aligning to a genome and mapping to a transcriptome?

Thanks

Could you give more information about your input? Maybe even show the commands you're running?

Hi @Tania, are you trying to compare "wicked-fast transcript quantification" of Salmon with Cuffdiff? Also, you can search the Trinity group for a similar situation, as they use Salmon and edgeR, too.

I am trying to compare the gene expression results I got from Salmon (using the FMD index) and edgeR with what I got from Cuffdiff.

Hi Tania,

First possibility: Cuffdiff would have performed its differential expression comparisons on FPKM values; edgeR would have performed its differential expression comparisons on counts normalised by the trimmed mean of M-values (TMM) method. (I hope that you supplied raw counts, not FPKM values, to edgeR?)

Second possibility: Low sample numbers will produce very low P values.

Thank you. Yes, I supplied counts to edgeR, not FPKM, so I am still not sure why.

FPKM values, which are normalised and used by Cuffdiff, are fundamentally different from the normalised counts used by edgeR. Evidence that has accumulated over the years implies that, with FPKM values, many false-positive associations will be made through differential expression analysis. This does not fully explain why edgeR is reporting lower P values in your data, though.
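To make the difference concrete, here is a minimal sketch of the standard FPKM formula (fragments per kilobase of transcript per million mapped fragments). The gene names, counts, and lengths are made up for illustration, and this is not Cuffdiff's actual implementation, which estimates fragment assignments with its own model.

```python
def fpkm(counts, lengths_bp):
    """Convert raw fragment counts to FPKM.

    counts:     dict of gene -> raw fragment count (hypothetical values)
    lengths_bp: dict of gene -> transcript length in base pairs
    """
    total = sum(counts.values())  # total mapped fragments in the sample
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }

# Made-up example data, not from the thread.
counts = {"g1": 100, "g2": 300, "g3": 600}
lengths_bp = {"g1": 1000, "g2": 2000, "g3": 3000}
values = fpkm(counts, lengths_bp)
```

Because each FPKM value is already divided by transcript length and by library size, the original count magnitudes are lost, which is why edgeR expects raw counts rather than FPKM as input.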

What are your sample numbers?

Thanks Kevin. You mean how many samples I have? I have 12 control vs 12 tumor. This is a sample result for example:

Hi, that makes sense because low sample numbers will result in low and unreliable P and adjusted P values, like these. Your number of false-positive associations is higher with fewer samples, even after multiple testing correction. However, you can have confidence that the most differentially expressed genes are genuine results. It's the other ones of lesser statistical significance about which you need to be careful.

Got you Kevin, so what do you think is a good cutoff?

Should I use cutoff in edgeR as (0.01) for example, or even more stringent?

I would go as low as FDR adjusted P < 0.0001 and absolute log2 fold change > 2.
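As a rough sketch of applying such a cutoff, the snippet below computes Benjamini-Hochberg adjusted P values and keeps genes passing both thresholds. The function names and the example values are hypothetical, and it assumes you have exported gene, log2 fold change, and raw P value from your edgeR results.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_genes(results, fdr_cutoff=1e-4, lfc_cutoff=2.0):
    """results: list of (gene, log2_fold_change, raw_p_value) tuples."""
    fdr = bh_adjust([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, fdr)
        if q < fdr_cutoff and abs(lfc) > lfc_cutoff
    ]

# Made-up values for illustration only.
results = [
    ("geneA", 3.1, 1e-12),
    ("geneB", -2.6, 1e-8),
    ("geneC", 0.8, 1e-9),
    ("geneD", 2.4, 0.001),
    ("geneE", 0.1, 0.5),
]
selected = select_genes(results)
```

Here geneC is dropped for its small fold change and geneD for its adjusted P value, even though both look significant on one criterion alone.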

There is no real way to know the exact best cutoff. You may have to go back and forth with it for a while.

Thanks Kevin so much.