Question: Selecting Differentially Expressed Genes in RNA-seq data analysis - DESeq2 and Cuffdiff
9 months ago by
Indian Institute of Science Education and Research
Sreeraj Thamban90 wrote:

Hi all, during differential gene expression analysis of RNA-seq data (DESeq2 or Cuffdiff), which is the best method to filter differentially expressed genes? Should I take all genes with an adjusted P value < 0.05, or should I also filter them based on a log2 fold-change cut-off?

Thank you

rna-seq deseq2 • 573 views
modified 9 months ago by Kevin Blighe21k • written 9 months ago by Sreeraj Thamban90

You can also try an integrated approach like metaseqR, which runs more than one algorithm and combines their p-values. This way you don't have to struggle with comparisons, as the method combines the "advantages" of many algorithms to optimise the precision-recall trade-off. Disclaimer: I am the author of that package.

written 9 months ago by Panagiotis Moulos20

Thank you Moulos, I will definitely give metaseqR a try.

written 9 months ago by Sreeraj Thamban90
9 months ago by
Kevin Blighe21k
University College London Cancer Institute
Kevin Blighe21k wrote:

Hi Sreeraj,

Traditionally, people start the filtering process at FDR Q<0.05 (i.e. 5% FDR, or adjusted P<0.05) and also apply an absolute log base 2 fold-change cut-off of 2 (|log2 FC|>2). The false discovery rate method used is usually Benjamini-Hochberg. Bonferroni correction, which controls the family-wise error rate rather than the FDR and is very stringent, is not used as much. There are other multiple-testing correction methods, too.
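As a rough illustration of the filtering described above, here is a minimal Python sketch that applies a Benjamini-Hochberg adjustment to raw P values and then keeps genes passing both cut-offs. The gene names and numbers are made up, and `benjamini_hochberg` and `select_degs` are hypothetical helper names, not functions from DESeq2 or Cuffdiff. In practice DESeq2 already reports a `padj` column (Benjamini-Hochberg by default), so only the final filtering step would be needed.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with adjusted p < padj_cutoff and |log2FC| > lfc_cutoff."""
    padj = benjamini_hochberg([r["pvalue"] for r in results])
    return [
        r["gene"]
        for r, q in zip(results, padj)
        if q < padj_cutoff and abs(r["log2fc"]) > lfc_cutoff
    ]


# Made-up example values, not real data.
toy_results = [
    {"gene": "geneA", "log2fc": 3.1, "pvalue": 0.0001},
    {"gene": "geneB", "log2fc": 0.4, "pvalue": 0.0002},
    {"gene": "geneC", "log2fc": -2.6, "pvalue": 0.004},
    {"gene": "geneD", "log2fc": 2.9, "pvalue": 0.30},
]
selected = select_degs(toy_results)
print(selected)
```

In this toy set, geneB is dropped despite its small P value because its fold-change is low, and geneD is dropped despite its large fold-change because it is not significant.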

The use of both an adjusted P value and a log base 2 fold-change cut-off is important because many genes may have a highly significant P value but a very low fold-change difference, or vice versa. This is mainly due to differences in counts. For example, if one gene has mean expression levels of 12 and 6 in our cases and controls, respectively, is it significantly different? What if another gene had expression levels of 78 and 39? The P values in the two cases may differ, but the fold-changes would be the same (2-fold in both). Other more complex scenarios can occur.
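To make the 12 vs 6 and 78 vs 39 example concrete, here is a toy Python sketch. It treats each pair as two single counts and uses an exact conditional binomial test (under equal rates, the first count follows Binomial(a + b, 0.5)). This is an assumption made purely for illustration: it is not the negative binomial model that DESeq2 fits, and `two_count_pvalue` is a hypothetical helper name.

```python
from math import comb, log2


def two_count_pvalue(a, b):
    """Two-sided exact binomial p-value for two counts a and b.

    Under the null of equal rates, a ~ Binomial(a + b, 0.5).
    """
    n = a + b
    observed = comb(n, a)
    # Sum all outcomes that are no more likely than the observed one.
    tail = sum(comb(n, k) for k in range(n + 1) if comb(n, k) <= observed)
    return tail / 2 ** n


for cases, controls in [(12, 6), (78, 39)]:
    lfc = log2(cases / controls)
    pval = two_count_pvalue(cases, controls)
    print(cases, controls, lfc, pval)
```

Both pairs give a log2 fold-change of exactly 1, but the low-count pair yields a P value of about 0.24 while the high-count pair yields a far smaller one, which is the point about counts driving significance.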

If you have used DESeq2, then I assume that your input to it was raw counts (not normalised counts from Cufflinks)? DESeq2 normalises raw counts 'quite well' and produces more credible statistics, in my experience. For one, DESeq2 deals quite well with the variability in counts that occurs in RNA-seq data.

Finally, if you get nothing significant at FDR Q<0.05 and |log2 FC|>2, you will have to consider relaxing to, initially, FDR Q<0.1, which is still somewhat acceptable. After that, I would consider relaxing the |log2 FC| cut-off to 1.5, and so on.
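The stepwise relaxation could be sketched in Python roughly as follows. The threshold ladder mirrors the order suggested above (Q<0.05 with |log2 FC|>2, then Q<0.1, then |log2 FC|>1.5); the function name, record layout, and example values are assumptions for illustration only.

```python
def filter_with_relaxation(
    results, ladder=((0.05, 2.0), (0.10, 2.0), (0.10, 1.5))
):
    """Try each (padj, |log2FC|) cut-off pair in order; stop at the first hit.

    Returns the thresholds that were used together with the selected genes,
    so the final cut-offs can be reported alongside the gene list.
    """
    for padj_cutoff, lfc_cutoff in ladder:
        hits = [
            r["gene"]
            for r in results
            if r["padj"] < padj_cutoff and abs(r["log2fc"]) > lfc_cutoff
        ]
        if hits:
            return (padj_cutoff, lfc_cutoff), hits
    return None, []


# Made-up adjusted P values and log2 fold-changes.
toy = [
    {"gene": "geneX", "padj": 0.08, "log2fc": 2.4},
    {"gene": "geneY", "padj": 0.20, "log2fc": 3.0},
    {"gene": "geneZ", "padj": 0.03, "log2fc": 1.7},
]
used, genes = filter_with_relaxation(toy)
print(used, genes)
```

Here nothing passes the strictest pair, so the sketch falls back to Q<0.1 with |log2 FC|>2 and returns geneX only.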

Hope that this helps.

modified 9 months ago • written 9 months ago by Kevin Blighe21k

Thank you Kevin, this cleared my doubts. I have one more question: I took 1.3 as the fold-change cut-off to filter DEGs during the analysis. Is 1.3 an acceptable fold change, or should I increase it?

Thank you

written 9 months ago by Sreeraj Thamban90

Hey Sreeraj,

It's not great but, I mean, I have used a log2 fold-change of 1.3 in the past. On the linear scale it equates to a ~2.5-fold difference (2^1.3 ≈ 2.46), which still means that the expression is more than double. Coupled with a good adjusted P value, you can probably justify it. In certain tissues, like blood, getting large fold-change differences is difficult.
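For reference, the conversion between the log2 and linear scales can be checked with a couple of lines of Python. The second calculation is an added aside: it shows what the cut-off would mean if the 1.3 had instead been a linear fold change, which is an assumption and not something stated in the thread.

```python
from math import log2

log2_cutoff = 1.3
linear_fold = 2 ** log2_cutoff  # log2 scale -> linear scale
print(round(linear_fold, 2))  # 2.46

# If 1.3 were instead a linear fold change, the log2 value would be small.
log2_of_linear = log2(1.3)
print(round(log2_of_linear, 2))  # 0.38
```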

written 9 months ago by Kevin Blighe21k