Question: Selecting Differentially Expressed Genes in RNA-Seq data analysis - DESeq2 and Cuffdiff
10 weeks ago by
Indian Institute of Science Education and Research
Sreeraj Thamban80 wrote:

Hi all, during differential gene expression analysis of RNA-Seq data (DESeq2 or Cuffdiff), which is the best method to filter differentially expressed genes? Should I keep all genes with an adjusted P value < 0.05, or should I filter them based on a log2 fold-change cut-off?

Thank you

rna-seq deseq2 • 241 views
modified 10 weeks ago by Kevin Blighe7.3k • written 10 weeks ago by Sreeraj Thamban80

You can also try an integrated approach such as metaseqR, which runs more than one algorithm and combines their p-values. This way you don't have to struggle with comparisons, as the method combines the "advantages" of several algorithms to optimise the precision-recall trade-off. Disclaimer: I am the author of that package.

written 10 weeks ago by Panagiotis Moulos20

Thank you Moulos, I will definitely give metaseqR a try.

written 10 weeks ago by Sreeraj Thamban80
10 weeks ago by
Kevin Blighe7.3k
Republic of Ireland (Éire)
Kevin Blighe7.3k wrote:

Hi Sreeraj,

Traditionally, people start the filtering process at FDR Q<0.05 (i.e. 5% FDR, or adjusted P<0.05) and also apply an absolute log base 2 fold-change cut-off of 2 (|log2 FC| > 2). The false discovery rate correction used is usually Benjamini-Hochberg. Bonferroni correction, which controls the family-wise error rate rather than the FDR and is very stringent, is not used as much. There are other types of multiple-testing correction, too.
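For anyone who wants to see the mechanics, here is a minimal Python sketch of that filter: a Benjamini-Hochberg adjustment followed by the two cut-offs. DESeq2 already reports adjusted P values itself, so this is only an illustration; the function names, the record layout and the four genes are made up.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with adjusted P < padj_cutoff and |log2 FC| > lfc_cutoff."""
    padj = benjamini_hochberg([g["pvalue"] for g in genes])
    return [
        g["gene"]
        for g, q in zip(genes, padj)
        if q < padj_cutoff and abs(g["log2fc"]) > lfc_cutoff
    ]


# Made-up results for four hypothetical genes (illustration only).
results = [
    {"gene": "geneA", "pvalue": 0.0001, "log2fc": 3.1},
    {"gene": "geneB", "pvalue": 0.0004, "log2fc": 0.4},
    {"gene": "geneC", "pvalue": 0.0300, "log2fc": -2.6},
    {"gene": "geneD", "pvalue": 0.6000, "log2fc": 2.9},
]
selected = select_degs(results)
print(selected)
```

Here geneB passes the adjusted P value cut-off but has a small fold-change, and geneD has a large fold-change but a poor P value, so neither is kept.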

The use of both an adjusted P value and a log base 2 fold-change cut-off is important because many genes may have a highly significant P value but a very low fold-change, or vice-versa. This is mainly due to differences in counts. For example, if a gene has mean expression levels of 12 in our cases and 6 in our controls, is it significantly different? What if another gene had expression levels of 78 and 39? The P values in the two cases may differ, but the fold-changes would be the same. Other, more complex scenarios can occur.
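To make the 12 vs 6 and 78 vs 39 example concrete, here is a toy Python sketch. It treats each pair of values as single counts and applies an exact two-sided binomial test, which is an assumption made purely for illustration and is not how DESeq2 or Cuffdiff compute P values. The function names are hypothetical.

```python
import math


def log2_fold_change(case_count, control_count):
    """log2 ratio of the case value over the control value."""
    return math.log2(case_count / control_count)


def binomial_two_sided_p(case_count, control_count):
    """Exact two-sided binomial test that counts split 50/50 between groups.

    Toy test only: it ignores replicates, normalisation and dispersion.
    """
    n = case_count + control_count
    k = max(case_count, control_count)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


low_lfc = log2_fold_change(12, 6)
high_lfc = log2_fold_change(78, 39)
low_p = binomial_two_sided_p(12, 6)
high_p = binomial_two_sided_p(78, 39)

print(low_lfc, high_lfc)  # identical fold-changes
print(low_p, high_p)      # very different P values
```

Both pairs give a log2 fold-change of 1, but only the higher-count pair gives a small P value under this toy test, which is why a single cut-off on either statistic can mislead.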

If you have used DESeq2, then I assume that your input to it was raw counts (not normalised counts from Cufflinks)? In my experience, DESeq2 normalises raw counts 'quite well' and produces more credible statistics. For one, DESeq2 deals quite well with the variability in counts that occurs in RNA-seq data.

Finally, if you get nothing significant at FDR Q<0.05 and |log2 FC| > 2, consider relaxing first to FDR Q<0.1, which is still somewhat acceptable. After that, I would consider relaxing the fold-change cut-off to |log2 FC| > 1.5, and so on.
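The stepwise relaxation can be sketched in a few lines of Python. The three tiers are the ones mentioned above; the function name, the record layout and the example genes are assumptions for illustration, and the `padj` values are taken to be already adjusted.

```python
# Threshold tiers from the text above: (adjusted P cut-off, |log2 FC| cut-off).
TIERS = [(0.05, 2.0), (0.10, 2.0), (0.10, 1.5)]


def first_nonempty_tier(genes, tiers=TIERS):
    """Return (tier, gene names) for the strictest tier that yields any hits."""
    for padj_cutoff, lfc_cutoff in tiers:
        hits = [
            g["gene"]
            for g in genes
            if g["padj"] < padj_cutoff and abs(g["log2fc"]) > lfc_cutoff
        ]
        if hits:
            return (padj_cutoff, lfc_cutoff), hits
    return None, []


# Made-up adjusted results for three hypothetical genes.
genes = [
    {"gene": "geneA", "padj": 0.08, "log2fc": 2.4},
    {"gene": "geneB", "padj": 0.03, "log2fc": 1.7},
    {"gene": "geneC", "padj": 0.40, "log2fc": 3.0},
]
tier, hits = first_nonempty_tier(genes)
print(tier, hits)
```

The loop stops at the first tier that returns anything, so the strictest workable thresholds are the ones that get used.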

Hope that this helps.

modified 10 weeks ago • written 10 weeks ago by Kevin Blighe7.3k

Thank you Kevin, this cleared my doubts. I have one more question: I took 1.3 as the fold-change cut-off to filter DEGs during the analysis. Is 1.3 an acceptable fold change, or should I increase it?

Thank you

written 10 weeks ago by Sreeraj Thamban80

Hey Sreeraj,

It's not great but, I mean, I have used log2 1.3 in the past. On the linear scale it equates to a ~2.5 fold-change, which still means that expression is more than doubled. Coupled with a good adjusted P value, you can probably justify it. In certain tissues, like blood, getting large fold-change differences is difficult.
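The scale conversion is easy to check in Python. This small sketch (helper names are made up) shows both directions, since a cut-off of 1.3 means something quite different depending on whether it is read as a log2 value or as a linear fold-change.

```python
import math


def log2_to_linear(log2_fc):
    """Convert a log2 fold-change to a linear fold-change."""
    return 2 ** log2_fc


def linear_to_log2(fold_change):
    """Convert a linear fold-change to a log2 fold-change."""
    return math.log2(fold_change)


as_log2_cutoff = log2_to_linear(1.3)    # 1.3 read as a log2 fold-change
as_linear_cutoff = linear_to_log2(1.3)  # 1.3 read as a linear fold-change

print(round(as_log2_cutoff, 2))    # about 2.46 on the linear scale
print(round(as_linear_cutoff, 2))  # about 0.38 on the log2 scale
```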

written 10 weeks ago by Kevin Blighe7.3k