
Hi Sreeraj,

Traditionally, people start the filtering process at `FDR Q<0.05` (i.e. `5% FDR`, or `adjusted P<0.05`) and also apply a log base 2 fold-change cut-off of absolute 2 (`|log2 FC| > 2`). The FDR procedure most commonly used is Benjamini-Hochberg. Bonferroni correction, which controls the family-wise error rate rather than the FDR and is very stringent, is not used as much. There are other types of multiple-testing correction, too.
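To make the two-part filter concrete, here is a minimal plain-Python sketch: a Benjamini-Hochberg adjustment of raw P values, followed by the `padj < 0.05` and `|log2 FC| > 2` cut-offs. The gene names, P values and fold-changes are made-up illustration data, and the dictionary keys (`gene`, `padj`, `log2fc`) are hypothetical; in practice DESeq2 already reports BH-adjusted values in its `padj` column, so you would only need the filtering step.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (FDR Q values) in input order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down, keeping the adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_degs(results, q_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with adjusted P < q_cutoff AND |log2 FC| > lfc_cutoff."""
    return [r["gene"] for r in results
            if r["padj"] < q_cutoff and abs(r["log2fc"]) > lfc_cutoff]


# Hypothetical example data
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
pvals = [0.0001, 0.004, 0.03, 0.2, 0.0005]
lfcs = [3.1, -2.5, 1.2, 4.0, 0.3]

padj = benjamini_hochberg(pvals)
results = [{"gene": g, "padj": q, "log2fc": l} for g, q, l in zip(genes, padj, lfcs)]
print(filter_degs(results))  # → ['geneA', 'geneB']
```

Note that geneE passes the FDR cut-off but fails the fold-change cut-off, and geneD has a large fold-change but fails the FDR cut-off, which is exactly why both criteria are applied together.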

Using both an adjusted P value and a log2 fold-change cut-off is important because many genes may have a highly significant P value but a very small fold-change, or vice versa. This is mainly due to differences in count levels. For example, suppose one gene has mean expression levels of 12 in cases and 6 in controls. Is it significantly different? What if another gene had mean levels of 78 and 39? Both genes have the same fold-change (2, i.e. log2 FC = 1), but their P values may differ, because at higher counts the same relative difference is usually easier to detect. Other, more complex scenarios can also occur.
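The 12-vs-6 and 78-vs-39 example can be checked with a toy calculation. This sketch assumes that each mean is a single Poisson count and that cases and controls have equal library sizes. Under those assumptions, a standard exact test for equal rates conditions on the total `n` and treats the case count as Binomial(n, 0.5). Real tools such as DESeq2 instead fit negative binomial models across replicates, so the numbers here only illustrate the direction of the effect.

```python
import math


def log2_fold_change(case_level, control_level):
    return math.log2(case_level / control_level)


def poisson_exact_test(x_case, x_control):
    """Two-sided exact test of equal Poisson rates, assuming equal library sizes.

    Conditional on n = x_case + x_control, x_case ~ Binomial(n, 0.5) under H0.
    Because the null binomial is symmetric, the two-sided P is twice the upper tail.
    """
    n = x_case + x_control
    k = max(x_case, x_control)
    upper_tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


fc_low, fc_high = log2_fold_change(12, 6), log2_fold_change(78, 39)
p_low, p_high = poisson_exact_test(12, 6), poisson_exact_test(78, 39)
print(fc_low, fc_high)   # both 1.0
print(p_low)             # ≈ 0.238, not significant
print(p_high)            # well below 0.05
```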

If you used DESeq2, then I assume that your input to it was raw counts (not normalised values, such as FPKM, from Cufflinks)? In my experience, DESeq2 normalises raw counts 'quite well' and produces more credible statistics. For one, it models the count variability (overdispersion) that is typical of RNA-seq data.
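DESeq2's default normalisation is the median-of-ratios method. For each sample, it computes a size factor, which is the median, over genes, of that sample's count divided by the gene's geometric mean across samples. The plain-Python sketch below reimplements that idea for illustration only; in DESeq2 itself this is done in R by `estimateSizeFactors`. The count matrix is made-up data in which sample 2 is simply sequenced twice as deeply as sample 1. The method works on raw counts, so handing DESeq2 already-normalised values would not fit its assumptions.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per gene), each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Only genes with a non-zero count in every sample are used (log(0) is undefined)
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each gene's geometric mean across samples
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the gene's geometric mean
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical counts: sample 2 has twice the sequencing depth of sample 1
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]  # last gene is skipped (zero count)
sf = size_factors(counts)
normalised = [[c / s for c, s in zip(row, sf)] for row in counts]
print(sf)  # sample 2's factor is twice sample 1's
```

After dividing by the size factors, each of the first three genes has the same normalised value in both samples, as expected when the only difference is sequencing depth.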

Finally, if you get nothing significant at FDR Q<0.05 and |log2 FC| > 2, you will have to consider relaxing the FDR first, to Q<0.1, which is still somewhat acceptable. After that, I would consider relaxing the fold-change cut-off to |log2 FC| > 1.5, and so on.
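The stepwise relaxation can be written as a list of threshold tiers that are tried in order until some genes pass. This is a hypothetical sketch. The first three tiers follow the order described above. The final `(0.1, 1.0)` tier is only an assumed example of "and so on". The result records use the same illustrative `gene`/`padj`/`log2fc` keys as a DESeq2-style results table would provide.

```python
# Hypothetical tiers, tried in order: (FDR Q cut-off, |log2 FC| cut-off).
# The last tier is an assumed continuation, not a recommendation.
TIERS = [(0.05, 2.0), (0.1, 2.0), (0.1, 1.5), (0.1, 1.0)]


def select_with_relaxation(results, tiers=TIERS):
    """Return the first tier that yields any hits, plus those hits."""
    for q_cut, lfc_cut in tiers:
        hits = [r["gene"] for r in results
                if r["padj"] < q_cut and abs(r["log2fc"]) > lfc_cut]
        if hits:
            return (q_cut, lfc_cut), hits
    return None, []


# Hypothetical results: nothing passes the strictest tier
results = [
    {"gene": "geneX", "padj": 0.08, "log2fc": 2.4},
    {"gene": "geneY", "padj": 0.03, "log2fc": 1.7},
    {"gene": "geneZ", "padj": 0.50, "log2fc": 3.0},
]
tier, hits = select_with_relaxation(results)
print(tier, hits)  # → (0.1, 2.0) ['geneX']
```

Whichever tier you end up using should be reported alongside the gene list, so that readers know how permissive the selection was.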

Hope that this helps.

You can also try an integrated approach like metaseqR, which uses more than one algorithm and combines their P values. This way you don't have to struggle with comparing tools, as the method combines the "advantages" of several algorithms to optimise the precision-recall trade-off. Disclaimer: I am the author of that package.
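As a generic illustration of what "combining P values" means, here is Fisher's method in plain Python. This is not metaseqR's code and not necessarily the combination it uses by default; it is simply one well-known technique. Fisher's statistic is X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null, and for even degrees of freedom its tail probability has a closed form. Fisher's method assumes independent tests. P values from different DE algorithms run on the same data are strongly correlated, so the combined value below would be optimistic in that setting.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method for combining k independent P values.

    X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0.
    For even df, P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical P values for one gene from three different DE algorithms
print(fisher_combine([0.04, 0.02, 0.06]))
```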

Thank you Moulos, I will definitely give metaseqR a try.