Question

RNAseq analysis Deseq

6.1 years ago

rob.costa1234 ▴ 310

I performed RNA-seq analysis using DESeq2 (three replicates, 4 groups). However, in each two-group comparison, FDR p < 0.05 gives me > 10,000 genes. I could make the cutoff stricter (0.01), but I wonder whether we really get that many differentially expressed genes with DESeq2. I would have expected 200-250 genes in each two-group comparison. How can we increase the stringency of DESeq2?

Thanks

Kanwar

RNA-Seq • 1.5k views

Updated 6.1 years ago by seidel 11k • written 6.1 years ago by rob.costa1234 ▴ 310

score 0 · Answer 1 · 2018-03-04

6.1 years ago

h.mon 35k

See this answer on the Bioconductor support forum from Michael Love, one of the DESeq2 developers. Basically, he suggests using the lfcThreshold parameter of results().
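
To make the lfcThreshold idea concrete, here is a minimal Python sketch (not DESeq2's code) of a Wald test against a fold-change threshold instead of against zero. It assumes a normal approximation built from an estimated log2 fold change and its standard error. The function names and the example gene are hypothetical; DESeq2's altHypothesis="greaterAbs" follows this general idea, but check the DESeq2 documentation for its exact definition.

```python
"""Sketch: testing against a log2 fold-change threshold instead of zero.

Assumption: a normal (Wald) approximation in the spirit of DESeq2's
lfcThreshold with altHypothesis="greaterAbs"; not DESeq2's own code.
"""
from math import erf, sqrt


def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))


def wald_p_zero(lfc: float, se: float) -> float:
    """Standard two-sided Wald test of H0: LFC == 0."""
    return 2.0 * normal_sf(abs(lfc) / se)


def wald_p_threshold(lfc: float, se: float, threshold: float) -> float:
    """Test H0: |LFC| <= threshold against H1: |LFC| > threshold."""
    z = (abs(lfc) - threshold) / se
    return min(1.0, 2.0 * normal_sf(z))


if __name__ == "__main__":
    # Hypothetical gene: a small but precisely estimated change.
    lfc, se = 0.2, 0.05
    print(f"H0: LFC = 0      p = {wald_p_zero(lfc, se):.2e}")
    print(f"H0: |LFC| <= 1   p = {wald_p_threshold(lfc, se, 1.0):.3f}")
```

A gene like this (log2 fold change 0.2) is highly significant against zero but not against a threshold of 1, which is why testing against a threshold shortens the gene list.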

However, 10,000 significant genes does indeed look like a lot. How many genes were tested?

score 0 · Answer 2 · 2018-03-04

"I would have expected 200-250 genes in each two grp analysis."

Why? What do you know about your data that you aren't describing? I would say that many experiments can give large numbers of DE genes, regardless of technique, if that is simply the shape of your data. Perhaps you're comparing things that are very different from each other. Remember that DESeq and edgeR make assumptions about the data, such as that most genes are not differentially expressed. If you happen to compare two things that are vastly different, you may get many genes with low p-values.

Ask yourself: in your 10k gene set, what is the smallest ratio of expression between your conditions? If you get a gene with a log2 fold change of 0.2 but a low p-value, would you believe it? That is only a 1.15-fold change (2^0.2 ≈ 1.15). Can you seriously detect a 15% difference in gene expression between two conditions? In these cases you could simply combine criteria, i.e. require a 2-fold change (or whatever you would believe) and a low p-value, and tune them to get a number of genes you can reasonably pursue.

Unless you are doing something wrong in the analysis, your experiment may simply be giving you a large number of genes, and you can rank them by p-value to get a top set, regardless of what that value is. There's nothing magic about 0.05, or 0.01, or 0.0001. Assuming a sound experiment and reasonably clean data, your result is your result (although your initial statement of an expected number hints that something else is going on).
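
A minimal Python sketch of combining criteria: convert a log2 fold change to a linear fold change, keep genes that pass both an adjusted p-value cutoff and a fold-change cutoff, then rank them by padj. `GeneResult`, `filter_de`, and the example values are hypothetical stand-ins for a DESeq2-style results table (log2FoldChange and padj columns); in R you would apply the same filter to the output of results().

```python
"""Sketch: combine an adjusted p-value cutoff with a fold-change cutoff.

Assumptions: GeneResult mimics one row of a DESeq2-style results table;
gene names and values are made up for illustration.
"""
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    log2_fold_change: float
    padj: float


def fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2fc


def filter_de(results, max_padj=0.05, min_abs_log2fc=1.0):
    """Keep genes passing BOTH cutoffs, ranked by adjusted p-value.

    min_abs_log2fc=1.0 corresponds to a 2-fold change in either direction.
    """
    hits = [
        r for r in results
        if r.padj < max_padj and abs(r.log2_fold_change) >= min_abs_log2fc
    ]
    return sorted(hits, key=lambda r: r.padj)


if __name__ == "__main__":
    print(f"log2FC 0.2 -> {fold_change(0.2):.2f}-fold")  # 1.15-fold
    results = [
        GeneResult("geneA", 0.2, 1e-8),   # significant but tiny change
        GeneResult("geneB", 2.5, 1e-6),
        GeneResult("geneC", -1.4, 0.01),
        GeneResult("geneD", 3.0, 0.2),    # large change, not significant
    ]
    for r in filter_de(results):
        print(r.gene, r.log2_fold_change, r.padj)
```

Keep in mind that filtering on the estimated fold change after testing is not the same as testing against a threshold with lfcThreshold; the latter takes the uncertainty of each estimate into account.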

On the other hand, if your experiment violates some of the usual assumptions behind applying the negative binomial model to your data (as DESeq does), and you have reason to tweak the model, there are parameters you can adjust (such as dispersion).
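
For context on what the dispersion parameter does: DESeq2 models counts with a negative binomial of mean mu and dispersion alpha, so the variance is mu + alpha * mu^2. The short Python sketch below (illustrative values, hypothetical function names) shows how a larger dispersion inflates the variance for the same mean. Conversely, an underestimated dispersion can make small fold changes look more significant than they are.

```python
"""Sketch: how the negative binomial dispersion inflates variance.

Var = mu + alpha * mu**2 (the NB parameterization DESeq2 uses).
The mean and dispersion values below are illustrative only.
"""
from math import sqrt


def nb_variance(mu: float, alpha: float) -> float:
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + alpha * mu ** 2


def coefficient_of_variation(mu: float, alpha: float) -> float:
    """sd / mean; approaches sqrt(alpha) for highly expressed genes."""
    return sqrt(nb_variance(mu, alpha)) / mu


if __name__ == "__main__":
    mu = 1000.0
    for alpha in (0.0, 0.01, 0.1):  # alpha = 0 reduces to Poisson
        print(f"alpha={alpha:<5} var={nb_variance(mu, alpha):>9.0f} "
              f"CV={coefficient_of_variation(mu, alpha):.3f}")
```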

tl;dr: it's not a matter of increasing the stringency of the evaluation; simply use a lower p-value cutoff if you want fewer results.
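
To illustrate the tl;dr: DESeq2's padj column uses the Benjamini-Hochberg procedure by default, so a lower cutoff on padj directly shortens the list. Here is a standalone Python sketch of the BH adjustment that counts genes passing several cutoffs; the p-values are made up and the function names are hypothetical.

```python
"""Sketch: Benjamini-Hochberg adjustment and hit counts at several cutoffs.

Standalone illustration only; the p-values below are made up.
"""


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(padj, cutoff):
    """Number of genes with adjusted p-value below the cutoff."""
    return sum(1 for p in padj if p < cutoff)


if __name__ == "__main__":
    pvals = [1e-10, 1e-6, 0.0004, 0.002, 0.008, 0.02, 0.2, 0.5, 0.9, 0.04]
    padj = bh_adjust(pvals)
    for cutoff in (0.05, 0.01, 0.001):
        print(f"padj < {cutoff}: {count_significant(padj, cutoff)} genes")
```

With these ten made-up p-values, the count drops from 6 genes at padj < 0.05 to 4 at 0.01 and 2 at 0.001.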