Question: RNA-seq analysis with DESeq2
Asked 22 months ago by rob.costa1234 (United States):

I performed RNA-seq analysis using DESeq2 (three replicates, four groups). However, in each two-group comparison, an FDR-adjusted p < 0.05 gives me more than 10,000 genes. I could lower the cutoff to 0.01, but I wonder whether DESeq2 really gives that many differentially expressed genes. I would have expected 200-250 genes in each two-group comparison. How can I increase the stringency of DESeq2?



Answered 22 months ago by h.mon:

See this answer on the Bioconductor support forum from Michael Love, one of the DESeq2 developers. Basically, he suggests using the lfcThreshold parameter of results().
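To illustrate what lfcThreshold changes, here is a small Python sketch (not DESeq2 code; DESeq2 itself is an R package). It is modeled loosely on the default altHypothesis = "greaterAbs" test, where the Wald statistic is measured from the threshold instead of from zero, so a small but very precisely estimated change stops being significant. The function names and the LFC/SE values are hypothetical, and this is meant to show the idea rather than reproduce DESeq2's exact computation.

```python
import math


def normal_sf(x: float) -> float:
    """Upper-tail probability P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def thresholded_wald_pvalue(lfc: float, se: float, threshold: float = 0.0) -> float:
    """Two-sided Wald-type p-value for H0: |LFC| <= threshold (log2 scale).

    threshold = 0 gives the ordinary test of LFC == 0. With threshold > 0,
    only the part of |LFC| beyond the threshold counts as evidence.
    """
    z = (abs(lfc) - threshold) / se
    return min(1.0, 2.0 * normal_sf(z))


# Made-up estimates: a small but precise change, and a large change.
for lfc, se in [(0.2, 0.05), (3.0, 0.3)]:
    p0 = thresholded_wald_pvalue(lfc, se)                 # test against 0
    p1 = thresholded_wald_pvalue(lfc, se, threshold=1.0)  # test against 2-fold
    print(f"LFC={lfc}: p(threshold 0)={p0:.2e}, p(threshold 1)={p1:.2e}")
```

The 0.2 log2 fold change is highly significant against zero but gets p = 1 against a 2-fold threshold, while the 3.0 change stays significant either way.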

That said, 10,000 significant genes does look like a lot. How many genes were tested in total?
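The number of tested genes matters because DESeq2's results() adjusts p-values with the Benjamini-Hochberg procedure by default (pAdjustMethod = "BH"), and BH scales each p-value by m / rank, where m is the number of tests. A minimal pure-Python sketch of BH, written to mirror R's p.adjust(method = "BH"), is below; the function name is hypothetical, and DESeq2's independent filtering (which changes m) is not modeled here.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted p-values, mirroring R's p.adjust(method = "BH").

    Each p-value is scaled by m / rank (m = number of tests), then a running
    minimum from the largest p-value downwards keeps the result monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    for pos in range(m - 1, -1, -1):
        i = order[pos]
        rank = pos + 1
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# The same raw p-value of 0.001 in a small and a large set of tests
# (the other p-values are set to 1.0 purely for illustration).
small = benjamini_hochberg([0.001] + [1.0] * 9)
large = benjamini_hochberg([0.001] + [1.0] * 19999)
print(small[0], large[0])
```

With 10 tests the adjusted value is about 0.01; with 20,000 tests the same raw p-value is adjusted up to 1.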

Answered 22 months ago by seidel (United States):

> I would have expected 200-250 genes in each two grp analysis.

Why? What do you know about your data that you aren't describing? Many experiments can give large numbers of DE genes, regardless of technique, if that is simply the shape of your data. Perhaps you're comparing things that are very different from each other. Remember that DESeq and edgeR make assumptions about the data, for example that most genes are not differentially expressed. If you happen to compare two things that are vastly different, you may get many genes with low p-values.

You might ask yourself: in your 10k gene set, what is the smallest ratio of expression between your conditions? If you get something with a log2 fold change of 0.2 but a low p-value, would you believe that? (That is a change of about 1.15-fold, since 2^0.2 ≈ 1.15.) Can you seriously detect a 15% difference in gene expression between two conditions? In these cases you could simply combine criteria, i.e. require a 2-fold change (or whatever you would believe) and a low p-value, and tune them to get a number of genes you can reasonably pursue.

Unless you are doing something wrong in the analysis, your experiment may simply be giving you a large number of genes, and you can rank them by p-value to take a top set, regardless of what that value is. There's nothing magic about 0.05, 0.01, or 0.0001. Assuming a sound experiment and reasonably clean data, your result is your result (although your initial statement of an expected number hints that something else is going on).
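A rough Python sketch of "combine criteria, then rank" on a results table you have already exported from DESeq2. The GeneResult class, the helper names, and the example rows are hypothetical and made up for illustration. Note that filtering on fold change after the test is not the same thing as testing against a fold-change threshold (the lfcThreshold approach mentioned in the other answer).

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GeneResult:
    gene: str               # hypothetical gene identifier
    log2_fold_change: float
    padj: float             # FDR-adjusted p-value


def fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a plain ratio (e.g. 1.0 -> 2.0)."""
    return 2.0 ** log2fc


def select_genes(results: List[GeneResult], max_padj: float = 0.05,
                 min_fold: float = 2.0,
                 top_n: Optional[int] = None) -> List[GeneResult]:
    """Keep genes passing BOTH an adjusted p-value cutoff and a minimum
    absolute fold change, then rank them by adjusted p-value (smallest first)."""
    min_abs_log2 = math.log2(min_fold)
    kept = [r for r in results
            if r.padj < max_padj and abs(r.log2_fold_change) >= min_abs_log2]
    kept.sort(key=lambda r: r.padj)
    return kept if top_n is None else kept[:top_n]


# Made-up example rows, not real data.
results = [
    GeneResult("geneA", 0.2, 1e-8),   # tiny change, very low padj
    GeneResult("geneB", 1.5, 1e-4),
    GeneResult("geneC", -2.3, 1e-6),  # down-regulated, still passes
    GeneResult("geneD", 3.0, 0.2),    # large change, not significant
]
print(round(fold_change(0.2), 3))                 # 1.149
print([r.gene for r in select_genes(results)])    # ['geneC', 'geneB']
```

geneA is dropped despite its tiny padj because a 1.15-fold change is below the 2-fold requirement; top_n gives the "rank and take a top set" option.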

On the other hand, if your experiment violates some of the usual assumptions behind fitting a negative binomial model to your data (as DESeq does), and you have reason to tweak the model, there are parameters you can adjust (such as the dispersion).
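Why dispersion matters, as a back-of-the-envelope Python sketch: in the mean/dispersion form of the negative binomial used by DESeq2, the variance is mu + alpha * mu^2, so a larger dispersion alpha widens the uncertainty of a fold change and shrinks its test statistic. The standard-error helper below is a hypothetical delta-method approximation (equal size factors, one shared dispersion), not how DESeq2 actually fits its GLM; the means and replicate counts are made up.

```python
import math


def nb_variance(mean: float, dispersion: float) -> float:
    """Negative binomial variance in mean/dispersion form: mu + alpha * mu^2."""
    return mean + dispersion * mean ** 2


def approx_log2fc_se(mean_a: float, mean_b: float, n_a: int, n_b: int,
                     dispersion: float) -> float:
    """Rough delta-method SE of log2(mean_b / mean_a).

    Uses Var(log(sample mean)) ~= (1/mu + alpha) / n for NB counts, then
    converts from natural log to log2. Hypothetical helper for illustration.
    """
    var_ln = ((1.0 / mean_a + dispersion) / n_a
              + (1.0 / mean_b + dispersion) / n_b)
    return math.sqrt(var_ln) / math.log(2.0)


# Same means (about a 2^0.5 ratio) and 3 replicates per group,
# two different dispersion values.
for alpha in (0.01, 0.1):
    se = approx_log2fc_se(1000.0, 1414.0, 3, 3, alpha)
    print(f"dispersion={alpha}: SE={se:.3f}, z for log2FC 0.5 = {0.5 / se:.2f}")
```

With these made-up numbers the same 0.5 log2 fold change gives z of roughly 4 at dispersion 0.01 but only about 1.3 at dispersion 0.1, which is why the dispersion estimates drive how many genes come out significant.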

tl;dr: it's not a matter of increasing the stringency of the evaluation; simply use a lower p-value cutoff if you want fewer results.
