Question: RNA-seq analysis with DESeq2
rob.costa1234 (United States) wrote, 13 months ago:

I performed an RNA-seq analysis using DESeq2 (four groups, three replicates each). However, in each two-group comparison an FDR-adjusted p < 0.05 gives me more than 10,000 genes. I could lower the cutoff to 0.01, but I wonder whether DESeq2 normally gives that many differentially expressed genes. I would have expected 200-250 genes in each two-group comparison. How can I increase the stringency of DESeq2?

Thanks

Kanwar

Tags: rna-seq
h.mon (Brazil) wrote, 13 months ago:

See this answer on the Bioconductor support forum from Michael Love, one of the DESeq2 developers. Basically, he suggests using the lfcThreshold parameter, which tests each gene against a minimum log2 fold change instead of against zero.
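To show why testing against a threshold is stricter than testing against zero, here is a simplified sketch (not DESeq2's code) of a two-sided Wald-type p-value for the null hypothesis |log2FC| <= T, using a normal approximation. The function name `wald_pvalue` and the example numbers are made up for illustration; in DESeq2 itself you would pass `lfcThreshold` to `results()` in R and let it compute the p-values.

```python
import math


def wald_pvalue(lfc: float, se: float, lfc_threshold: float = 0.0) -> float:
    """Two-sided Wald-type p-value for H0: |LFC| <= lfc_threshold.

    Simplified sketch using a normal approximation. With lfc_threshold=0
    this is the ordinary two-sided Wald test of H0: LFC == 0.
    """
    # How far the estimate lies beyond the threshold, in standard errors.
    z = (abs(lfc) - lfc_threshold) / se
    # Upper normal tail P(Z > z) = 0.5 * erfc(z / sqrt(2)); doubled for two sides.
    return min(1.0, math.erfc(z / math.sqrt(2)))


# A small but precisely estimated change (log2FC 0.2, about 1.15-fold):
print(wald_pvalue(0.2, 0.05))                     # tiny p-value against LFC == 0
print(wald_pvalue(0.2, 0.05, lfc_threshold=1.0))  # 1.0 against |LFC| <= 1
# A large change (8-fold) with a wider standard error still passes:
print(wald_pvalue(3.0, 0.5, lfc_threshold=1.0))
```

The tiny 0.2 log2 fold change is highly "significant" against zero but has no evidence at all of exceeding a 2-fold change, which is the kind of gene the threshold removes.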

That said, 10,000 significant genes does seem like a lot. How many genes were tested in total?

seidel (United States) wrote, 13 months ago:

"I would have expected 200-250 genes in each two grp analysis."

Why? What do you know about your data that you aren't describing? Many experiments can give large numbers of DE genes, regardless of technique, if that is simply the shape of your data. Perhaps you're comparing things that are very different from each other. Remember that DESeq and edgeR make assumptions about the data, such as that most genes are not differentially expressed. If you happen to compare two things that are vastly different, you may get many genes with low p-values.

Ask yourself: in your 10k gene set, what is the smallest ratio of expression between your conditions? If a gene has a log2 fold change of 0.2 (i.e. a 1.15-fold change) but a low p-value, would you believe it? Can you seriously detect a 15% difference in gene expression between two conditions? In these cases you could simply combine criteria, i.e. require a 2-fold change (or whatever you would believe) and a low p-value, and tune them to get a number of genes you can reasonably pursue.

Unless you are doing something wrong in the analysis, your experiment may simply be giving you a large number of genes, and you can rank them by p-value to get a top set, regardless of what that value is. There's nothing magic about 0.05, 0.01 or 0.0001. Assuming a sound experiment and reasonably clean data, your result is your result (although your initial statement of an expected number hints that something else is going on).
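A minimal sketch of combining the two criteria, assuming the results have already been exported from DESeq2 as (gene, log2 fold change, adjusted p-value) rows. The gene names and values below are invented for illustration, and `select_genes` is a hypothetical helper, not part of any package.

```python
import math

# Hypothetical per-gene results: (gene, log2 fold change, adjusted p-value).
results = [
    ("geneA", 0.2, 1e-8),
    ("geneB", -2.5, 3e-4),
    ("geneC", 1.3, 0.02),
    ("geneD", 0.9, 1e-12),
    ("geneE", -0.1, 0.6),
]


def fold_change(log2fc: float) -> float:
    """Convert a log2 fold change to a linear fold change (2 ** log2fc)."""
    return 2.0 ** log2fc


def select_genes(rows, padj_cutoff=0.05, min_fold=2.0):
    """Keep genes passing BOTH an adjusted p-value cutoff and a fold-change
    cutoff (up or down), ranked from smallest to largest adjusted p-value."""
    min_abs_lfc = math.log2(min_fold)
    kept = [r for r in rows if r[2] < padj_cutoff and abs(r[1]) >= min_abs_lfc]
    return sorted(kept, key=lambda r: r[2])


print(round(fold_change(0.2), 2))                # 1.15, i.e. a ~15% change
print([g for g, _, _ in select_genes(results)])  # ['geneB', 'geneC']
```

Note that geneA and geneD have the smallest p-values but are dropped for their small fold changes. Filtering on fold change after the test is a heuristic; the FDR statement only strictly applies to the hypothesis that was tested, which is why testing against a threshold (lfcThreshold) is the more formal route.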

On the other hand, if your experiment violates some of the usual assumptions behind fitting a negative binomial model to your data (as DESeq does), and you have reason to tweak the model, there are parameters you can adjust (such as the dispersion).
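To illustrate why the dispersion matters so much, here is a rough sketch using the negative binomial variance Var = mu + alpha * mu^2 (the mean/dispersion parameterisation DESeq2 and edgeR use). The standard-error formula is a crude delta-method approximation assuming equal means, equal replicate numbers and no size factors; it is not how DESeq2 computes its standard errors, and `approx_se_log2fc` is a hypothetical helper.

```python
import math


def nb_variance(mu: float, alpha: float) -> float:
    """NB variance in the mean/dispersion parameterisation:
    Var = mu + alpha * mu**2 (alpha = 0 gives the Poisson case)."""
    return mu + alpha * mu ** 2


def approx_se_log2fc(mu: float, alpha: float, n_per_group: int) -> float:
    """Crude delta-method SE of log2(mean_B / mean_A) when both groups have
    mean mu, dispersion alpha and n_per_group replicates. Ignores size
    factors, the GLM fit and LFC shrinkage."""
    # Var(log of one group's mean) ~= Var / (mu**2 * n) = (1/mu + alpha) / n
    var_log_one_group = nb_variance(mu, alpha) / (mu ** 2 * n_per_group)
    # Two independent groups; divide by ln(2) to convert to log2 units.
    return math.sqrt(2 * var_log_one_group) / math.log(2)


# Three replicates per group, a gene with mean count 1000:
for alpha in (0.001, 0.01, 0.1):
    se = approx_se_log2fc(mu=1000, alpha=alpha, n_per_group=3)
    print(f"alpha={alpha}: SE(log2FC) ~ {se:.3f}, z for log2FC 0.2 ~ {0.2 / se:.1f}")
```

Under these assumptions, a 15% change only looks convincing when the dispersion estimate is very small, so it is worth checking whether the estimated dispersions are plausible for your system.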

tl;dr: it's not a matter of increasing the stringency of the test; simply use a lower p-value cutoff if you want fewer results.
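DESeq2's adjusted p-values use the Benjamini-Hochberg method by default (its independent filtering can also set some of them to NA, which this sketch ignores). Below is a small Python version of BH adjustment on ten invented p-values, counting how many genes pass at progressively stricter cutoffs; `bh_adjust` is written here for illustration and is meant to mirror R's `p.adjust(method = "BH")`.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented raw p-values for ten genes.
raw_p = [1e-10, 1e-6, 2e-4, 0.003, 0.01, 0.02, 0.2, 0.5, 0.8, 0.9]
padj = bh_adjust(raw_p)
for cutoff in (0.05, 0.01, 0.0001):
    n = sum(p < cutoff for p in padj)
    print(f"padj < {cutoff}: {n} genes")
```

Each stricter cutoff just keeps the top of the same ranking (6, then 4, then 2 genes here), which is the point above: a lower cutoff changes how many genes you report, not how they were evaluated.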
