Forum: How to convincingly illustrate and discuss negative results in NGS.
2.8 years ago, Carlo Yague (4.7k) wrote:

Hi everyone,

In statistics and in science in general, it is always harder to convincingly show a lack of effect than a significant difference. In low-throughput experiments, one can always report p-values and show that the difference is not statistically significant, but the situation is more complex (at least to me) in genome-wide studies. For instance, in the case of RNA-seq expression data, even if there are no biological differences between two conditions, there will always be some genes significantly differentially expressed because thousands of genes are tested. In such a case, one cannot just say "we didn't see any significant differences between condition X and Y".
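
To make the multiple-testing point concrete, here is a minimal Python sketch. It assumes a well-calibrated test, so that p-values under a true null are uniform on [0, 1], and it concerns unadjusted p-values only. The 20,000 genes and the 0.05 cutoff are illustrative values, and `count_null_hits` is a hypothetical helper name, not part of any package.

```python
import random

def count_null_hits(n_genes=20000, alpha=0.05, seed=0):
    """Count tests with p < alpha when every null hypothesis is true."""
    rng = random.Random(seed)
    # Assumption: under a true null, p-values are uniform on [0, 1].
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)

hits = count_null_hits()
print(f"{hits} of 20000 null genes have unadjusted p < 0.05 (expected about 1000)")
```

The expected count is simply n_genes × alpha, which is why "some significant genes" on its own says little about whether X and Y differ.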

How would you illustrate and discuss such a case? Do you have any examples of publications that address this issue?

Here are some ideas:

  • Discuss that there are fewer DEGs between conditions X and Y than between X and Z (where there is an effect that has been biologically confirmed). However, I find this a bit weak.
  • Discuss that there are obviously no global differences between X and Y besides some differences that might be anecdotal.
  • Show an MA-plot/volcano plot and let readers decide for themselves.




A possible (but maybe also not strongly convincing) way would be to perform GO/KEGG enrichment and conclude that there are "no meaningful" differentially expressed genes.

written 2.8 years ago by WouterDeCoster (41k)

Thank you for your input. Yes, that could be an interesting point in some cases. However, sometimes the genes are deregulated based on their "genomic features" (position on chromosomes, presence of introns, nearby ncRNAs, ...) rather than their biological function, and there is no functional link (GO/KEGG) between the DEGs, even if there is a true effect.

modified 2.8 years ago • written 2.8 years ago by Carlo Yague (4.7k)

Use a more stringent multiple-testing adjustment so no tests are significant after adjustment. Just kidding, don't do that. :)

written 2.8 years ago by spvensko (200)

Yep, I thought the same thing ^^

More seriously, let's assume that we can't change the stringency because we want to keep the same parameters across the full study (that includes more conditions than X and Y).

written 2.8 years ago by Carlo Yague (4.7k)

If there is no difference in X & Y (and if that has no bearing/influence on the conclusion(s) of the study) why not report the fact as is?

modified 2.8 years ago • written 2.8 years ago by genomax (73k)

Perhaps you could compare condition X vs X (e.g., use 6 biological replicates for 3 vs 3 comparison) to demonstrate that variation plus a large number of genes invariably identifies some genes with differential expression. Or you could compare mixed (XY vs XY) or even randomly sampled data to make the same point.
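
The X vs X idea can be sketched on simulated data. This is a toy illustration only, under loud assumptions: log-expression values are drawn as independent standard normals (real RNA-seq counts are not), each gene is tested with an equal-variance Student's t-test, and significance uses the tabulated two-sided 5% critical value for 4 degrees of freedom (2.776). The function names are hypothetical.

```python
import math
import random
import statistics

T_CRIT_DF4 = 2.776  # two-sided 5% critical value, Student's t, 4 degrees of freedom

def pooled_t(a, b):
    """Student's t statistic for two equal-sized groups with pooled variance."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * 2 / n)

def x_vs_x_hits(n_genes=10000, seed=1):
    """Count genes called 'different' when both groups come from the same condition."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        # Six replicates of the SAME condition (toy log-expression values).
        reps = [rng.gauss(0.0, 1.0) for _ in range(6)]
        # Arbitrary 3 vs 3 split; there is no real group difference.
        if abs(pooled_t(reps[:3], reps[3:])) > T_CRIT_DF4:
            hits += 1
    return hits

print(x_vs_x_hits(), "of 10000 genes pass unadjusted p < 0.05 in X vs X")
```

With real data, the same comparison would be run through whatever pipeline and cutoffs the rest of the study uses, so the X vs X result serves as a baseline for the X vs Y one.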

written 2.8 years ago by harold.smith.tarheel (4.4k)

If you start off with the null hypothesis "I will not detect a difference in mRNA expression between these two animals", then multiple-testing adjustment is probably the least of your worries.

modified 2.8 years ago • written 2.8 years ago by John (12k)

Replicate using orthogonal assays, like qPCR for RNA-Seq. There are much larger issues with RNA-Seq than just this.

written 2.8 years ago by nwon (40)
2.8 years ago, mikhail.shugay (3.3k; Czech Republic, Brno, CEITEC) wrote:

I think the most straightforward way is the Volcano plot. Regarding the multiple testing, you can try to illustrate the fact that there are no significant differences by plotting the distribution of P-values and computing the false discovery rate (e.g. with this package).
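
As a rough sketch of the p-value distribution idea (not the package linked above), the following bins p-values into a histogram and computes Benjamini-Hochberg adjusted p-values in plain Python. The null p-values are simulated as uniform, which is an assumption about a well-calibrated test. `histogram` and `benjamini_hochberg` are names chosen here for illustration.

```python
import random

def histogram(p_values, n_bins=10):
    """Count p-values falling in equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in p_values:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

rng = random.Random(0)
null_p = [rng.random() for _ in range(20000)]  # simulated "no effect" comparison
print("histogram:", histogram(null_p))
print("genes with adjusted p < 0.05:", sum(q < 0.05 for q in benjamini_hochberg(null_p)))
```

A roughly flat histogram is what one expects when there is no effect, whereas a spike near zero suggests true differences. Even under a complete null, the adjustment can still occasionally let a few genes through, so a handful of hits is not by itself evidence of an effect.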

The other idea is to get more samples from Gene Expression Omnibus (GEO) or the Sequence Read Archive (SRA). You can then normalize the RNA-Seq data from your study together with other published datasets, for example using Cuffnorm. The idea is to show that while you do not observe any differences between your conditions, there is still a meaningful difference between your samples and previously published ones, for example up-regulation of known tissue-specific genes (i.e. a sort of positive control).

PS. The biological background of the experiment should play a huge role here: in some setups the biologist would expect to find just a handful of genes to be differentially expressed.

PPS. Increasing the number of replicates and performing gene set enrichment analysis can also help to clarify things a bit. The number of differentially expressed genes chosen under some arbitrary cutoff is not the best measure to quantify global differences between samples.
