Forum: reporting differentially expressed genes in a global RNA-seq study

4.2 years ago

thomas.welch ▴ 50

Hi all,

I have been wondering about this for a while and cannot find a real consensus in the literature on what seems to me a relatively simple question about the proper reporting of RNA-seq results.

Let's say I am doing a global transcriptomics study of a plant, exposing it to several treatments (including a control), and I am generally interested in the transcriptional changes those treatments induce. What is the proper way to report those changes?

Reporting general results, such as how many significantly differentially expressed genes there are in each treatment and GO enrichment analyses, seems straightforward to me, because we are only interested in one type of result here: significance. Is a GO term significantly enriched? Is a gene significantly differentially expressed compared to the control?

However, when it comes to reporting results for individual genes of interest, for example in a heatmap (as is common practice), this seems less straightforward to me, as we are now interested in two things: the significance of differential expression and its direction (up- vs. down-regulation). The statistics typically reported for these are, of course, the P-value and the log2 fold change, respectively. So, let's say I have now done my hypothetical global RNA-seq study and narrowed my results down to an interesting cohort of genes. Which of these two statistics do I use to make my pretty heatmap?

I've seen both statistics used in the literature to make such heatmaps, with no real reason given as to why one or the other was preferred. If you ask me, I would simply filter my genes to only those that show differential expression (P-value < 0.05) in at least one treatment, and then make my heatmap out of the log2 fold change values, resulting in a heatmap looking something like the attached image (with rows being genes and columns being treatments). However, I wonder whether I should also be reporting the level of significance in some way, rather than only filtering on it.
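To make the filtering step concrete, here is a minimal sketch of "keep genes significant in at least one treatment, then tabulate log2 fold changes". The gene names, treatment names, numbers, and the helper functions `select_genes` and `lfc_matrix` are all hypothetical and only illustrate the idea; they are not taken from any particular package.

```python
# Hypothetical per-treatment results versus control:
# results[treatment][gene] = (log2_fold_change, p_value)
results = {
    "drought": {"geneA": (2.1, 0.001), "geneB": (0.3, 0.40), "geneC": (-1.8, 0.02)},
    "heat": {"geneA": (0.4, 0.30), "geneB": (-0.2, 0.75), "geneC": (-3.0, 0.0004)},
}


def select_genes(results, alpha=0.05):
    """Return genes with p < alpha in at least one treatment (sorted by name)."""
    genes = sorted({gene for per_gene in results.values() for gene in per_gene})
    return [
        gene
        for gene in genes
        if any(
            per_gene[gene][1] < alpha
            for per_gene in results.values()
            if gene in per_gene
        )
    ]


def lfc_matrix(results, genes):
    """Build heatmap input: rows are genes, columns are treatments."""
    treatments = list(results)
    # Assumes every selected gene has a result in every treatment.
    matrix = [[results[t][gene][0] for t in treatments] for gene in genes]
    return treatments, matrix


kept = select_genes(results)
treatments, matrix = lfc_matrix(results, kept)
for gene, row in zip(kept, matrix):
    print(gene, dict(zip(treatments, row)))
```

If the numbers came from DESeq2, the `log2FoldChange` and `padj` columns of the results table would be the natural inputs here, although the post itself only mentions a P-value cutoff.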

I hope to hear a diverse array of opinions.

Tom

[Attached image: potential heatmap]

DESeq2 RNA-seq expression heatmaps transcriptomics • 1.1k views

I have never seen a heatmap of p-values (do you have a link?), nor do I think it is generally an appropriate approach, since effect size is more important than statistical confidence once you have filtered out potential false positives. So your procedure is OK IMHO. A possible visual improvement would be to make the color encoding symmetrical (the +4 fold change color as deep as the -4 fold change), for instance by capping fold changes higher than 4 at 4.
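A rough sketch of that capping idea, assuming the heatmap values are log2 fold changes and a cap of 4 on that scale. The function names `cap_symmetric` and `diverging_rgb` and the blue/white/red mapping are illustrative choices, not part of any plotting library.

```python
def cap_symmetric(values, cap=4.0):
    """Clip every value in a gene x treatment matrix to the range [-cap, +cap]."""
    return [[max(-cap, min(cap, v)) for v in row] for row in values]


def diverging_rgb(value, cap=4.0):
    """Map a value to blue (-cap), white (0) or red (+cap), symmetric about 0."""
    x = max(-cap, min(cap, value)) / cap  # scaled to [-1, 1]
    fade = round(255 * (1 - abs(x)))  # 255 at zero, 0 at either cap
    return (255, fade, fade) if x >= 0 else (fade, fade, 255)


lfc = [[6.2, -1.0], [-7.5, 0.0]]
print(cap_symmetric(lfc))
print(diverging_rgb(4.0), diverging_rgb(-4.0), diverging_rgb(0.0))
```

With matplotlib, a similar effect should be obtainable without hand-rolled colors by passing a diverging colormap together with symmetric limits (for example `vmin=-4, vmax=4`) to `imshow`.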

Reply • 4.2 years ago by Carlo Yague 9.0k

Thank you for your reply. Perhaps I should have been clearer regarding the reporting of significance. Come to think of it, I may very well not have seen a heatmap showing P-values, although I have seen them reported in tables and as -log10 P-values in volcano plots.

Volcano plots of course offer one solution, since they report both statistical confidence and direction of expression, but they cannot show which gene is which, nor give the reader a visual way to compare patterns of expression.

With regard to reporting effect size, an alternative to log2 fold change that I have seen used to create heatmaps is the z-score calculated from FPKM values. However, as I read the calculation described here (https://www.researchgate.net/post/How-can-I-calculate-z-score-from-rpkm-or-counts-values), it compares the expression value of a gene in a sample to the mean value for that gene across all samples, which I think is inappropriate for a study that has a dedicated control treatment.
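To illustrate the difference between the two reference points, here is a small sketch for a single gene. The FPKM values are invented, and `log2_fc_vs_control` is a deliberately naive ratio of group means with an assumed pseudocount of 1; DESeq2 estimates fold changes from a fitted model, so this is not what it computes.

```python
import math
import statistics

# Hypothetical FPKM values for one gene, three replicates per group.
fpkm = {"control": [10.0, 12.0, 11.0], "treated": [45.0, 47.0, 49.0]}


def row_z_scores(values):
    """Z-score each sample against the gene's mean and SD across ALL samples."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def log2_fc_vs_control(treated, control, pseudocount=1.0):
    """Naive log2 ratio of group means, using the control as the reference."""
    ratio = (statistics.mean(treated) + pseudocount) / (
        statistics.mean(control) + pseudocount
    )
    return math.log2(ratio)


all_samples = fpkm["control"] + fpkm["treated"]
print([round(z, 2) for z in row_z_scores(all_samples)])
print(log2_fc_vs_control(fpkm["treated"], fpkm["control"]))
```

In the z-score version the control replicates come out negative simply because the treated samples pull the overall mean up, whereas the fold change is expressed relative to the control itself.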

Reply • 4.2 years ago by thomas.welch ▴ 50