Question: Data for drawing Heatmaps (RNA-seq)
6 months ago by
sd.gamboa.t0 wrote:

Hello,

I'd like some advice, please.

I performed a de novo transcriptome assembly of my target organism from RNA-seq reads using Trinity. I then followed the Trinity pipeline and scripts to obtain the following expression matrices for the assembled genes:

  • FPKM
  • TPM
  • TMM

My question is: which of these matrices (FPKM, TPM or TMM) should I use to perform hierarchical clustering of the genes and draw a heatmap?

I'd like to use TMM because it is normalized across samples (and the Trinity scripts use TMM for clustering and heatmaps). However, I've seen that some papers use FPKM values instead.

Also, which kind of transformation is better for drawing a heatmap: z-scores or a centered log2 transformation?

Thanks in advance.

Samuel

Tags: tmm, rna-seq, tpm, heatmap, fpkm
modified 6 months ago by Corentin130 • written 6 months ago by sd.gamboa.t0
6 months ago by
Corentin130 wrote:

Hi,

Normalization should be performed by the differential expression tool you are using (the most popular being edgeR, DESeq2 and limma). Each of them normalizes the data differently, but if your data are robust (having enough replicates is one of the most important factors), they should give similar results.

If you are using Trinity, there is a script called "run_DE_analysis.pl" that performs the normalization (using edgeR, DESeq2 or limma, as you choose) and pairwise comparisons between your samples. To learn how to run it, you can follow this Trinity tutorial: https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Differential-Expression. As you can read on that page, the script expects a "matrix of raw read counts (not normalized!)". The tutorial explains every step, including drawing heatmaps.

Now, if you want information on how FPKM, RPKM and TPM work, I find this video useful (by the way, all the StatQuest videos are good): https://www.youtube.com/watch?time_continue=608&v=TTUrtCY2k-w. Basically, FPKM, RPKM and TPM normalize for library size (sequencing depth) and transcript length, which should be enough if all your samples come from the same tissue.
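As a rough illustration of the two length-aware units mentioned above, here is a minimal sketch of how FPKM and TPM can be computed for a single sample. The counts and transcript lengths are hypothetical, and the sketch uses plain transcript lengths and the total count as library size; tools such as RSEM use effective lengths and their own estimates, so this is not a reproduction of their output.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million counted fragments."""
    total = sum(counts)
    # counts / (length in kb) / (library size in millions)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r * 1e6 / scale for r in rates]


# Hypothetical values for one sample (not taken from any real dataset).
counts = [100, 400, 500]        # fragments assigned to each gene
lengths = [1000, 2000, 500]     # transcript lengths in bp

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)

print(fpkm_values)
print(tpm_values)
print(sum(tpm_values))  # TPM values of one sample always sum to one million
```

Both units correct for depth and length within a sample; the difference is that TPM values always sum to the same total in every sample, while FPKM totals can differ between samples.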

I do not know a lot about TMM, but as I understand it, it also adjusts for library composition, which makes it useful if you want to compare different tissues. Indeed, if a gene is heavily expressed in one tissue and not in the other, it will "absorb" most of the reads and the other genes will seem less expressed. Here is a video explaining how DESeq2 normalizes data:

So in the end it depends on your experiment / data type.
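The library-composition effect described above can be shown with a toy example. This is only a sketch with made-up counts: the scaling factor below is a crude median of count ratios, chosen for brevity, and is neither the actual TMM procedure from edgeR nor the DESeq2 method, although it follows the same general idea.

```python
from statistics import median


def cpm(counts):
    """Counts per million: corrects for sequencing depth only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


# Hypothetical counts: genes 1-4 are equally expressed in both tissues,
# gene 5 is expressed only in tissue B and takes half of its reads.
tissue_a = [250, 250, 250, 250, 0]
tissue_b = [125, 125, 125, 125, 500]   # same depth: 1000 reads in total

cpm_a = cpm(tissue_a)
cpm_b = cpm(tissue_b)

# By CPM, gene 1 looks two-fold lower in tissue B although only gene 5 changed.
apparent_fold = cpm_a[0] / cpm_b[0]

# Crude composition-aware factor (an assumption for illustration):
# the median B/A ratio over genes detected in both samples.
ratios = [b / a for a, b in zip(tissue_a, tissue_b) if a > 0 and b > 0]
factor_b = median(ratios)
adjusted_b = [c / factor_b for c in tissue_b]

print(apparent_fold)   # 2.0
print(adjusted_b[:4])  # genes 1-4 are back at the tissue A level
```

Depth-only units such as CPM, FPKM and TPM cannot remove this artifact, which is why a between-sample factor matters when the samples differ strongly in composition.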

Corentin

written 6 months ago by Corentin130

Hi Corentin,

Thanks for your response. I had already followed the Trinity instructions and scripts to perform the differential expression analysis (using the gene counts matrix). The Trinity scripts also provide a means to automatically perform several analyses, including a heatmap of the TMM matrix of differentially expressed genes. The Trinity scripts also provide a TPM matrix, and an FPKM matrix can easily be obtained from the RSEM output. However, I'd like to draw additional heatmaps for specific gene sets.

The Trinity scripts draw a heatmap based on mean-centered log2(TMM+1) values. I thought of using this metric because I compare samples in my experimental design. However, many papers use FPKM values instead, others use CPM (counts per million), and so on, even when they compare samples (as in my case). Additionally, some papers use z-scores instead of a log2 transformation.

When I compare heatmaps drawn with different metrics (TMM, TPM or FPKM) and transformations (log2 or z-score), I get different coloring patterns and clusters. So my doubt remains: is any particular metric and transformation better (or more accepted by the scientific community), or is it one's own choice which metric to present? I am asking only about the specific case of drawing and clustering gene sets in a heatmap.
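To make the two transformations concrete, here is a minimal sketch of per-gene mean-centered log2(x+1) and per-gene z-scores. The expression values are hypothetical, the z-score is computed on the log2(x+1) values, and the population standard deviation is used; these are assumptions for illustration (R's scale(), for example, uses the sample standard deviation), not a description of what the Trinity scripts do internally.

```python
import math
from statistics import mean, pstdev


def centered_log2(row, pseudocount=1.0):
    """log2(x + pseudocount), then subtract the gene's mean across samples."""
    logged = [math.log2(v + pseudocount) for v in row]
    m = mean(logged)
    return [v - m for v in logged]


def z_score(row, pseudocount=1.0):
    """Centered log2 values divided by the gene's standard deviation."""
    logged = [math.log2(v + pseudocount) for v in row]
    m = mean(logged)
    sd = pstdev(logged)
    if sd == 0:
        return [0.0 for _ in logged]   # flat gene: no variation to scale
    return [(v - m) / sd for v in logged]


# Hypothetical expression values for two genes across four samples.
gene_small_change = [7, 7, 15, 15]    # log2(x+1) = 3, 3, 4, 4
gene_large_change = [3, 3, 63, 63]    # log2(x+1) = 2, 2, 6, 6

print(centered_log2(gene_small_change))  # [-0.5, -0.5, 0.5, 0.5]
print(centered_log2(gene_large_change))  # [-2.0, -2.0, 2.0, 2.0]
print(z_score(gene_small_change))        # [-1.0, -1.0, 1.0, 1.0]
print(z_score(gene_large_change))        # [-1.0, -1.0, 1.0, 1.0]
```

Centering keeps the size of the change, so genes with large fold changes dominate the colors and any Euclidean-distance clustering. Z-scoring puts every gene on the same scale, so only the shape of the pattern across samples matters. That difference alone can produce different clusters from the same matrix.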

Thanks again.

Samuel

written 6 months ago by sd.gamboa.t0