I have a time course study with 4 time points, and each time point has 4 biological replicates. My first aim was to find the genes that are differentially expressed (DE) over time.
Please bear in mind that I am at a very basic level in analyzing RNA-seq data, so my questions may seem very simple to you.
I have read a few things about heatmaps. As far as I know, a heatmap is a good tool for exploring the data visually (since the data are high dimensional) and for clustering the genes. I also know that the gene expression values need to be normalized to avoid systematic biases, and that there are some useful R packages for drawing heatmaps.
What I don't know is:
- Why do you cluster the genes in heatmaps at all?
- In my case, where I have the normalized read counts of a time course RNA-seq study with 4 biological replicates and 4 time points (4 conditions), how should I draw a heatmap? Should I draw a separate heatmap for each time point (condition), or should I draw all biological replicates and time points in one heatmap, which means 16 samples side by side?
- The number of DEGs is about 13000. How should I decide how to cluster them? Should I draw heatmaps only for the DEGs, or for all of the genes?
- I have normalized the counts with methods such as TMM, Median, Quantile or Total. Is it essential to use FPKM or RPKM values instead?
- I also read that there is no single best distance measure for cluster analysis. Does that mean the choice of method is entirely up to me?
I have read some posts on Biostars that I can refer you to, but none of them helped me understand the answers to my questions:
and some other similar posts.
Thanks a lot.
I just read that I have to merge the table of my read counts and the table of FDR-adjusted p-values for each gene into one table, and then draw the heatmaps for the most significant genes.
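
To make that last step concrete, here is a minimal sketch of how I understand it, written in plain Python with made-up numbers. The gene names, the values, the 0.05 cutoff, the helper names (`merge_tables`, `top_genes`, `row_zscore`) and the log2 plus row z-score scaling are all my own assumptions for illustration, not something taken from a particular package or from the posts I read.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data: normalized counts per gene across samples
# (column order: T1_rep1, T1_rep2, T2_rep1, T2_rep2), made up for illustration.
counts = {
    "geneA": [10.0, 12.0, 40.0, 44.0],
    "geneB": [100.0, 98.0, 101.0, 99.0],
    "geneC": [5.0, 7.0, 1.0, 2.0],
    "geneD": [30.0, 28.0, 90.0, 95.0],
}
# Hypothetical FDR-adjusted p-values from the DE test, keyed by the same gene IDs.
padj = {"geneA": 0.001, "geneB": 0.80, "geneC": 0.03, "geneD": 0.0004}


def merge_tables(counts, padj):
    """Join the two tables on gene ID, keeping genes present in both."""
    return {g: (padj[g], counts[g]) for g in counts if g in padj}


def top_genes(merged, n, fdr_cutoff=0.05):
    """Return up to n gene IDs passing the cutoff, most significant first."""
    passing = [g for g, (p, _) in merged.items() if p <= fdr_cutoff]
    return sorted(passing, key=lambda g: merged[g][0])[:n]


def row_zscore(values):
    """Scale one gene's log2(count + 1) values to mean 0 and SD 1 (assumed scaling)."""
    logged = [math.log2(v + 1.0) for v in values]
    mu, sd = mean(logged), pstdev(logged)
    return [(x - mu) / sd if sd > 0 else 0.0 for x in logged]


merged = merge_tables(counts, padj)
selected = top_genes(merged, n=3)
# One row per selected gene, one column per sample: this is what would be plotted.
heatmap_matrix = {g: row_zscore(merged[g][1]) for g in selected}
```

If this is right, `heatmap_matrix` would be the input to whatever heatmap function I end up using, with all 16 samples as columns in my real data. Is this the correct idea?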