6.2 years ago by
Washington University, St Louis, USA
This is a good question and I look forward to reading other answers. My thought is that you certainly can run PCA and create heatmaps (presumably with the typical hierarchical clustering) on raw read counts, but you must interpret the results within that context. If your libraries are of similar depth, normalizing for read depth may not matter much. And if your PCA or heatmap/clustering analysis is mostly focused on the relationships between samples, normalizing for gene size won't matter as much, because each gene's length is the same in every sample.

However, if libraries have dramatically different depths, this could certainly affect your clustering results (although that will depend heavily on the distance metric you use). Similarly, if you are interested in how genes relate to each other, you will probably want to normalize for gene size.

Calculating an RPKM matrix from your raw read counts is very easy: all you need beyond the counts are the gene lengths and the total read count per library. Why not run all of them (raw, RPKM, and maybe some other normalization schemes) through your heatmap and PCA analysis and compare the results with the above caveats in mind? It will probably be educational and teach you something about your data.
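To make the comparison concrete, here is a minimal sketch in Python with NumPy, not a definitive pipeline. The function names `rpkm` and `pca_scores`, the toy count matrix, and the gene lengths are all made up for illustration. Two assumptions to be aware of: library depth is taken as the column sum of the count matrix (you may prefer the total mapped reads reported by your aligner), and values are log2-transformed with a pseudocount of 1 before PCA, which is a common choice but not something the approach requires.

```python
import numpy as np

def rpkm(counts, gene_lengths_bp):
    """Reads per kilobase of gene per million reads.

    counts: genes x samples matrix of raw read counts.
    gene_lengths_bp: one length (in base pairs) per gene.
    Assumption: library depth = sum of counts in each column.
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(gene_lengths_bp, dtype=float)
    depth = counts.sum(axis=0)
    # 1e9 = 1e3 (bp -> kb) * 1e6 (reads -> millions of reads)
    return counts * 1e9 / (lengths[:, None] * depth[None, :])

def pca_scores(matrix, n_components=2):
    """Sample coordinates on the top principal components.

    matrix: genes x samples. Assumption: log2(x + 1) transform first.
    """
    x = np.log2(np.asarray(matrix, dtype=float) + 1.0).T  # samples x genes
    x = x - x.mean(axis=0)  # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Toy data (hypothetical): sample B is sample A sequenced 10x deeper,
# sample C has a different expression profile.
raw_counts = np.array([
    [10, 100, 50],
    [20, 200, 10],
    [70, 700, 40],
])
gene_lengths = np.array([1000, 2000, 500])

rpkm_values = rpkm(raw_counts, gene_lengths)
raw_scores = pca_scores(raw_counts)
rpkm_scores = pca_scores(rpkm_values)

print("PCA scores on raw counts:")
print(raw_scores)
print("PCA scores on RPKM:")
print(rpkm_scores)
```

In this toy case samples A and B land on different coordinates when PCA is run on raw counts, purely because of depth, and on the same coordinates after RPKM. Real data will be less clear-cut, which is exactly why running both and comparing is worthwhile.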