Hi Biostars Community,
I have searched the forum, but couldn't find a perfect answer for this.
I have downloaded RNA-seq data from GEO (GSE112656): 11 osteoarthritis samples and 11 rheumatoid arthritis samples (22 in total).
I also have RNA-seq data generated in my lab: 4 samples, each with three technical replicates (12 in total):
- Drug A - treated
- Drug A - control (untreated)
- Drug B - treated
- Drug B - control (untreated)
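One way to keep track of this design is a small sample sheet, which can later be used to colour PCA plots. The sketch below assumes pandas. The sample names (`OA_1`, `DrugA_treated_rep1`, ...) are hypothetical placeholders for the real GEO sample IDs and lab library names.

```python
import pandas as pd

# Hypothetical sample names; replace with the real GEO sample IDs and lab library names.
geo = pd.DataFrame({
    "sample": [f"OA_{i}" for i in range(1, 12)] + [f"RA_{i}" for i in range(1, 12)],
    "condition": ["osteoarthritis"] * 11 + ["rheumatoid_arthritis"] * 11,
    "source": "GSE112656",  # scalar is broadcast to every row
})

lab_conditions = ["DrugA_treated", "DrugA_control", "DrugB_treated", "DrugB_control"]
lab = pd.DataFrame({
    # three technical replicates per condition
    "sample": [f"{c}_rep{r}" for c in lab_conditions for r in range(1, 4)],
    "condition": [c for c in lab_conditions for _ in range(3)],
    "source": "lab",
})

# One row per column of the combined count matrix
samples = pd.concat([geo, lab], ignore_index=True)
print(samples.groupby("source").size())
```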
I want to run PCA on the GEO (GSE112656) data together with my lab data to see which samples cluster together and which ones differ.
How should I normalize and transform these datasets before running PCA?
I was considering following this post: Which counts to use for RNA-seq heatmap and PCA?
Basically, I am planning to combine all the samples (22 + 12 = 34 samples) into one count matrix, compute log2-transformed TMM-normalized counts, and pass that matrix to a PCA function to plot PC1 against PC2, or perhaps PC1, PC2 and PC3, to see how the samples cluster and how similar or different they are. I chose TMM because the HBC (Harvard Chan Bioinformatics Core) training materials recommend it for comparing gene counts both between and within samples. Would all of this be best practice?
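A minimal Python sketch of this plan, assuming a genes x samples NumPy count matrix with the 34 samples as columns. The helper names `tmm_factors`, `log2_tmm_cpm` and `pca` are hypothetical. The TMM step is a simplified re-implementation of the idea behind edgeR's `calcNormFactors` (same default trims of 30% for M values and 5% for A values), not edgeR itself, and `log2(CPM + 1)` is a simple pseudocount choice rather than edgeR's `cpm(..., log = TRUE)` prior-count scheme. Restricting PCA to the 500 most variable genes is an assumption borrowed from the default of DESeq2's `plotPCA`. The counts are simulated, not real data.

```python
import numpy as np


def tmm_factors(counts, logratio_trim=0.3, sum_trim=0.05):
    """Simplified TMM normalization factors for a genes x samples count matrix.

    Follows the idea of edgeR's TMM (trim extreme M and A values, then take an
    inverse-variance weighted mean of M) but is NOT edgeR's exact code.
    """
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    # Reference sample: the one whose 75th-percentile proportion is closest to the mean
    uq = np.percentile(counts / lib, 75, axis=0)
    ref = int(np.argmin(np.abs(uq - uq.mean())))
    factors = np.ones(counts.shape[1])
    for j in range(counts.shape[1]):
        keep = (counts[:, j] > 0) & (counts[:, ref] > 0)  # genes seen in both samples
        o, r = counts[keep, j], counts[keep, ref]
        po, pr = o / lib[j], r / lib[ref]
        m = np.log2(po / pr)            # M: log2 fold change vs. reference
        a = 0.5 * np.log2(po * pr)      # A: average log2 proportion
        var = (lib[j] - o) / (lib[j] * o) + (lib[ref] - r) / (lib[ref] * r)
        n = m.size
        m_rank, a_rank = m.argsort().argsort(), a.argsort().argsort()
        lo_m, lo_a = np.floor(n * logratio_trim), np.floor(n * sum_trim)
        sel = ((m_rank >= lo_m) & (m_rank < n - lo_m)
               & (a_rank >= lo_a) & (a_rank < n - lo_a))
        w = 1.0 / var[sel]
        factors[j] = 2.0 ** (np.sum(w * m[sel]) / np.sum(w))
    # Rescale so the factors have a geometric mean of 1, as edgeR does
    return factors / np.exp(np.mean(np.log(factors)))


def log2_tmm_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount) using TMM-adjusted (effective) library sizes."""
    counts = np.asarray(counts, dtype=float)
    eff_lib = counts.sum(axis=0) * tmm_factors(counts)
    return np.log2(counts / eff_lib * 1e6 + pseudocount)


def pca(log_expr, n_components=3, n_top=500):
    """PCA via SVD on the n_top most variable genes; returns sample scores."""
    top = np.argsort(log_expr.var(axis=1))[::-1][:n_top]
    x = log_expr[top].T                 # samples x genes
    x = x - x.mean(axis=0)              # center each gene (no scaling)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per PC
    return scores, explained[:n_components]


# Simulated counts standing in for the real matrix (2000 genes x 34 samples)
rng = np.random.default_rng(0)
counts = rng.negative_binomial(n=5, p=0.01, size=(2000, 34))
scores, explained = pca(log2_tmm_cpm(counts), n_components=3)
print(scores.shape, explained.round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` (and optionally `scores[:, 2]`), coloured by each sample's condition, gives the PC1 vs PC2 (vs PC3) view. Genes are centered but not scaled, which is the usual choice for log-expression PCA and matches the defaults `plotPCA` passes to `prcomp`.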
Please, if you can suggest something better, I would appreciate it.