Hi everyone! I'm working with a gene expression file from an RNA-seq experiment and am trying to compare two groups of samples. I'm pretty new to gene expression analysis and need some help. I have strong biological reasons to believe that there should be a difference between the two groups. When I run differential gene expression analysis I get many differentially expressed genes, but when I cluster the samples according to those genes, I don't see a clear separation between the two groups. Is there a way to filter the genes I get from my differential expression algorithm so that I can see a better clustering effect? I am interested in selecting the genes from the list of differentially expressed genes that best separate the two groups; however, there are so many differentially expressed genes that I don't know how to do this effectively. Also, I have been trying two different differential expression methods (t-test and limma), and the genes I get from limma don't appear in the gene list I get from the t-test. Which one should I use? Thanks a lot!
Just some things to look out for from my own perspective:
- Is your data too flat? Low-variance data will never be capable of being separated. How does a simple boxplot of your data look?
- What P value cut-offs are you using? Always aim to use an adjusted P value below 0.05 (i.e. 5% FDR) and an absolute log base 2 fold change > 2.
- When you cluster, which distance metric and linkage function are you using? You may see the greatest separation using Euclidean distance and Ward's or complete linkage. Mess around with different combinations of these. Take a look at the thread here for other options, including Pearson correlation distance: "A: How to plot a heatmap with two different distance matrices for X and Y"
LDA and PCA, mentioned by the others, are good suggestions that you can also try.
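To make the cut-off and distance-metric points above a bit more concrete, here is a minimal plain-Python sketch (no external libraries). It assumes you already have one raw P value and one log2 fold change per gene; the gene names, the numbers, and the function names (`bh_adjust`, `select_genes`, `euclidean`, `pearson_distance`) are all made up for illustration. limma's `topTable` already reports Benjamini-Hochberg adjusted P values (`adj.P.Val`), so the first function is only there to show what that adjustment does.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(genes, pvalues, log2fc, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Keep genes with adjusted P < padj_cutoff and |log2FC| > lfc_cutoff."""
    padj = bh_adjust(pvalues)
    return [g for g, q, lfc in zip(genes, padj, log2fc)
            if q < padj_cutoff and abs(lfc) > lfc_cutoff]


def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


# Toy inputs (hypothetical values, not real data).
genes = ["geneA", "geneB", "geneC", "geneD"]
pvalues = [0.001, 0.02, 0.03, 0.5]
log2fc = [3.1, -2.6, 0.8, 2.9]
selected = select_genes(genes, pvalues, log2fc)
print(selected)  # ['geneA', 'geneB']

# Two toy samples with the same pattern on different scales.
sample_1 = [1.0, 2.0, 3.0, 4.0]
sample_2 = [10.0, 20.0, 30.0, 40.0]
print(euclidean(sample_1, sample_2), pearson_distance(sample_1, sample_2))
```

The two toy samples share the same expression pattern on different scales, so they are far apart by Euclidean distance but essentially identical by correlation distance. That is one reason the choice of metric can change how samples cluster in a heatmap.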
Finally, I'd just like to add that clustering and heatmaps should not be the deciding factors on how well your segregation has worked. I would ask: how well does your panel of genes predict your outcome of interest or group assignment through logistic regression modelling? Through that, coupled with ROC analysis, you can derive AUC, sensitivity, specificity, precision, and accuracy.
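As a rough illustration of the ROC side of this, the sketch below computes AUC, sensitivity, specificity, precision, and accuracy in plain Python. It assumes you already have a predicted probability per sample from some fitted model (for example a logistic regression on your gene panel, ideally evaluated on held-out samples); the fitting step is left out, and the labels, scores, 0.5 threshold, and function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive sample scores higher than
    a random negative one (ties count as half). Needs both classes."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(labels, scores, threshold=0.5):
    """Confusion-matrix metrics after calling score >= threshold positive.
    Assumes at least one sample falls in each row/column being divided by."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for l, p in zip(labels, pred) if l == 1 and p == 1)
    tn = sum(1 for l, p in zip(labels, pred) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, pred) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, pred) if l == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Toy example: 1 = group of interest, 0 = other group (made-up scores).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
metrics = classification_metrics(labels, scores)
print(auc)      # 0.875
print(metrics)
```

With so few samples these numbers are only a sanity check; on real data the same quantities would normally come from cross-validation or an independent test set.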
Hope that this helps! Kevin
Thank you all very much! Interestingly, when I do PCA with the DE genes I can see a pretty nice separation between my two groups. For some reason, though, the samples don't cluster as nicely in the heatmap. I used Euclidean distance and weighted linkage for building the heatmap. My data is actually genetically heterogeneous. There are a few driving translocations, each with a different gene expression signature, and each of the two groups I am comparing contains samples with different types of translocations. That is, I am trying to find the difference between groups where there is a lot of genetic variability within each group to begin with.