I'm working with a dataset of 47 tumor and 5 normal samples. The aim is to find genes that are upregulated in tumor. Before running a differential expression analysis, I made a clustering heatmap to check how well the samples cluster.
As I have raw counts (from featureCounts), I transformed them into a variance-stabilized (vsd) matrix using DESeq2.
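A minimal sketch of this transformation step, assuming the standard DESeq2 `vst()` workflow. The simulated counts, the `condition` column, and all object names other than `vsd_matrix` are placeholders, not the actual data or the exact call used here:

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for a featureCounts matrix (genes x samples); hypothetical data
n_genes <- 5000
n_samples <- 52
mu <- 2^rnorm(n_genes, mean = 6, sd = 2)   # per-gene mean expression
disp <- 4 / mu + 0.1                       # per-gene dispersion
counts <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 1 / disp),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", seq_len(n_samples))))

# Sample table: 5 normal and 47 tumor samples, as in the question
coldata <- data.frame(
  condition = factor(rep(c("normal", "tumor"), c(5, 47)),
                     levels = c("normal", "tumor")),
  row.names = colnames(counts))

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)

# blind = TRUE ignores the design when estimating dispersions,
# which is the usual choice for QC plots such as clustering heatmaps
vsd <- vst(dds, blind = TRUE)
vsd_matrix <- assay(vsd)
```

The resulting `vsd_matrix` has genes in rows and samples in columns, on a log2-like scale.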
From vsd_matrix I took the top 10% most variable genes (ranked by IQR) for visualization.
vars <- apply(vsd_matrix, 1, IQR)
set <- vsd_matrix[vars > quantile(vars, 0.9), ]
With this I calculated row-wise z-scores and plotted the data.
[Figure: clustering heatmap. In the heatmap annotation, blue is normal and brown is tumor.]
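The z-scoring and clustering can be sketched in base R as below. This is only an illustration on a random placeholder matrix; the clustering distance, the linkage method, and the use of `heatmap()` are assumptions, since the plotting package actually used is not stated:

```r
set.seed(1)
# Hypothetical stand-in for the variance-stabilized matrix (genes x samples)
vsd_matrix <- matrix(rnorm(1000 * 52, mean = 8, sd = 1), nrow = 1000,
                     dimnames = list(paste0("gene", 1:1000),
                                     paste0("S", 1:52)))

# Keep the 10% of genes with the largest IQR across samples
vars <- apply(vsd_matrix, 1, IQR)
set <- vsd_matrix[vars > quantile(vars, 0.9), ]

# Row-wise z-score: scale() centers and scales columns,
# so transpose before and after to standardize each gene
z <- t(scale(t(set)))

# Cluster samples on the z-scored genes and cut the tree into two groups
hc <- hclust(dist(t(z)), method = "complete")
sample_cluster <- cutree(hc, k = 2)
table(sample_cluster)

# Base R heatmap; scale = "none" because rows are already z-scored
heatmap(z, scale = "none", labRow = NA)
```

`sample_cluster` gives a cluster label per sample, which makes it possible to list exactly which tumor samples fall into each group instead of reading them off the plot.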
From the heatmap I see that some of the tumor samples do not cluster with the others: the tumor samples split into two clusters.
I removed two normal samples with very poor library sizes from further analysis.
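A library-size check of this kind could look roughly like the following. The count matrix is simulated, and the cutoff of 25% of the median depth is an arbitrary value chosen for illustration, not the criterion used above:

```r
set.seed(1)
# Hypothetical raw count matrix (genes x samples);
# two samples are made shallow on purpose
counts <- matrix(rpois(2000 * 8, lambda = 50), nrow = 2000,
                 dimnames = list(NULL, paste0("S", 1:8)))
counts[, c("S2", "S5")] <- rpois(2000 * 2, lambda = 2)

# Library size = total assigned reads per sample
lib_size <- colSums(counts)

# Flag samples far below the median depth (illustrative threshold)
low_depth <- names(lib_size)[lib_size < 0.25 * median(lib_size)]
low_depth

# Drop the flagged samples before downstream analysis
counts_kept <- counts[, !(colnames(counts) %in% low_depth)]
```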
When I ran the differential analysis on all 47 tumor samples and the 3 remaining normal samples, only 4 of the differentially expressed genes were upregulated in tumor.
But when I ran the differential expression analysis (DEA) between the 3 normal samples and the 35 tumor samples that formed one cluster, I found approximately 30 upregulated genes.
In the same way, when I ran DEA between the 3 normal samples and the 12 tumor samples that formed the other cluster, I found around 60 upregulated genes.
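The three comparisons can be set up so that only the choice of tumor samples changes between runs. The sketch below assumes a standard DESeq2 tumor-vs-normal contrast; the counts are simulated without any real signal, the group labels `tumorA`/`tumorB` are assigned arbitrarily, and `run_dea` is a hypothetical helper, so the numbers it returns say nothing about the real data:

```r
library(DESeq2)

set.seed(1)
n_genes <- 2000
# 3 normals, 35 tumors in one cluster, 12 tumors in the other (labels are placeholders)
groups <- rep(c("normal", "tumorA", "tumorB"), c(3, 35, 12))
mu <- 2^rnorm(n_genes, mean = 6, sd = 2)
counts <- matrix(rnbinom(n_genes * length(groups), mu = mu,
                         size = 1 / (4 / mu + 0.1)),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("S", seq_along(groups))))

# Hypothetical helper: tumor vs normal on a chosen subset of samples,
# returning the number of genes called upregulated in tumor
run_dea <- function(keep) {
  coldata <- data.frame(
    condition = factor(ifelse(groups[keep] == "normal", "normal", "tumor"),
                       levels = c("normal", "tumor")),
    row.names = colnames(counts)[keep])
  dds <- DESeqDataSetFromMatrix(counts[, keep], coldata, design = ~ condition)
  dds <- DESeq(dds, quiet = TRUE)
  res <- results(dds, contrast = c("condition", "tumor", "normal"),
                 alpha = 0.05)
  # Upregulated in tumor: significant and positive log2 fold change
  sum(res$padj < 0.05 & res$log2FoldChange > 0, na.rm = TRUE)
}

n_up <- c(
  all_tumors = run_dea(rep(TRUE, length(groups))),
  cluster_35 = run_dea(groups %in% c("normal", "tumorA")),
  cluster_12 = run_dea(groups %in% c("normal", "tumorB"))
)
n_up
```

Keeping the filtering, contrast, and significance threshold identical across the three runs means any difference in the counts of upregulated genes comes from the sample selection alone.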
Why do these analyses give different results? Do I need to remove some tumor samples from the DEA based on the clustering?
Any help is appreciated.