I currently have a k-means clustered heatmap of the top 1000 most variable genes, but I would like to narrow it down to about 100-200. The number is set so high because, when I analyse each cluster, I filter out any genes that are not significantly differentially expressed (i.e. I keep only genes with padj < 0.05 and log2FoldChange either < 0 or > 0). After this filtering I am left with only a small number of genes per cluster, which isn't ideal for GO-term analysis. That is why I start with a larger number (like 1000): so that even after filtering there are enough genes in each cluster to analyse.
I was wondering if there is a way to filter the genes beforehand, so that all the genes in the heatmap are already significantly differentially expressed and I don't have to filter them afterwards - that is, all the genes in the heatmap will actually be used for downstream analyses. Hope that makes sense! I'm also open to any recommendations and suggestions on how to improve my approach.
Here is my code:
# heatmap
topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 1000)
# clustered heatmap
set.seed(1234)
k <- pheatmap(assay(rld)[topVarGenes,], scale = "row", kmeans_k = 4)
clusterDF <- as.data.frame(factor(k$kmeans$cluster))
colnames(clusterDF) <- "Cluster"
OrderByCluster <- assay(rld)[topVarGenes,][order(clusterDF$Cluster),]
pheatmap(OrderByCluster, scale = "row", annotation_row = clusterDF, show_rownames = FALSE, cluster_rows = FALSE)
acute_hi <- rownames(clusterDF[clusterDF$Cluster == 1, , drop = FALSE])
# DEGs
resSigind <- res[which(res$padj < 0.05 & res$log2FoldChange > 0), ]
resSigrep <- res[which(res$padj < 0.05 & res$log2FoldChange < 0), ]
resSig <- rbind(resSigind, resSigrep)
# filtering out any genes from the heatmap which are not significantly different (e.g. do not overlap with DEGs)
acutehi_cluster <- resSig[acute_hi,]
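To make the "filter beforehand" idea concrete, here is a rough sketch of what I have in mind, written in base R on toy data so it runs on its own. The objects `mat` and `res_df` are made-up stand-ins (assumptions) for `assay(rld)` and `as.data.frame(res)`, and the cutoff of 200 genes is just an example. I'm not sure this is the right way to do it, so treat it as an illustration rather than a working solution.

```r
# Toy stand-ins (assumptions): 'mat' plays the role of assay(rld),
# 'res_df' the role of as.data.frame(res) from DESeq2.
set.seed(1234)
n_genes <- 2000
mat <- matrix(rnorm(n_genes * 6), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("sample", 1:6)))
res_df <- data.frame(padj = runif(n_genes),
                     log2FoldChange = rnorm(n_genes),
                     row.names = rownames(mat))
res_df$padj[sample(n_genes, 50)] <- NA  # mimic genes with no adjusted p-value

# 1. Keep only significant DEGs first (which() drops rows where padj is NA).
is_sig <- which(res_df$padj < 0.05 & res_df$log2FoldChange != 0)
sig_genes <- rownames(res_df)[is_sig]

# 2. Rank only those genes by variance and keep at most 200 of them.
gene_var <- apply(mat[sig_genes, , drop = FALSE], 1, var)
top_n <- min(200, length(sig_genes))
top_sig <- names(sort(gene_var, decreasing = TRUE))[seq_len(top_n)]

# 3. Row-scale and k-means cluster the pre-filtered genes.
scaled <- t(scale(t(mat[top_sig, , drop = FALSE])))
km <- kmeans(scaled, centers = 4)
clusters <- split(names(km$cluster), km$cluster)  # gene IDs per cluster
```

With real data, `mat[top_sig, ]` would be passed to `pheatmap()` instead of the top 1000 variable genes, so every gene in each cluster would already be a DEG and could go straight into GO-term analysis.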