What is the best way to filter out the top differentially expressed genes and depict them in pheatmap?
0
0
Entering edit mode
3 months ago
robeaumont • 0

Hi all,

I have performed DESeq2 analysis on my samples which consist of 5 biological replicates under control and cytokine-treated conditions. Of the 18435 genes run through DESeq2, 2517 meet the threshold of padj<0.05 and a log2 fold change >1 (increase/decrease).

My data matrix is called vst_heatmap_mat_filtered

> dim(vst_heatmap_mat_filtered)
[1] 2517   10


The code used to generate the heatmap and the heatmap itself are shown below

col = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(100)
group = data.frame(Condition = rep(c("Control", "IL-1β"), c(5,5)))
row.names(group) = colnames(vst_heatmap_mat_filtered)

Condition = c("navy", "darkgreen")
names(Condition) = c("Control", "IL-1β")
anno_colors = list(Condition = Condition)

pheatmap(vst_heatmap_mat_filtered, scale = 'row',
fontsize_row = 1,
fontsize_col = 8,
color = col,
annotation_col = group,
annotation_colors = anno_colors,
cluster_rows = T,
cutree_rows = 2,
cluster_cols = F)


I reduced the amount of genes to 30 of my most differentially expressed genes, which I selected based on the log2 fold change magnitude (as all had significant padj values). So top 15 upregulated and top 15 downregulated in response to the cytokine treatment.

> dim(filterTop15Downregulated)
[1] 15  6

> dim(filterTop15Upregulated)
[1] 15  6


I combined the two with rbind

> dim(Top30)
[1] 30  6


Then I performed the same sequential steps as with the original data.frame to end up with a vst matrix of 30 genes

> dim(Top30_vst)
[1] 30 10


And the heatmap looks like this

I didn't cluster by column as my treatment samples clustered to the left, so I only clustered by row.

I would like to follow this up with additional heatmaps of the top 50 and 100 most differentially expressed genes, but I'm not sure if this is the best/most efficient way to extract/depict the data. Any suggestions?

RNA-Seq pheatmap DEGs R • 381 views
0
Entering edit mode

The answer depends on what do you want to show and plot on the heatmap. For example, if you want to depict that the genes identified as differentially expressed after the DESeq testing, you should include the 2517. In this regard, I suggest you to ommit the names of the genes by setting show_rownames = F. On the other hand, if you want to increase the clarity of your heatmap, it is recomended to show a small proportion of the differentially expressed genes as you did it only showing 30 or 50. Additionally, if you are planing to perform downstream pathway analysis you would be interested in show the expression of genes from significant or relevant pathways.

Best regards