What is the best way to filter out the top differentially expressed genes and depict them in pheatmap?
Entering edit mode
2.2 years ago
robeaumont • 0

Hi all,

I have performed DESeq2 analysis on my samples which consist of 5 biological replicates under control and cytokine-treated conditions. Of the 18435 genes run through DESeq2, 2517 meet the threshold of padj<0.05 and a log2 fold change >1 (increase/decrease).

My data matrix is called vst_heatmap_mat_filtered

> dim(vst_heatmap_mat_filtered)
[1] 2517   10

The code used to generate the heatmap and the heatmap itself are shown below

col = colorRampPalette(rev(brewer.pal(n = 10, name = "RdYlBu")))(100)
group = data.frame(Condition = rep(c("Control", "IL-1β"), c(5,5))) 
row.names(group) = colnames(vst_heatmap_mat_filtered)

Condition = c("navy", "darkgreen")
names(Condition) = c("Control", "IL-1β")
anno_colors = list(Condition = Condition)

pheatmap(vst_heatmap_mat_filtered, scale = 'row', 
         fontsize_row = 1, 
         fontsize_col = 8, 
         color = col,
         annotation_col = group,
         annotation_colors = anno_colors,
         cluster_rows = T,
         cutree_rows = 2,
         cluster_cols = F)

enter image description here

I reduced the amount of genes to 30 of my most differentially expressed genes, which I selected based on the log2 fold change magnitude (as all had significant padj values). So top 15 upregulated and top 15 downregulated in response to the cytokine treatment.

> dim(filterTop15Downregulated)
[1] 15  6

> dim(filterTop15Upregulated)
[1] 15  6

I combined the two with rbind

> dim(Top30)
[1] 30  6

Then I performed the same sequential steps as with the original data.frame to end up with a vst matrix of 30 genes

> dim(Top30_vst)
[1] 30 10

And the heatmap looks like this

enter image description here

I didn't cluster by column as my treatment samples clustered to the left, so I only clustered by row.

I would like to follow this up with additional heatmaps of the top 50 and 100 most differentially expressed genes, but I'm not sure if this is the best/most efficient way to extract/depict the data. Any suggestions?

Thanks in advance!

RNA-Seq pheatmap DEGs R • 1.7k views
Entering edit mode

The answer depends on what do you want to show and plot on the heatmap. For example, if you want to depict that the genes identified as differentially expressed after the DESeq testing, you should include the 2517. In this regard, I suggest you to ommit the names of the genes by setting show_rownames = F. On the other hand, if you want to increase the clarity of your heatmap, it is recomended to show a small proportion of the differentially expressed genes as you did it only showing 30 or 50. Additionally, if you are planing to perform downstream pathway analysis you would be interested in show the expression of genes from significant or relevant pathways.

Best regards


Login before adding your answer.

Traffic: 2822 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6