Question

heatmap GSEA software

3.7 years ago

Rob ▴ 170

Hello. Does anyone know which package or method the GSEA software uses to plot its heatmaps?

RNA-Seq • 2.2k views

ADD COMMENT • link 3.7 years ago by Rob ▴ 170

You have to give the expression values (normalized read counts obtained from DESeq2, edgeR, etc.) as input to GSEA to plot the heatmap. Based on the expression value of each gene in each sample, it assigns shades of color from red to blue for high to low expression, respectively.

ADD REPLY • link 3.7 years ago by Tm ★ 1.1k

Hello, thank you. Should I give only the differentially expressed genes to GSEA, or all genes? Also, I saw that the GSEA heatmap shows only the top 100 genes. How can I do this for all of my genes or for all of the differentially expressed genes? What do you mean by "normalized read count from edgeR or DESeq2"? Is that RSEM data or log-transformed data? Is it possible to use this software for the heatmap and change the heatmap settings?

ADD REPLY • link 3.7 years ago by Rob ▴ 170

Why use all genes for the heatmap? It will not be informative in that case. The usual practice is to plot the heatmap with one of the following:

1) The 50-100 most significantly differentially expressed genes, based on p-value or q-value/adjusted p-value
2) The top up-regulated (25-50 genes) and down-regulated (25-50 genes) genes, based on log fold change
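The two selection strategies above can be sketched in base R as follows. This is only an illustration: the `res` table is made-up toy data standing in for a DESeq2 or edgeR results table, and the column names `log2FoldChange` and `padj` are assumed to follow the DESeq2 convention.

```r
# Toy results table standing in for DESeq2/edgeR output (values are made up).
set.seed(1)
res <- data.frame(
  gene = paste0("gene", 1:200),
  log2FoldChange = rnorm(200, sd = 2),
  padj = runif(200)
)
res$padj[1:5] <- NA  # some genes may have no adjusted p-value

# Drop genes without an adjusted p-value before ranking.
res_ok <- res[!is.na(res$padj), ]

# Option 1: the 50 most significant genes by adjusted p-value.
top_sig <- head(res_ok[order(res_ok$padj), ], 50)

# Option 2: the 25 most up- and 25 most down-regulated genes by log fold change.
by_lfc <- res_ok[order(res_ok$log2FoldChange, decreasing = TRUE), ]
top_up <- head(by_lfc, 25)
top_down <- tail(by_lfc, 25)
top_lfc <- rbind(top_up, top_down)
```

Either `top_sig$gene` or `top_lfc$gene` can then be used to pick the rows of the expression matrix that go into the heatmap.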

If you are familiar with R, you can use the pheatmap package, where you can customize the parameters to suit your needs. If not, you can try the online web servers Heatmapper or ClustVis for plotting heatmaps.

How can I do this for all of my genes or for all of the differentially expressed genes? What do you mean by "normalized read count from edgeR or DESeq2"? Is that RSEM data or log-transformed data?

To ensure samples are comparable, the mapped read count obtained for each gene/feature needs to be normalized before differential expression analysis. So, if you have a list of differentially expressed genes, you must also have their normalized expression values, in the form of baseMean, FPKM/RPKM, TPM, etc.
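To make the idea of normalization concrete, here is a minimal base R sketch that scales raw counts by library size (counts per million) and log-transforms them. It is only an illustration on made-up numbers; DESeq2 and edgeR use their own normalization methods (median-of-ratios and TMM, respectively), and in DESeq2 the normalized values would typically be retrieved with `counts(dds, normalized = TRUE)` rather than computed by hand.

```r
# Toy raw count matrix: 4 genes (rows) x 3 samples (columns); values are made up.
counts <- matrix(
  c(10, 20, 40,
    0, 5, 10,
    100, 200, 400,
    890, 1775, 3550),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)

# Library size = total reads per sample.
lib_size <- colSums(counts)

# Counts per million: divide each column by its library size, then scale.
cpm <- sweep(counts, 2, lib_size, "/") * 1e6

# Log-transform with a pseudocount so zero counts stay finite.
log_cpm <- log2(cpm + 1)
```

After this step every sample sums to one million, so differences between columns reflect relative expression rather than sequencing depth.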

Can you tell more about the method you have used for getting differential expressed genes?

ADD REPLY • link 3.7 years ago by Tm ★ 1.1k

Hello, Thanks

I used the edgeR and DESeq2 packages in R for differential expression analysis. I think the result from DESeq2 is more reliable. What do you think?

ADD REPLY • link 3.7 years ago by Rob ▴ 170

It would not be fair to say DESeq2 is more reliable. It depends entirely on the type of samples and the number of replicates you are working with. This paper can give you more insight.

ADD REPLY • link 3.6 years ago by Tm ★ 1.1k

Thank you so much. I have RSEM data, raw count data, and HTSeq data, and I do not know which one I should use. My supervisor asked me to use HTSeq, but with both edgeR and DESeq2 I did not get any significant FDR value for 20000 genes. (I rounded the HTSeq values before importing them into R to be compatible with DESeq2 and edgeR.) With the raw data and RSEM, I get some significantly differentially expressed genes but no very good heatmaps. He also told me not to use TPM or RPKM data; I don't know why. I have two groups of 22 patients each, 44 samples overall. What do you think about the datasets? Why do my heatmaps not show a clear pattern even though I have differentially expressed genes?

This is the code I used for differential expression and the heatmap. I tried with and without z-score and log-transformation and got no good pattern.

# Reading in raw data
rdata <- read.table("mydata.txt", header = TRUE, row.names = 1)

library(pheatmap)
library(DESeq2)

# Differential abundance
alpha <- 0.01 # set the cutoff value

# Create metadata
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 22), rep("1", 22)))
colnames(sample_org) <- c("Group")

dds <- DESeqDataSetFromMatrix(countData = rdata, colData = sample_org, design = ~Group)

dd <- DESeq(dds)
res <- results(dd)

# subset only significant genes
sig <- res[res$padj < alpha,]
sig_genes <- rownames(sig)
subset <- rdata[sig_genes,]

# log transform data for visualization
tdata <- log2(subset + 0.5)
mat <- as(rdata, "matrix")

# row Z-score
m_tr_z_score <- t(scale(t(mat)))

# Set colours
my_colour = list(Group = c("0" = "blue", "1" = "yellow"))

# Plot
pheatmap(symbreaks = FALSE, cluster_cols = FALSE, cluster_rows = TRUE,
         color = colorRampPalette(c("#f71616", "#f71616", "white", "#1919d4", "#1919d4"))(100),
         annotation_col = sample_org, annotation_colors = my_colour,
         mat, scale = "row")

In this line of code I get a warning message about converting data to a factor, but I don't think it affects the results: dds <- DESeqDataSetFromMatrix(countData = rdata, colData = sample_org, design = ~Group)

ADD REPLY • link 3.6 years ago by Rob ▴ 170
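A few things in the posted code may contribute to the unclear pattern: `tdata` is computed but `mat` is built from the full `rdata`, so the heatmap draws every gene rather than the significant subset; `res$padj < alpha` can return NA for genes without an adjusted p-value; and the row z-score is computed in `m_tr_z_score` but the unscaled `mat` is what gets plotted. The base R sketch below illustrates one possible fix for the subsetting and scaling steps. The toy `rdata` and `padj` values are made up, and the sketch deliberately leaves out the DESeq2 and pheatmap calls.

```r
# Toy stand-ins (values are made up): a count matrix and adjusted p-values.
set.seed(42)
rdata <- matrix(rpois(20 * 6, lambda = 50), nrow = 20,
                dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))
padj <- c(0.001, 0.5, NA, 0.004, rep(0.8, 16))
names(padj) <- rownames(rdata)
alpha <- 0.01

# which() drops NA comparisons, so genes with NA padj are not selected.
sig_genes <- names(padj)[which(padj < alpha)]

# Subset first, then log-transform the subset (not the full table).
sub_counts <- rdata[sig_genes, , drop = FALSE]
mat <- log2(sub_counts + 0.5)

# Row z-score, applied once.
z <- t(scale(t(mat)))
```

If `z` is then passed to pheatmap, `scale = "none"` avoids scaling the rows a second time; alternatively, pass `mat` with `scale = "row"` and skip the manual z-score.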