Question: How to cluster the upregulated and downregulated genes in heatmap?
bioinforesearchquestions280 wrote:

How to cluster the upregulated and downregulated genes in heatmap?

Initial heatmap:

[Image: initial heatmap]

Expected heatmap:

[Image: expected heatmap]

heatmap rna-seq • 3.5k views
ADD COMMENTlink modified 2.8 years ago by Kevin Blighe63k • written 2.9 years ago by bioinforesearchquestions280
Kevin Blighe63k wrote:

You can try messing around with different combinations of the distance, linkage, and re-order functions. With the heatmap.2 function (assuming that you're using heatmap.2), you can specify the following as parameters:

#Re-order rows/columns by mean, use 1-Pearson's correlation distance, and complete linkage
heatmap.2(...,
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  distfun=function(x) as.dist(1-cor(t(x))),
  hclustfun=function(x) hclust(x, method="complete"))

#Re-order rows/columns by mean, use Euclidean distance, and Ward's linkage
heatmap.2(...,
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"))

Various other combinations exist, such as Manhattan or Canberra distance coupled with single or average linkage.
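To get a feel for how these choices behave before plotting anything, here is a minimal sketch that only uses base R (dist, hclust, cutree). The toy matrix and all variable names are made up for illustration; the same functions can then be handed to heatmap.2 via distfun and hclustfun.

```r
set.seed(1)

# Toy data (made up): genes 1-3 go up in samples 4-6, genes 4-6 go down
pattern_up   <- c(5, 5, 5, 10, 10, 10)
pattern_down <- rev(pattern_up)
mat <- rbind(
  matrix(pattern_up,   nrow = 3, ncol = 6, byrow = TRUE),
  matrix(pattern_down, nrow = 3, ncol = 6, byrow = TRUE)
) + rnorm(36, sd = 0.3)
dimnames(mat) <- list(paste0("gene", 1:6), paste0("sample", 1:6))

# 1 - Pearson correlation between rows (genes), complete linkage
hcCor <- hclust(as.dist(1 - cor(t(mat))), method = "complete")

# Manhattan distance between rows, average linkage
hcMan <- hclust(dist(mat, method = "manhattan"), method = "average")

# Cut each tree into two clusters and compare with the known direction
clustersCor <- cutree(hcCor, k = 2)
clustersMan <- cutree(hcMan, k = 2)
print(rbind(correlation = clustersCor, manhattan = clustersMan))
```

On this toy data both combinations separate the "up" genes from the "down" genes. On real data they can disagree, because correlation distance looks at the shape of the expression pattern while Manhattan distance also depends on magnitude.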

Also experiment with setting your own breaks for the heatmap shading, and with scaling the data yourself to Z-scores (or other values):

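# Fixed breaks from -3 to 3 (101 breaks give 100 color bins)
myBreaks <- seq(-3, 3, length.out=101)
# Scale each row (gene) to Z-scores; scale() works on columns, hence the two transposes
heat <- t(scale(t(MyDataMatrix)))
# Plot the pre-scaled matrix and stop heatmap.2 from scaling it again
heatmap.2(heat, ..., breaks=myBreaks, scale="none")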
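As a quick sanity check of the scaling step, here is a small sketch with a made-up matrix (MyDataMatrix here is random data, not yours). It confirms that every row ends up with mean 0 and standard deviation 1, and builds a color vector of matching length, since the number of breaks has to be one more than the number of colors. The blue-white-red palette is just an assumption for illustration.

```r
set.seed(42)

# Made-up expression matrix: 8 genes x 5 samples
MyDataMatrix <- matrix(
  rnorm(40, mean = 8, sd = 2), nrow = 8,
  dimnames = list(paste0("gene", 1:8), paste0("sample", 1:5))
)

# Row-wise Z-scores: transpose, scale the columns, transpose back
heat <- t(scale(t(MyDataMatrix)))

# 101 breaks need 100 colors
myBreaks <- seq(-3, 3, length.out = 101)
myCol <- colorRampPalette(c("blue", "white", "red"))(length(myBreaks) - 1)

# Each gene should now have mean ~0 and standard deviation ~1
print(round(rowMeans(heat), 10))
print(round(apply(heat, 1, sd), 10))
```

You would then pass col=myCol together with breaks=myBreaks and scale="none" to heatmap.2.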

If none of that works, as a last resort you can order the rows yourself in whatever way you want and then 'fix' them in place by switching off row reordering, at the cost of losing the row dendrogram. Take a look at the parameters Rowv and dendrogram to see how you can do this. See here: https://www.rdocumentation.org/packages/gplots/versions/3.0.1/topics/heatmap.2
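A rough sketch of that last-resort approach, assuming a made-up matrix with three mutant and three wild-type columns (the column names, the grouping by name prefix, and the choice to sort by difference in group means are all just for illustration). Rows are sorted so that genes higher in mutant come first and genes lower in mutant come last, and then Rowv=FALSE with dendrogram="column" keeps that order in the plot.

```r
set.seed(7)

# Made-up matrix: 6 genes, 3 mutant and 3 wild-type samples
mat <- matrix(
  rnorm(36), nrow = 6,
  dimnames = list(paste0("gene", 1:6),
                  c(paste0("mut", 1:3), paste0("wt", 1:3)))
)

# Difference in group means per gene (positive = higher in mutant)
isMutant <- grepl("^mut", colnames(mat))
diffMean <- rowMeans(mat[, isMutant]) - rowMeans(mat[, !isMutant])

# Order rows from most up in mutant to most down in mutant
ordered <- mat[order(diffMean, decreasing = TRUE), ]
print(rownames(ordered))

# Plot only if gplots is installed; Rowv=FALSE keeps the manual row order
if (requireNamespace("gplots", quietly = TRUE)) {
  pdf(NULL)  # null device, so this sketch does not write a file
  gplots::heatmap.2(ordered, Rowv = FALSE, dendrogram = "column",
                    scale = "none", trace = "none")
  invisible(dev.off())
}
```

In practice you would probably sort by the log2 fold change from your differential expression results instead of recomputing a difference of means.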

ADD COMMENTlink modified 22 months ago • written 2.9 years ago by Kevin Blighe63k

Thanks, Kevin. Sure, I will try them.

ADD REPLYlink written 2.9 years ago by bioinforesearchquestions280

Great - let me know how it goes!

ADD REPLYlink written 2.9 years ago by Kevin Blighe63k

Hi Kevin,

After incorporating the "1-Pearson's correlation distance",

[Image: output heatmap]

How do people generally show more than 100 significant genes in a heatmap? I have 620 significant genes (q-value <= 0.05).

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by bioinforesearchquestions280

Looks great!

Yes, labeling is a major issue, but there are different ways of tackling it:

Modify cexRow and change the dimensions of the heatmap

cexRow controls the size of the row labels, as you probably know. Changing the dimensions of the output can also help: elongate the heatmap so that each row gets more vertical space. For example, try the following:

pdf("MyHeatmap.pdf", width=5, height=11)
     par(mar=c(2,2,2,2), cex=1.0)
     heatmap.2(..., cexRow=0.6)
dev.off()

Only include certain genes in the labels

Here you can use a vector as the rownames and only include certain key genes in it. For example, the vector could be:

myKeyGenes <- c("", "", "TP53", "", "", "", "BRCA1", ..., "geneX")

In heatmap.2, you then specify this with labRow=myKeyGenes. The order of the vector has to match the order of your data-matrix that is used for clustering. You can then use a normal-sized value for cexRow, as most of the labels are blank spaces.
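Rather than typing the blanks by hand, you can build that vector from the row names. A small sketch, where geneNames stands in for rownames() of your data matrix and the gene lists are made up:

```r
# Stand-in for rownames(yourMatrix), in the same order as the matrix rows
geneNames <- c("ACTB", "GAPDH", "TP53", "MYC", "EGFR", "KRAS", "BRCA1")

# Genes you actually want labeled (hypothetical choice)
keyGenes <- c("TP53", "BRCA1")

# Keep the name for key genes, use an empty string for everything else
myKeyGenes <- ifelse(geneNames %in% keyGenes, geneNames, "")
print(myKeyGenes)
```

Because the vector is derived from the row names, it stays in the same order as the data matrix automatically.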

Use a color-vector and switch off labelling

Here, you provide a color vector instead of labels and set it with RowSideColors in heatmap.2. For example, you could shade genes of a certain pathway in one color, or transcripts that are non-coding RNAs.
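A minimal sketch of building such a color vector, again with made-up gene names and a hypothetical pathway gene list. RowSideColors expects one color per row, in the same order as the rows of the matrix.

```r
# Stand-in for rownames(yourMatrix), in the same order as the matrix rows
geneNames <- c("ACTB", "GAPDH", "TP53", "MYC", "EGFR", "KRAS", "BRCA1")

# Hypothetical set of genes to highlight, e.g. members of one pathway
pathwayGenes <- c("TP53", "BRCA1")

# One color per row: highlighted genes in red, the rest in grey
rowCols <- ifelse(geneNames %in% pathwayGenes, "firebrick", "grey80")
print(rowCols)

# Then, with blank labels so that only the side colors are shown:
# heatmap.2(..., RowSideColors = rowCols, labRow = rep("", length(rowCols)))
```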

...of course, you can also use a combination of all of these.

ADD REPLYlink written 2.9 years ago by Kevin Blighe63k

Hi Kevin,

I have the Excel file generated from the Cuffdiff output for genes, with the following columns:

test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant

According to the Excel file, sample_1 is Mutant and sample_2 is Wildtype. Log2(fold_change) is calculated as log2(sample_2/sample_1).

I thought it should be log2(final/initial), shouldn't it?

What is the difference between log2(Mutant/Wildtype) and log2(Wildtype/Mutant)?

ADD REPLYlink written 2.8 years ago by bioinforesearchquestions280

Hi friend, the difference is just in the interpretation.

If, for GeneX, Sample1's expression is 20 and Sample2's expression is 5, then:

log2(Sample1/Sample2) = 2

We can make the following statement: Sample1 has higher expression than Sample2 for GeneX

log2(Sample2/Sample1) = -2

We can make the following statement: Sample2 has lower expression than Sample1 for GeneX

Both statements imply the same thing. You can see, however, that the choice of numerator and denominator is important.
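The same numbers in R, as a quick illustration (the variable names are just for this example). Swapping numerator and denominator only flips the sign of the log2 fold change.

```r
# Expression of GeneX in the two samples (values from the example above)
sample1 <- 20
sample2 <- 5

lfc_1_over_2 <- log2(sample1 / sample2)  # log2(4)
lfc_2_over_1 <- log2(sample2 / sample1)  # log2(0.25)

print(c(lfc_1_over_2, lfc_2_over_1))

# The two values differ only in sign
print(isTRUE(all.equal(lfc_1_over_2, -lfc_2_over_1)))
```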

ADD REPLYlink written 2.8 years ago by Kevin Blighe63k

Hi Kevin,

I have a similar problem, but I am not able to reorder my data because I have missing values in some columns. Could you please take a look at my thread?

Thanks !

ADD REPLYlink written 2.2 years ago by eggrandio40

Done.

ADD REPLYlink written 22 months ago by Kevin Blighe63k