Question: Constructing A Heatmap Of "Distance Of Binding Region Relative To TSS"
1
gravatar for Dataminer
7.0 years ago by
Dataminer2.6k
Netherlands
Dataminer2.6k wrote:

Hi!

I have a ChIP-seq profile for a transcription factor. I want to construct a heatmap in which I can view the distance between peaks and TSSs. What I have done so far: annotated the peaks (genomic regions bound by the TF) to the nearest TSS. This means I have the coordinate of the TSS nearest to each peak.

I need guidance in constructing a heatmap for my TFs using the coordinates of the nearest TSS and the coordinates in my peak file or in the raw .BED file (from which the peaks were called).

A small example script in Python or R would be welcome.

Thank you for your time.

Best

next-gen chip-seq • 4.7k views
written 7.0 years ago by Dataminer

How do you plan to get the extra dimension for heatmap? Wouldn't it just be a histogram chunked by distance?

written 7.0 years ago by brentp

Hi brent, I was expecting your comment. Actually, I have seen such a heatmap in a few articles, for example "Examination of transcriptional network reveals an important role for TCFAP2C, SMARCA4, and EOMES in trophoblast stem cell maintenance" by Benjamin L. Kidder. I am very curious to know how these people do it. But anyway, please tell me what the best way to do this is and how it can be done. You can also look at this link: http://genome.cshlp.org/content/21/2/245/F2.expansion.html Thank you.

written 7.0 years ago by Dataminer

Which figure exactly? And what are the axes?

written 7.0 years ago by brentp

Please have a look at this figure: http://genome.cshlp.org/content/21/2/245/F2.expansion.html Here they have used multiple TFs.

written 7.0 years ago by Dataminer

Duff wrote:

Hi Dataminer

I recently created a similar figure (not quite the same but the code should be adaptable I think) after using clover for TFBS enrichment analysis for a group of regulated genes. My code uses ggplot2 in R and plots each interaction (TF (y axis) - gene (x axis)) as a square with the number of hits for that TF in the promoter of the gene shown by the colour of the square.

You should be able to adapt the code to show distance from TSS for each gene pretty easily (just put different numbers in the relevant column which was Hits in my data)

The plot:

library(ggplot2)

# One square per TF-gene pair, coloured by the number of hits.
# theme() and element_*() replace opts() and theme_*(), which current ggplot2 no longer provides.
p3 <- ggplot() + geom_point(data = tfHits, aes(symbol, TF, colour = Hits), shape = 15, size = 4)

p3 <- p3 + scale_colour_gradient(low = "cornflowerblue", high = "firebrick") + theme(panel.background = element_blank(), legend.position = "right", axis.title.x = element_blank(), axis.title.y = element_blank(), axis.text.x = element_text(angle = 90, hjust = 1, size = 6), axis.text.y = element_text(colour = "black"), axis.ticks = element_blank())

The data - a dataframe with 3 columns: TF in first, gene (symbol) in second and distance to TSS in third. I would show an excerpt of my data but I can't work out how to get a 'table' into the text here - hey ho.

You can do a similar plot in ggplot2 with the 'tiles' geom:

p2 <- ggplot(tfHits, aes(TF, symbol)) + geom_tile(aes(fill = Hits))

p2 <- p2 + scale_fill_gradient2(name = "Hits", low = "#0571B0", mid = "#F7F7F7", high = "#CA0020", midpoint = 20)

p2 <- p2 + labs(x = "TF", y = "Gene") + theme(axis.ticks = element_blank(), axis.text.x = element_text(size = 10, angle = 90, hjust = 1, colour = "grey25"), axis.text.y = element_text(size = 5, colour = "grey25"))

Personally I prefer the squares. Of course this won't do any kind of clustering - I don't know if that's important to you but you could reorder the dataframe passed in by some dendrogram order etc etc.
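
The reordering idea can be sketched in a few lines of Python (the question welcomes either language). This is only an illustration under stated assumptions: it uses a plain sort by total hits per TF, which is a much simpler stand-in and not a dendrogram order, and the tuples and names (`rows`, `order_by_total`) are hypothetical toy data, not the data behind the plots above.

```python
from collections import defaultdict

def order_by_total(rows):
    """rows: (tf, gene, value) tuples in long format.

    Returns TF names sorted by their summed value, largest first.
    NOTE: a simple stand-in for a clustering/dendrogram order.
    """
    totals = defaultdict(int)
    for tf, _gene, value in rows:
        totals[tf] += value
    # Sort by descending total, then by name so ties are deterministic
    return sorted(totals, key=lambda tf: (-totals[tf], tf))

# Hypothetical toy data: (TF, gene symbol, hits)
rows = [
    ("TF_A", "GENE1", 3),
    ("TF_B", "GENE1", 10),
    ("TF_A", "GENE2", 1),
    ("TF_C", "GENE2", 10),
]

tf_order = order_by_total(rows)
# Reorder the long-format rows so TFs appear in the chosen order
rows_sorted = sorted(rows, key=lambda r: (tf_order.index(r[0]), r[1]))
```

In R the same order would typically be applied by setting it as the factor levels of the TF column before plotting.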

HTH

duff

written 7.0 years ago by Duff

Thank you, I will try to adapt the script. :)

written 7.0 years ago by Dataminer

Leonor Palmeira wrote:

You might consider using the ade4 package to plot something like this, in R:

library(ade4)
example(table.value)

[Image: output of example(table.value)]

I have used it and customized it to add a heatmap feature to the 'size proportional to value' feature, so let me know if you would like me to share part of this code with you:

[Image: customized table.value plot with the added heatmap colouring]

written 7.0 years ago by Leonor Palmeira

Sukhdeep Singh wrote:

Hey, to help you a bit on that.

Step 1 -> Sort your list (distances of peaks from the TSS) into small bins, maybe 100-200 bp wide, and count how many distances fall into each bin.

Step 2 -> Make a dataframe with the protein names in the first column; the following columns hold the values for each distance-from-TSS bin you made.

Step 3 -> Plot a fancier heatmap, for example using ggplot2 and geom_tile.
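
Steps 1 and 2 could be sketched like this in Python. It is a minimal sketch, assuming that distances are signed (negative = upstream of the TSS) and that `max_dist` is a multiple of `bin_width`; the function name, the default sizes and the toy distances are all hypothetical and only there for illustration.

```python
from collections import Counter

def bin_distances(distances, bin_width=200, max_dist=2000):
    """Count signed peak-to-TSS distances in fixed-width bins.

    Returns a dict mapping the left edge of each bin to its count.
    Distances outside [-max_dist, max_dist) are ignored.
    Assumes max_dist is a multiple of bin_width.
    """
    edges = range(-max_dist, max_dist, bin_width)
    counts = Counter()
    for d in distances:
        if -max_dist <= d < max_dist:
            # Floor division maps d to the left edge of its bin,
            # which also works for negative (upstream) distances
            counts[(d // bin_width) * bin_width] += 1
    # Include empty bins so every TF gets the same columns
    return {edge: counts.get(edge, 0) for edge in edges}

# Hypothetical toy input: signed distances (bp) per protein
distances_by_tf = {
    "TF_A": [-150, -20, 30, 180, 950, -1990],
    "TF_B": [10, 15, 220, -450, 2500],
}

# One row per protein, one column per distance bin (step 2)
table = {tf: bin_distances(ds) for tf, ds in distances_by_tf.items()}
```

Each entry of `table` is then one row of the data frame described in step 2, with the bin edges as column names.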

I am putting some custom code here to proceed from the point where you have the data frame.

# Generate an example dataframe (toy values, one row per protein)
df=data.frame(c(paste('A',seq(1,20,by=1))),seq(1,200,by=10),seq(1,100,by=5),seq(1,60,by=3),seq(1,40,by=2))

# load some libraries
library('ggplot2')
library('reshape2')  # provides melt
library('plyr')      # provides ddply
library('scales')    # provides rescale

# rename columns
colnames(df)[1]='Proteins'
colnames(df)[2]='1KB'
colnames(df)[3]='2KB'
colnames(df)[4]='-1KB'
colnames(df)[5]='-2KB'
# melt the df into long format, keeping the protein names as the id column
df.m=melt(df,id.vars='Proteins')

# add a rescale column which scales the values of each distance column to 0-1 on the basis of its min and max value
df.m=ddply(df.m,.(variable),transform,rescale=rescale(value))

# finally plot
ggplot(df.m,aes(variable,Proteins))+geom_tile(aes(fill=rescale),colour='white')

Result

[Image: resulting geom_tile heatmap]

You can extend it and sort the axis as well.

Have fun

Sukhi

written 7.0 years ago by Sukhdeep Singh

Vikas Bansal wrote:

Hi, I think you can draw a histogram, as Brent suggested. But after looking at the image from Kidder's paper, I got an idea. Going by that image, you can make a dataframe with, say, 5 TFs as columns and 3 rows, which would be 6 kb upstream, on the TSS, and 2.5 kb downstream (change these according to your preference), or make, say, 10 rows: 1 kb upstream, 2 kb upstream, and so on. Now just put the frequencies in that dataframe. E.g. if there are 100 regions lying up to 1 kb upstream for TF A, then put this value (100) in the cell for TF A vs 1 kb upstream. You then have a dataframe with the frequencies for each TF, and you can use the heatmap function in R to make a heatmap (change colours, size etc. according to your preference).
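
A rough Python sketch of such a frequency table follows. It assumes signed distances (negative = upstream) and uses a hypothetical 100 bp window for "on the TSS"; the region labels, function names and toy distances are illustrative only. The plotting helper uses matplotlib, imported lazily so that the table can be built without it.

```python
REGIONS = ["upstream (<= 6 kb)", "TSS", "downstream (<= 2.5 kb)"]

def classify(distance, tss_window=100, upstream=6000, downstream=2500):
    """Assign a signed distance (negative = upstream) to a region, or None.

    tss_window is an assumed cutoff for calling a peak "on the TSS".
    """
    if abs(distance) <= tss_window:
        return "TSS"
    if -upstream <= distance < 0:
        return "upstream (<= 6 kb)"
    if 0 < distance <= downstream:
        return "downstream (<= 2.5 kb)"
    return None  # too far away from the TSS

def frequency_table(distances_by_tf):
    """Rows are regions, columns are TFs, cells are peak counts."""
    tfs = sorted(distances_by_tf)
    counts = {region: {tf: 0 for tf in tfs} for region in REGIONS}
    for tf in tfs:
        for d in distances_by_tf[tf]:
            region = classify(d)
            if region is not None:
                counts[region][tf] += 1
    matrix = [[counts[region][tf] for tf in tfs] for region in REGIONS]
    return tfs, matrix

def plot_heatmap(tfs, matrix, path="tss_heatmap.png"):
    """Draw the table as a heatmap (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    image = ax.imshow(matrix, aspect="auto", cmap="Reds")
    ax.set_xticks(range(len(tfs)))
    ax.set_xticklabels(tfs)
    ax.set_yticks(range(len(REGIONS)))
    ax.set_yticklabels(REGIONS)
    fig.colorbar(image, ax=ax, label="number of peaks")
    fig.savefig(path)

# Hypothetical toy input: signed distances (bp) per TF
toy = {"TF_A": [-5000, -300, 50, 1200, 9000], "TF_B": [-80, 0, 2400, -7000]}
tfs, matrix = frequency_table(toy)
```

Calling `plot_heatmap(tfs, matrix)` would then write the figure to a PNG file; the row definitions in `classify` can be swapped for 1 kb steps in the same way.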

Sorry for not providing a script; I am not in the lab, otherwise I would have tried.

written 7.0 years ago by Vikas Bansal