How to plot/visualise TFBS positions relative to TSS for thousands of genes?
5
5
11.5 years ago
Ian 6.0k

I have the genome coordinates for different transcription factor binding sites (TFBS) and I want to display them relative to their closest gene transcription start site (TSS), e.g. within 10 kb. I imagine it might look something like a heatmap.

I was thinking of tinkering with SeqMiner, but does anyone have any specific solutions?

UPDATE: The reason I thought of a heatmap was so I could see whether there is a pattern in TFBS distance to TSS (up/downstream).

visualization transcription binding • 7.1k views
1

Why a heat map? You only have 1 dimension (a vector with the distances between each TFBS and its nearest TSS). I would just put these distances into "bins" ([-10kb,-9kb],[-9kb,-8kb],...,[-1kb,0],[0,+1kb]) and make a barplot (distance VS Number_of_TFBS)
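A minimal sketch of that binning in plain Python, assuming the signed distances (negative = upstream of the TSS) have already been computed. The function name `bin_distances` and the sample values are made up for illustration:

```python
from collections import Counter

def bin_distances(distances, bin_size=1000, window=10000):
    """Count signed TFBS-to-TSS distances per fixed-width bin.

    Keys are the lower edge of each bin, e.g. -1000 covers [-1000, 0).
    Distances outside [-window, window) are dropped.
    """
    counts = Counter()
    for d in distances:
        if -window <= d < window:
            # Floor division keeps negative distances in the correct bin.
            counts[(d // bin_size) * bin_size] += 1
    return dict(sorted(counts.items()))

# Made-up distances in bp (negative = upstream of the TSS).
example = [-9500, -950, -20, 0, 15, 999, 1000, 12000]
binned = bin_distances(example)
print(binned)
```

The counts could then be drawn with something like `plt.bar(list(binned), list(binned.values()), width=1000, align='edge')`.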

0

Why a heat map? You only have 1 dimension (a vector with the distances from each TFBS to its nearest TSS).

5
11.5 years ago

I would create a distribution plot. On the x-axis is the distance from the TSS. On the y-axis is the number of TFBSs found within or at that distance. You may want to bin the distances from the TSS, say every 50 bp.

It may also help to take a subset of genes, say all those with a certain GO or pathway annotation. You might find that certain TFBSs tend to lie closer to or farther from the TSS. A distribution plot would show that nicely.

0

Hmmm - you have a good point. I have been messing with heatmaps lately, so have them on the mind. The other thing is that if there is a second or third TFBS, then k-means clustering might highlight interesting patterns of distribution.

5
11.5 years ago
Ryan Dale 5.0k

A specific solution to @Larry_Parnell's histogram, assuming you have pybedtools and matplotlib:

from matplotlib import pyplot as plt
from pybedtools import BedTool

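# For each gene, find the closest TFBS; D='a' appends a signed distance as the
# last field and id=True ignores downstream features. Fields are strings, so
# compare against '-1' (reported when no TFBS is found).
x = BedTool('genes.gff')\
    .closest('tfbs.bed', D='a', id=True)\
    .filter(lambda f: f[-1] != '-1')\
    .saveas('closest-tfbs')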

plt.hist([int(i[-1]) for i in x])
plt.show()


Assuming you have GO annotations in genes.gff as an "Ontology_term" attribute, you could easily subset the histogram by GO terms:

def GO_filter(feature, term):
    # Keep features whose comma-separated Ontology_term attribute lists the term.
    return term in feature['Ontology_term'].split(',')

plt.hist([int(i[-1]) for i in x.filter(GO_filter, 'GO:0003777')])


And once you have the data in closest-tfbs, you can play around with making a heatmap to see if it reveals any additional information -- say, by filling a large matrix of (10k x genes) with a 1 where a TFBS was found (though you might want to use a binning strategy to conserve memory and time for clustering).
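A rough sketch of that binned matrix in plain Python, assuming the per-gene signed distances have already been pulled out of closest-tfbs into a dict. The name `presence_matrix`, the gene IDs and the distances are hypothetical:

```python
def presence_matrix(hits, window=10000, bin_size=500):
    """Build a genes x bins matrix with 1 where a bin contains a TFBS.

    hits maps a gene ID to a list of signed TFBS distances (bp) from its TSS.
    Bins tile [-window, window); distances outside that range are ignored.
    """
    n_bins = (2 * window) // bin_size
    genes = sorted(hits)
    matrix = []
    for gene in genes:
        row = [0] * n_bins
        for d in hits[gene]:
            if -window <= d < window:
                # Shift by the window so the leftmost bin has index 0.
                row[(d + window) // bin_size] = 1
        matrix.append(row)
    return genes, matrix

# Made-up input: three genes, coarse 5 kb bins to keep the output small.
hits = {"geneA": [-9800, 40], "geneB": [2500], "geneC": []}
genes, matrix = presence_matrix(hits, window=10000, bin_size=5000)
for gene, row in zip(genes, matrix):
    print(gene, row)
```

The nested list can be handed to `plt.imshow(matrix, aspect='auto')` or converted to an array before clustering the rows.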

1
11.5 years ago

If you are trying to say something quantitative, you should display them in a conventional barplot, with another facet dedicated to a randomly selected cohort of TFBSs.
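One way to set up that comparison, sketched in plain Python under the assumption that a background pool of TFBS-to-TSS distances is available to sample from. The helper names, the pool and the observed values are all made up:

```python
import random

def count_in_bins(distances, bin_size, window):
    """Return per-bin counts for signed distances within [-window, window)."""
    counts = [0] * ((2 * window) // bin_size)
    for d in distances:
        if -window <= d < window:
            counts[(d + window) // bin_size] += 1
    return counts

def random_cohort(pool, n, seed=0):
    """Sample n distances without replacement from a background pool."""
    return random.Random(seed).sample(pool, n)

# Made-up data: observed distances for one factor and a background pool.
observed = [-120, -80, -40, 10, 60, 4000]
pool = list(range(-10000, 10000, 250))
cohort = random_cohort(pool, len(observed), seed=1)

obs_counts = count_in_bins(observed, 5000, 10000)
bg_counts = count_in_bins(cohort, 5000, 10000)
print("observed:  ", obs_counts)
print("background:", bg_counts)
```

The two count vectors can then be drawn as side-by-side bar plots with a shared y-axis.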

0
11.5 years ago
razor ▴ 190

Some density graph maybe, with the relative distance from the TSS on the x axis, and the relative amount of various TFBSs on the y axis. http://had.co.nz/ggplot2/stat_density.html
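For anyone working in Python rather than ggplot2, a hand-rolled Gaussian kernel density estimate gives a similar curve. This is only a sketch: `kde_curve`, the bandwidth of 100 bp and the sample distances are assumptions, not values from the thread:

```python
import math

def kde_curve(distances, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at each grid position."""
    norm = 1.0 / (len(distances) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in distances)
        for x in grid
    ]

# Made-up signed distances (bp) for one transcription factor.
distances = [-200, -150, 50]
step = 10
grid = list(range(-1000, 1001, step))
density = kde_curve(distances, grid, bandwidth=100)

# Position of the highest density on the grid.
peak = grid[density.index(max(density))]
print("peak near", peak, "bp")
```

Calling `kde_curve` once per factor and plotting each result against `grid` gives overlaid density curves on a common x-axis.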

0
9.5 years ago

I think you can try using INSECT's server. For the visualization, I think a simple histogram of the distance from each hit to the TSS would be more appropriate than a heatmap.