deepTools plotHeatmap - Convert BED files to gene list?
Hello,

I am using plotHeatmap from deepTools on ChIP-seq data to create heatmaps and separate them by k-means clustering.

This also gives me BED files with peaks grouped by cluster, and the head of each file looks like this:

#chrom  start   end name    score   strand  thickStart  thickEnd    itemRGB blockCount  blockSizes  blockStart  deepTools_group
14  27039255    27102509    ENSMUST00000225146  .   +   27039255    27102509    0   1   63254   27039253    cluster_1
14  27039078    27089249    ENSMUST00000223942  .   +   27039078    27089249    0   1   50171   27039076    cluster_1


When I try to look up which genes these names correspond to in a GTF file, they do not match anything, since the peaks are away from the transcription start site (they can be up to 1 kb away). Is there an easy way to add the nearest gene to each row of this BED file?

I have tried the following with no success:

• closest-features from bedops
• bedtools closest
• bedtools intersect with my BED file and a GTF file

Thank you!

If you provide your closest-features and any other set operation commands, I'd be happy to help, if I can. Also, can you indicate if you are getting your GTF file of gene annotations from Ensembl or UCSC?

Thanks so much for offering! I am currently using the mm10 Ensembl GTF, and did the following steps.

First, I converted the GTF to BED with:

gtf2bed < Mus_musculus.GRCm38.102.gtf > sorted-mm10.bed


Then, I sorted my BED files (after receiving an error that they weren't sorted) with the normal Unix sort command:

sort hnf4a-ko-downreg-clusters.bed > hnf4a-ko-downreg-clusters-sort.bed


Finally, I used closest-features with the two, but the output file had lots of formatting errors.

closest-features hnf4a-ko-downreg-clusters-sort.bed sorted-mm10.bed > hnf4a-ko-downreg-closest-features.bed

I might suggest limiting your search to genes:

gtf2bed < Mus_musculus.GRCm38.102.gtf | grep -w "gene" > sorted-mm10.genes.bed


But that's up to you. Otherwise, I think you may also get transcripts/exons, which may be more than you want. Again, up to you.
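If matching the word "gene" anywhere on the line feels too loose, a stricter filter is to test the feature column directly. The following is a minimal sketch, assuming gtf2bed writes the GTF feature type to column 8 of its tab-separated output; the sample rows and the geneA/txA1 identifiers are invented for illustration.

```shell
# Sketch only: the sample rows and the 'geneA' identifiers are invented.
# Assumed gtf2bed column layout (tab-separated):
# chrom, start, end, id, score, strand, source, feature, frame, attributes
workdir=$(mktemp -d)
{
  printf '1\t100\t900\tgeneA\t.\t+\thavana\tgene\t.\tgene_id "geneA"; gene_name "GeneA";\n'
  printf '1\t100\t900\tgeneA\t.\t+\thavana\ttranscript\t.\tgene_id "geneA"; transcript_id "txA1";\n'
  printf '1\t100\t300\tgeneA\t.\t+\thavana\texon\t.\tgene_id "geneA"; transcript_id "txA1";\n'
} > "$workdir/sample.bed"

# Keep only rows whose feature column (assumed to be column 8) is exactly "gene".
awk -F'\t' '$8 == "gene"' "$workdir/sample.bed" > "$workdir/sample.genes.bed"

cat "$workdir/sample.genes.bed"
```

On the real annotation, the same awk filter would take the place of grep -w "gene" in the pipeline above.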

If hnf4a-ko-downreg-clusters.bed is the file containing peaks, as described in the original question, then you may want to do two things: 1) strip the header line starting with #, and 2) sort with sort-bed (part of the BEDOPS kit):

tail -n+2 hnf4a-ko-downreg-clusters.bed | sort-bed - > hnf4a-ko-downreg-clusters.sorted.bed


You can use sort -k1,1 -k2,2n -k3,3n to get the same result, but it will be slower than using sort-bed.
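To see why a plain sort is not enough, here is a small sketch that uses only standard Unix tools. It strips the header and compares a plain text sort with a sort that has per-key numeric flags on the coordinate columns. The three peak rows are invented, and LC_ALL=C is my assumption for getting byte-order sorting of chromosome names.

```shell
# Sketch only: the three rows below are invented for illustration.
workdir=$(mktemp -d)
{
  printf '#chrom\tstart\tend\tname\n'
  printf '14\t27039255\t27102509\tpeakA\n'
  printf '14\t9000000\t9000500\tpeakB\n'
  printf '14\t27039078\t27089249\tpeakC\n'
} > "$workdir/peaks.bed"

# Plain sort compares whole lines as text, so 27039078 sorts before 9000000.
tail -n +2 "$workdir/peaks.bed" | LC_ALL=C sort > "$workdir/peaks.text-sorted.bed"

# Sort by chromosome as text, then by start and end as numbers.
tail -n +2 "$workdir/peaks.bed" | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > "$workdir/peaks.sorted.bed"

head -n 1 "$workdir/peaks.text-sorted.bed"   # row starting at 27039078
head -n 1 "$workdir/peaks.sorted.bed"        # row starting at 9000000
```

The text-sorted file is the kind of input that set operations expecting coordinate order are likely to reject or mishandle.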

Then you should be able to use closest-features:

closest-features hnf4a-ko-downreg-clusters.sorted.bed sorted-mm10.genes.bed > answer.bed


You can add --dist to get distance in bases, along with the nearest element.

You can also add --delim <delimiter> replacing <delimiter> with something other than |, depending on what you are doing downstream.

See closest-features --help or the online documentation for a more complete description of options.
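Once you have the output, you will probably want one row per peak with a gene name attached. The following is a rough sketch of that step, assuming closest-features was run with both --closest and --dist, so that each line has the layout query|closest|distance (as I understand it, without --closest you get the nearest element on each side instead). The sample line, the GeneA name and the distance value are invented, and the parsing assumes the gene annotation keeps a gene_name attribute and contains no | characters.

```shell
# Sketch only: this line imitates the assumed 'query|closest|distance' layout;
# the gene row, the 'GeneA' name and the distance value are invented.
workdir=$(mktemp -d)
printf '14\t27039255\t27102509\tENSMUST00000225146\t.\t+\tcluster_1|14\t27030000\t27038000\tgeneA\t.\t+\thavana\tgene\t.\tgene_id "geneA"; gene_name "GeneA";|-1255\n' > "$workdir/answer.bed"

# Split on '|', then pull the peak coordinates, the peak name, the last peak
# column (the deepTools cluster label), the gene_name and the distance.
awk -F'|' 'BEGIN { OFS = "\t" } {
  n = split($1, q, "\t")
  name = "NA"                       # used when no gene_name attribute is found
  if (match($2, /gene_name "[^"]+"/))
    name = substr($2, RSTART + 11, RLENGTH - 12)
  print q[1], q[2], q[3], q[4], q[n], name, $3
}' "$workdir/answer.bed" > "$workdir/peaks-with-genes.tsv"

cat "$workdir/peaks-with-genes.tsv"
```

If you choose a different delimiter with --delim, change the -F'|' argument to match.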