deeptools plotHeatMap - Convert bed files to gene list?
2.0 years ago

Hello,

I am using plotHeatmap from deepTools on ChIP-seq data to create heatmaps and split them by k-means clustering.

This also gives me BED files of peaks grouped by cluster, and the head of one of these files looks like this:

#chrom  start   end name    score   strand  thickStart  thickEnd    itemRGB blockCount  blockSizes  blockStart  deepTools_group
14  27039255    27102509    ENSMUST00000225146  .   +   27039255    27102509    0   1   63254   27039253    cluster_1
14  27039078    27089249    ENSMUST00000223942  .   +   27039078    27089249    0   1   50171   27039076    cluster_1

When I look up these names in a GTF file to see which genes they correspond to, they do not match anything useful, since the peaks are away from the transcription start site (they can be up to 1 kb away). Is there an easy way to add the nearest gene to each row of this BED file?

I have tried the following with no success:

  • closest-features from bedops
  • bedtools closest
  • bedtools intersect with my BED file and a GTF file

Thank you!

plotheatmap chip-seq deeptools • 1.3k views

If you provide your closest-features and any other set operation commands, I'd be happy to help, if I can. Also, can you indicate if you are getting your GTF file of gene annotations from Ensembl or UCSC?


Thanks so much for offering! I am currently using the mm10 Ensembl GTF, and did the following steps.

First, I converted the GTF to BED with

gtf2bed < Mus_musculus.GRCm38.102.gtf > sorted-mm10.bed

Then, I sorted my BED files (after receiving an error that they weren't sorted) with the normal Unix sort command

sort hnf4a-ko-downreg-clusters.bed > hnf4a-ko-downreg-clusters-sort.bed

Finally, I ran closest-features on the two files, but the output file had lots of formatting errors.

closest-features hnf4a-ko-downreg-clusters-sort.bed sorted-mm10.bed > hnf4a-ko-downreg-closest-features.bed
2.0 years ago

I might suggest limiting your search to genes:

gtf2bed < Mus_musculus.GRCm38.102.gtf | grep -w "gene" > sorted-mm10.genes.bed

But that's up to you. Otherwise, I think you may also get transcripts/exons, which may be more than you want. Again, up to you.
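If matching the word "gene" anywhere on the line feels too loose, a stricter filter can test only the feature column. The sketch below assumes the gtf2bed output has ten tab-separated columns with the feature type in column 8; check this against your own file first. The file names and toy rows are made up for illustration.

```shell
# Toy rows in the 10-column layout gtf2bed is assumed to emit:
# chrom, start, end, id, score, strand, source, feature, frame, attributes
printf '14\t100\t900\tENSMUSG_TOY1\t.\t+\tensembl\tgene\t.\tgene_id "ENSMUSG_TOY1"; gene_name "ToyA";\n14\t100\t900\tENSMUSG_TOY1\t.\t+\tensembl\ttranscript\t.\tgene_id "ENSMUSG_TOY1"; gene_name "ToyA";\n14\t100\t300\tENSMUSG_TOY1\t.\t+\tensembl\texon\t.\tgene_id "ENSMUSG_TOY1"; gene_name "ToyA";\n' > toy-annotations.bed

# Keep only rows whose feature column (assumed to be column 8) is exactly "gene".
awk -F'\t' '$8 == "gene"' toy-annotations.bed > toy-genes.bed

cat toy-genes.bed
```

On the real data, the same awk filter would replace the grep in the pipe after gtf2bed.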

If hnf4a-ko-downreg-clusters.bed is the file containing peaks, as described in the original question, then you may want to do two things: 1) strip the header line starting with #, and 2) sort with sort-bed (part of the BEDOPS kit):

tail -n+2 hnf4a-ko-downreg-clusters.bed | sort-bed - > hnf4a-ko-downreg-clusters.sorted.bed

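You can use sort -k1,1 -k2,2n -k3,3n to get the same result, but it will be slower than using sort-bed. (Note that the n modifier goes on the individual keys; a standalone -n would also apply numeric comparison to the chromosome column.)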
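As a minimal sketch of the header-stripping and sorting step using only standard tools, here is a toy example. The file names and the extra toy_peak row are invented for illustration, and LC_ALL=C is my addition so that chromosome names are compared byte-wise regardless of locale.

```shell
# Toy stand-in for the deepTools cluster BED (tab-separated, header starts with #).
printf '#chrom\tstart\tend\tname\n14\t27039255\t27102509\tENSMUST00000225146\n14\t27039078\t27089249\tENSMUST00000223942\n2\t500\t900\ttoy_peak\n' > toy-clusters.bed

# Drop the header line, then sort by chromosome (as text),
# then by start and end coordinates (numerically).
tail -n +2 toy-clusters.bed | LC_ALL=C sort -k1,1 -k2,2n -k3,3n > toy-clusters.sorted.bed

cat toy-clusters.sorted.bed
```

Chromosome names sort as text here, so "14" comes before "2". That is fine as long as both input files are sorted the same way.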

Then you should be able to use closest-features:

closest-features hnf4a-ko-downreg-clusters.sorted.bed sorted-mm10.genes.bed > answer.bed

You can add --dist to get distance in bases, along with the nearest element.

You can also add --delim <delimiter> replacing <delimiter> with something other than |, depending on what you are doing downstream.

See closest-features --help or the online documentation for a more complete description of options.
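To get from the closest-features output to one gene name per peak, something like the following may help. It is only a sketch: it assumes the command was run with --closest and --dist, so that each line looks like peak|gene|distance, and that the gene record carries a gene_name attribute, as Ensembl GTFs usually do. The input line is a hand-made toy, not real output.

```shell
# Toy line imitating the assumed "peak|nearest gene|distance" layout.
printf '14\t27039078\t27089249\tENSMUST00000223942\t.\t+\tcluster_1|14\t27030000\t27040000\tENSMUSG_TOY1\t.\t+\tensembl\tgene\t.\tgene_id "ENSMUSG_TOY1"; gene_name "ToyA";|0\n' > toy-closest.txt

# Split each line on "|", keep the first four peak columns, and pull the
# gene_name value out of the gene record's attributes (NA if it is absent).
awk -F'|' '{
    split($1, peak, "\t")
    name = "NA"
    if (match($2, /gene_name "[^"]+"/)) {
        # Skip the 11 characters of: gene_name "  and drop the closing quote.
        name = substr($2, RSTART + 11, RLENGTH - 12)
    }
    print peak[1], peak[2], peak[3], peak[4], name, $3
}' OFS='\t' toy-closest.txt > toy-peaks-with-genes.tsv

cat toy-peaks-with-genes.tsv
```

If you change the delimiter with --delim, the -F value in the awk call needs to change to match.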
