Question: Visualization of Homer mergePeaks output Venn?
morovatunc (Turkey) wrote, 3.6 years ago:

Hi,

I am trying to visualise the overlapping ChIP-seq peak regions that I identified with HOMER's mergePeaks. It produced a Venn info file and a "result" file. I would like to visualise the Venn info file, but none of the visualisation libraries or programs I have looked at meets my needs.

The main problem is that my data set is (relatively) big :). I have 19 datasets in the condition group and 9 in the healthy group. I have read in another Biostars thread that drawing a Venn diagram for more than 3 datasets is not a good idea.

I am trying to find overlapping regions of transcription factors, so I want to know which transcription factor sites are the most common.

I am mediocre at coding in Python and R.

Please don't point me to the "Venn/Euler Diagram Of Four Or More Sets" and "Draw Diagrams For Intersection Between Many Sets" threads; I have already read them 0192308 times. Also, if you think HOMER is not the best tool for finding the overlaps, please feel free to suggest others. (Yes, I do know monkseq.)

Thank you very much for your help.

Best regards,

Tunc

chip-seq venn/euler homer overlap • 3.1k views
modified 3.1 years ago by Ryan Dale • written 3.6 years ago by morovatunc
steve (United States) wrote, 3.1 years ago:

EDIT: I made a script that can parse the venn.txt output of HOMER mergePeaks for comparisons of 2 to 5 peak files (bed files) and automatically create Venn diagrams. All you need to do is pass it a sample ID (e.g. "ABC") and the venn.txt file output by HOMER; it will create the plot in the same directory as the venn.txt file. This uses R and the VennDiagram package. The script is located here: https://github.com/stevekm/Bioinformatics/blob/master/HOMER_mergePeaks_multiVenn/multi_peaks_Venn.R

EDIT2: I also posted an implementation of this with Upset plots, which allows for >5 comparison categories, here: Visualization of ChIP-Seq peak overlaps using HOMER mergePeaks and UpSetR

EDIT3: scripts have been moved here
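
For the original question (which combinations are most common), the counts can also be read straight out of venn.txt. A minimal bash/awk sketch, assuming the file is tab-delimited with one header row and that each data row ends with a peak count followed by the "|"-joined file names; the function name rank_venn_combinations is made up for illustration:

```shell
#!/bin/bash
# Rank the overlap combinations in a HOMER mergePeaks venn file by peak count.
# ASSUMPTION: tab-delimited, one header line, and every data row ends with
#   <peak count> <TAB> <"|"-joined names of the files in that combination>.
rank_venn_combinations() {
    local venn_file="$1"
    # Skip the header, emit "count <TAB> number_of_files <TAB> combination",
    # then sort numerically by count, largest first.
    tail -n +2 "$venn_file" \
        | awk -F'\t' 'NF >= 2 {
              n = split($NF, parts, "[|]")   # how many files share these peaks
              print $(NF-1) "\t" n "\t" $NF
          }' \
        | sort -t$'\t' -k1,1nr
}
```

Something like `rank_venn_combinations mergepeaks_venn | head` would then list the largest combinations first, which stays readable even with many input files.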


In my experience it is easier to just count the number of entries (lines) in each of the bed files output by HOMER mergePeaks and pass these values to R for plotting, instead of trying to parse the venn.txt file. This is the script I am using for this purpose (including the bash mergePeaks commands). You should be able to easily modify it to add more entries.

#!/bin/bash
# BED files with the peaks to overlap
tmp_outH3K4ME3="peaks_H3K4ME3.bed"
tmp_outH3K27AC="peaks_H3K27AC.bed"
# a sample ID
tmp_sampleID="ABC"

# HOMER mergePeaks
mergePeaks "$tmp_outH3K4ME3" "$tmp_outH3K27AC" -prefix mergepeaks -venn mergepeaks_venn

# names of the files written by mergePeaks (prefix + input file name):
tmp_mergeH3K4ME3="mergepeaks_${tmp_outH3K4ME3}"
tmp_mergeH3K27AC="mergepeaks_${tmp_outH3K27AC}"

# count the peaks unique to each file (tail skips the header line)
num_H3K4ME3=$(tail -n +2 "$tmp_mergeH3K4ME3" | wc -l)
echo "num_H3K4ME3 is $num_H3K4ME3"
num_H3K27AC=$(tail -n +2 "$tmp_mergeH3K27AC" | wc -l)
echo "num_H3K27AC is $num_H3K27AC"

# count the number of peaks in common
num_overlap=$(tail -n +2 "mergepeaks_${tmp_outH3K4ME3}_${tmp_outH3K27AC}" | wc -l)
echo "num_overlap is $num_overlap"

# plot the values in a pairwise venn in R
# make sure the correct version of R is loaded (cluster-specific; adjust or remove):
module unload r
module load r/3.2.0
Rscript --slave --no-save --no-restore - "$tmp_sampleID" "$num_H3K4ME3" "$num_H3K27AC" "$num_overlap" <<EOF
  ## R code
  # load packages
  library('VennDiagram')
  library('gridExtra')
  # get script args, print them to console
  args <- commandArgs(TRUE); cat("Script args are:\n"); args
  SampleID<-args[1]
  peaks_H3K4ME3<-as.numeric(args[2])
  peaks_H3K27AC<-as.numeric(args[3])
  peaks_overlap<-as.numeric(args[4])
  # get filename for the plot PDF
  plot_filename<-paste0(SampleID,"_peaks.pdf") 
  # make a Venn object, don't print it yet
  venn<-draw.pairwise.venn(
    area1=peaks_H3K4ME3+peaks_overlap,
    area2=peaks_H3K27AC+peaks_overlap,
    cross.area=peaks_overlap,
    category=c('H3K4ME3','H3K27AC'),
    fill=c('red','blue'),
    alpha=c(0.3,0.3),
    cex=c(2,2,2),
    cat.cex=c(1.25,1.25),
    main=SampleID,
    ind=FALSE
  )
  # print it inside a PDF file, with a title
  pdf(plot_filename,width = 8,height = 8)
  grid.arrange(gTree(children=venn), top=SampleID) #, bottom="subtitle")
  dev.off()
EOF
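
With more than two input files, the per-file counting above can be turned into a loop. A rough sketch, assuming every mergePeaks output file has exactly one header line (as the script above does) and that all outputs share the -prefix; the function name count_merged_peaks and its optional second argument are invented for this example:

```shell
#!/bin/bash
# Print "file <TAB> peak count" for every file written with a given -prefix.
# ASSUMPTION: each output file starts with exactly one header line.
# Optional 2nd argument: a file to skip (e.g. the -venn table, which the
# glob below would otherwise pick up too).
count_merged_peaks() {
    local prefix="$1" skip="${2:-}" f n
    for f in "${prefix}"_*; do
        [ -f "$f" ] || continue            # the glob matched nothing
        [ "$f" = "$skip" ] && continue     # leave out the venn table
        # count all lines after the header
        n=$(awk 'NR > 1 { c++ } END { print c + 0 }' "$f")
        printf '%s\t%d\n' "$f" "$n"
    done
}
```

For the two-mark example this would be called as `count_merged_peaks mergepeaks mergepeaks_venn`, and the resulting table can be passed to R instead of one variable per file.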
modified 19 months ago • written 3.1 years ago by steve
Ryan Dale (Bethesda, MD) wrote, 3.1 years ago:

Another alternative is a "binary heatmap". It scales better than a Venn diagram, though combinatorial binding of 19 factors is going to be complex no matter what way you look at it.

Here's a working example:
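
A minimal sketch of the 0/1 matrix behind such a heatmap (rows are merged peaks, columns are factors). It is only an illustration and assumes a hypothetical two-column, tab-delimited input of peak ID and "|"-joined factor names; neither the input layout nor the function name binary_matrix comes from the thread:

```shell
#!/bin/bash
# Build a peak-by-factor presence/absence (0/1) matrix for a binary heatmap.
# ASSUMPTION: hypothetical input with two tab-delimited columns:
#   column 1 = merged peak ID
#   column 2 = "|"-joined names of the factors (peak files) found at that peak
binary_matrix() {
    awk -F'\t' '
        {
            ids[NR] = $1
            n = split($2, parts, "[|]")
            for (i = 1; i <= n; i++) {
                # remember each factor the first time it appears
                if (!(parts[i] in seen)) { seen[parts[i]] = 1; order[++nf] = parts[i] }
                hit[NR, parts[i]] = 1
            }
        }
        END {
            # header row: factors in order of first appearance
            printf "peak"
            for (j = 1; j <= nf; j++) printf "\t%s", order[j]
            printf "\n"
            # one row per peak: 1 if the factor is present, else 0
            for (r = 1; r <= NR; r++) {
                printf "%s", ids[r]
                for (j = 1; j <= nf; j++) printf "\t%d", (((r, order[j]) in hit) ? 1 : 0)
                printf "\n"
            }
        }
    ' "$1"
}
```

The resulting table can be loaded into any heatmap function; clustering the rows then groups peaks with the same combination of factors.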

Sinji (UT Southwestern Medical Center) wrote, 3.1 years ago:

Making a Venn diagram for 19 datasets is probably a very bad idea, but it seems like you already know that. I do not know of any decent visualization for so many datasets, but you may be interested in an UpSet plot. As for recommendations on finding overlaps, I recommend using bedtools intersect. I've never worked with HOMER's mergePeaks so I can't really compare them, but I've always used bedtools for this type of analysis.


Sinji,

Thank you for the reply. I have tried both programs and I can easily say that if you are comparing a lot of BED files, HOMER is way better. Its output contains a variety of information, from unique peaks that occur in a single sample to peaks that occur in all the samples.

Give it a try.

Best,

Tunc.

written 3.1 years ago by morovatunc