Question

Explanation of Venn diagram output

0

Entering edit mode

3.1 years ago

munaj86 ▴ 30

Hi,,

I need an explanation of a Venn diagram output. I have two peaks file resulted from MACS2, the first peak with 1318 peaks and the second one with 973 peaks. I converted the peaks files into Granges as follows:

bed_2 <- read.table("Concatenate_JARID2_peaks.narrowPeak_1.bed",header = TRUE, sep="\t",stringsAsFactors=FALSE, quote="")

peaks_2 <- toGRanges(bed_2, format="narrowPeak")

Then I make Venn diagram using the following command:

makeVennDiagram(list(peaks, peaks_2), NameOfPeaks=c("JARID2", "EZH2"),fill=c("#009E73", "#F0E442"), col=c("#D55E00", "#0072B2"),cat.col=c("#D55E00", "#0072B2"))

In the resulted Venn diagram, I got the total number for the first peak file as 682 and for the second one as 1023. The intersection peaks become as 289 so can anyone explain why I get more number in the second peak and less in the first one. Also, I would appreciate if anyone can check my commands and maybe share an alternative way to make Venn diagram for MACS2 peaks file.

Thanks..

MunaJ

general education • 1.8k views

ADD COMMENT • link updated 3.1 years ago by seidel 11k • written 3.1 years ago by munaj86 ▴ 30

score 2 · Accepted Answer · 2021-03-23

You should list what libraries your functions are coming from. I think you're using the makeVennDiagram function from the ChIPpeakAnno library. If so, you might play with the "connectedPeaks" parameter. One thing that can be confusing about counting overlaps of ChIP Seq peaks is when a "peak" consists of two segments in one data set, but a single segment in the other. A way around this is to step back conceptually, and consider creating a common universe of genomic locations from your data sets. This way when you ask if a data set overlaps a given location, you ask the same question from both data sets. You can do this by reducing your data sets to a common universe (a reference set) and then ask what components of each of your data sets overlap that:

 library(GenomicRanges)

# give things meaningful names
jarid2 <- peaks
ezh2 <- peaks2

# create a reference set of genomic locations
refset <- reduce(c(jarid2, ezh2))
# give them unique names
names(refset) <- paste0("p", seq_along(refset))

# tally the locations that overlap with your peaks
jarid2.ol <- names(refset[countOverlaps(refset, jarid2) > 0])
ezh2.ol <- names(refset[countOverlaps(refset, exh2) > 0])

# make a Venn diagram
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/vennDia.R")

qlist <- venndiagram(x=jarid2.ol, y=ezh2.ol, unique=T, title="JARID2 EZH2 overlap", labels=c("JARID2", "EZH2"), plot=TRUE, type=2)

# qlist is a list that contains the names for each Venn set