Question: Histogram showing overlap between 2 bed files
0
gravatar for newscient
13 days ago by
newscient0
European Union
newscient0 wrote:

I'm trying to plot the counts of the overlap of the regions of 1 bed file on the regions of another bed file (the second file has same width regions in different parts of the genome) in a histogram like plot in base pair resolution. But what ever I have tried (using bedtools/bedops - binning,counting and windows- e.g. this suggestion for motifs Which tool can plot the enrichment of specified motif across ChIP-seq bed files? ) doesn't seem to give me something informative. Is there an easy work around the problem? Thanks in advance

UPDATE: still not what I am looking for given in the answers!

Trying to explain it again: 2 bed files trying to get the counts of overlaps of one bed file to the other's file regions. Then plot the count frequency of overlaps (not sure if i should call that density) in a base-pair resolution on y-axis and bases (e.g. 0-300) depicting the regions from the second bed file on x-axis.

histogram overlap • 228 views
ADD COMMENTlink modified 7 hours ago • written 13 days ago by newscient0
1

Why not just obtain the per-base read-depth using BEDTools and then plot the profile in R Programming Language. Create a 'consensus' BED file first and, then, where the regions don't overlap, a value of 0 will be assigned.

In this way, you can also summarise each region by mean, as I've done here:

profile

ADD REPLYlink written 13 days ago by Kevin Blighe7.3k

Could you provide the R code for getting such a mean depth of coverage graph out of the bed file generated by bedtools -coverage?

ADD REPLYlink modified 13 days ago • written 13 days ago by newscient0
1

Sure thing. I would first merge your 2 BED files into one using BEDTools merge, then get the per-base read-depth in each sample over this 'consensus' BED and output to bedgraph format. Then run something like:

dfPerbaseDepth1 <- read.table("Sam1_PerBaseDepthBED.bedgraph", stringsAsFactors=FALSE, dec=".")
dfPerbaseDepth2 <- read.table("Sam1_PerBaseDepthBED.bedgraph", stringsAsFactors=FALSE, dec=".")
dfPerbaseDepth2[5] <- dfPerbaseDepth2[,5]/2

pdf("test.pdf", width=7, height=3)
    par(mar=c(2,2,2,2), cex=1.0)
    plot(dfPerbaseDepth1[,5], col="forestgreen", main="", type="l", xlab="", ylab="Per base read depth")
    lines(dfPerbaseDepth2[,5], col="royalblue")    
dev.off()

plot

  • NB - note that here I'm dividing the read-depth in my second file by 2 just for visualisation (it's the same sample that i'm plotting).
  • NB - you'll have to sort out the region labeling yourself - apologies
  • NB - you can summarise each BED region by mean with the BEDTools coverage tool to which I've linked above (just add -mean as a command-line parameter)
  • NB - to change the plot type, explore the type parameter to plot()
ADD REPLYlink modified 13 days ago • written 13 days ago by Kevin Blighe7.3k
0
gravatar for Alex Reynolds
12 days ago by
Alex Reynolds21k
Seattle, WA USA
Alex Reynolds21k wrote:

If you're working with hg38, you could do something like the following to get counts of reads from a BAM file in 50kb bins (requires UCSC Kent tools and BEDOPS):

$ fetchChromSizes hg38 | awk -v OFS="\t" '($0!~/.*_.*/){print $1, "0", $2; }' | sort-bed - > hg38.bed
$ bedmap --echo --count --delim '\t' <(bedops --chop 50000 hg38.bed) <(bam2bed < reads.bam) > hg38.50kbReadDensity.bed

You could adjust the width of this window by changing 50000 to some other value.

If you have two BED files, one of which that contains signal in the score column, you could instead do this:

$ bedmap --echo --count --delim '\t' <(bedops --chop 50000 hg38.bed) reads.bed > hg38.50kbReadDensity.bed

Either way, you could then split this file by chromosome:

$ for chr in `bedextract --list-chr hg38.50kbReadDensity.bed`; do echo ${chr}; bedextract ${chr} hg38.50kbReadDensity.bed > hg38.50kbReadDensity.${chr}.bed; done

Then plot the signal in each per-chromosome file to a per-chromosome PDF:

$ for chr in `bedextract --list-chr hg38.50kbReadDensity.bed`; do echo ${chr}; ./plotDensity.Rscript hg38.50kbReadDensity.${chr}.bed hg38.50kbReadDensity.${chr}.pdf "50kb Read Density - ${chr}"; done

The plotDensity.Rscript script could look like this:

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
inFn = args[1]
outFn = args[2]
title = args[3]
d <- read.table(inFn, col.names=c("chr", "start", "stop", "density"), sep="\t", colClasses=c("character", "numeric", "numeric", "numeric"))
xRange <- range(0, d$stop) 
yRange <- range(0, d$density)
pdf(outFn)
plot(xRange, yRange, type="n", xlab="Position", ylab="Density", main=title)
lines(d$start, d$density)
dev.off()
ADD COMMENTlink written 12 days ago by Alex Reynolds21k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 969 users visited in the last hour