Question: Histogram showing overlap between 2 bed files
1
gravatar for newscient
13 months ago by
newscient10
European Union
newscient10 wrote:

I'm trying to plot the counts of the overlap of the regions of 1 bed file on the regions of another bed file (the second file has same width regions in different parts of the genome) in a histogram like plot in base pair resolution. But what ever I have tried (using bedtools/bedops - binning,counting and windows- e.g. this suggestion for motifs Which tool can plot the enrichment of specified motif across ChIP-seq bed files? ) doesn't seem to give me something informative. Is there an easy work around the problem? Thanks in advance

UPDATE: still not what I am looking for given in the answers!

Trying to explain it again: 2 bed files trying to get the counts of overlaps of one bed file to the other's file regions. Then plot the count frequency of overlaps (not sure if i should call that density) in a base-pair resolution on y-axis and bases (e.g. 0-300) depicting the regions from the second bed file on x-axis.

histogram overlap • 865 views
ADD COMMENTlink modified 12 months ago • written 13 months ago by newscient10
1

Why not just obtain the per-base read-depth using BEDTools and then plot the profile in R Programming Language. Create a 'consensus' BED file first and, then, where the regions don't overlap, a value of 0 will be assigned.

In this way, you can also summarise each region by mean, as I've done here:

profile

ADD REPLYlink written 13 months ago by Kevin Blighe33k

Could you provide the R code for getting such a mean depth of coverage graph out of the bed file generated by bedtools -coverage?

ADD REPLYlink modified 13 months ago • written 13 months ago by newscient10
2

Sure thing. I would first merge your 2 BED files into one using BEDTools merge, then get the per-base read-depth in each sample over this 'consensus' BED and output to bedgraph format. Then run something like:

dfPerbaseDepth1 <- read.table("Sam1_PerBaseDepthBED.bedgraph", stringsAsFactors=FALSE, dec=".")
dfPerbaseDepth2 <- read.table("Sam1_PerBaseDepthBED.bedgraph", stringsAsFactors=FALSE, dec=".")
dfPerbaseDepth2[5] <- dfPerbaseDepth2[,5]/2

pdf("test.pdf", width=7, height=3)
    par(mar=c(2,2,2,2), cex=1.0)
    plot(dfPerbaseDepth1[,5], col="forestgreen", main="", type="l", xlab="", ylab="Per base read depth")
    lines(dfPerbaseDepth2[,5], col="royalblue")    
dev.off()

plot

  • NB - note that here I'm dividing the read-depth in my second file by 2 just for visualisation (it's the same sample that i'm plotting).
  • NB - you'll have to sort out the region labeling yourself - apologies
  • NB - you can summarise each BED region by mean with the BEDTools coverage tool to which I've linked above (just add -mean as a command-line parameter)
  • NB - to change the plot type, explore the type parameter to plot()
ADD REPLYlink modified 13 months ago • written 13 months ago by Kevin Blighe33k
0
gravatar for Alex Reynolds
13 months ago by
Alex Reynolds26k
Seattle, WA USA
Alex Reynolds26k wrote:

If you're working with hg38, you could do something like the following to get counts of reads from a BAM file in 50kb bins (requires UCSC Kent tools and BEDOPS):

$ fetchChromSizes hg38 | awk -v OFS="\t" '($0!~/.*_.*/){print $1, "0", $2; }' | sort-bed - > hg38.bed
$ bedmap --echo --count --delim '\t' <(bedops --chop 50000 hg38.bed) <(bam2bed < reads.bam) > hg38.50kbReadDensity.bed

You could adjust the width of this window by changing 50000 to some other value.

If you have two BED files, one of which that contains signal in the score column, you could instead do this:

$ bedmap --echo --count --delim '\t' <(bedops --chop 50000 hg38.bed) reads.bed > hg38.50kbReadDensity.bed

Either way, you could then split this file by chromosome:

$ for chr in `bedextract --list-chr hg38.50kbReadDensity.bed`; do echo ${chr}; bedextract ${chr} hg38.50kbReadDensity.bed > hg38.50kbReadDensity.${chr}.bed; done

Then plot the signal in each per-chromosome file to a per-chromosome PDF:

$ for chr in `bedextract --list-chr hg38.50kbReadDensity.bed`; do echo ${chr}; ./plotDensity.Rscript hg38.50kbReadDensity.${chr}.bed hg38.50kbReadDensity.${chr}.pdf "50kb Read Density - ${chr}"; done

The plotDensity.Rscript script could look like this:

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)
inFn = args[1]
outFn = args[2]
title = args[3]
d <- read.table(inFn, col.names=c("chr", "start", "stop", "density"), sep="\t", colClasses=c("character", "numeric", "numeric", "numeric"))
xRange <- range(0, d$stop) 
yRange <- range(0, d$density)
pdf(outFn)
plot(xRange, yRange, type="n", xlab="Position", ylab="Density", main=title)
lines(d$start, d$density)
dev.off()
ADD COMMENTlink written 13 months ago by Alex Reynolds26k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1777 users visited in the last hour