Question: Draw coverage plot from bed file
gravatar for ben.kunfang
13 months ago by
ben.kunfang10 wrote:


I was wondering if there is good way to draw coverage plot from bed file? My bed files are only contain chrome, start, end and its states, look like this:

chr1    958551  958621  state2
chr1    958681  958751  state2
chr1    961931  962011  state2

Now I would like to see the state distribution from 20k upstream from TSS to 10k downstream from TTS (include gene body). I was trying to use plotProfile function in deeptools but the result is always a straight line, I guess it is because the input I give is:

chr1    958551  958621  2
chr1    958681  958751  2
chr1    961931  962011  2

But I really don't have score for the bed file. Thanks for any suggestions!

bed file coverage deeptools • 428 views
ADD COMMENTlink modified 13 months ago by Alex Reynolds30k • written 13 months ago by ben.kunfang10
gravatar for Alex Reynolds
13 months ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

If states are categories, you could map them to bins with bedmap.

Bins (using 1k-sized bins to demonstrate the process):

$ fetchChromSizes hg38 \
    | awk -vOFS="\t" '{ print $1, "0", $2; } \
    | grep -vE '*_*' \
    | sort-bed - \
    | bedops --chop 1000 - \
    > hg38.1k.bed

Given a BED file of TSSs called tss.bed, you could get windows over the regions of interest via the following operations:

$ bedops -u --range -20000:10000 tss.bed | bedops --merge - | bedops --element-of 1 hg38.1k.bed - > hg38.1k.tssPad.bed

Say your categorical file is called states.bed. Given hg38.1k.tssPad.bed you could get the categorical labels via bedmap --echo-map-id:

$ bedmap --echo-map-id hg38.1k.tssPad.bed states.bed > states-over-bins.txt

Each line of states-over-bins.txt contains a semi-colon-delimited list of states that overlap that line's 1k bin.

This file can be brought into R for counting average state distribution for all states over 1k bins.

ADD COMMENTlink written 13 months ago by Alex Reynolds30k

I am not sure I do it completely right but the states-over-bins.txt file looks weird. There are many blank lines and some line with the category name, it looks like this:



And I notice that the number of lines of this bin.txt equals to tssPad.bed. I was wondering if you have any handy R script for process this? What I want is a plot histone_mark_distribution like this, instead of histone mark, it has state categories. Thanks!

ADD REPLYlink modified 13 months ago • written 13 months ago by ben.kunfang10

To clarify the output, use:

$ bedmap --echo --echo-map-id --delimiter '\t' hg38.1k.tssPad.bed states.bed > states-over-bins.bed

If you look at the output from that, it should be immediately clear.

It's up to you to write a script to visualize this. I'm just showing you how to get overlaps of states with bins.

ADD REPLYlink written 13 months ago by Alex Reynolds30k

It is much more clear, but I am afraid this is not exactly what I want. The length of each gene is difference. So I don't think this fixed bin size can be used when draw the distribution of gene body..

ADD REPLYlink written 13 months ago by ben.kunfang10

You might be able to use a multinomial to model the expected distribution of states over genes, and a hypergeometric to measure how much the state distribution deviates from expected. This is perhaps beyond the scope of this question, however.

ADD REPLYlink written 13 months ago by Alex Reynolds30k

True, but I think you answer might help others who want to draw coverage from bed file (focus on upstream of TSS or downstream of TTS, as those have exact same defined range). Anyway, Thanks for your answer!

ADD REPLYlink written 13 months ago by ben.kunfang10
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 786 users visited in the last hour