Question: Draw coverage plot from bed file
0
gravatar for ben.kunfang
7 months ago by
ben.kunfang10
ben.kunfang10 wrote:

Hi,

I was wondering if there is good way to draw coverage plot from bed file? My bed files are only contain chrome, start, end and its states, look like this:

chr1    958551  958621  state2
chr1    958681  958751  state2
chr1    961931  962011  state2
...

Now I would like to see the state distribution from 20k upstream from TSS to 10k downstream from TTS (include gene body). I was trying to use plotProfile function in deeptools but the result is always a straight line, I guess it is because the input I give is:

chr1    958551  958621  2
chr1    958681  958751  2
chr1    961931  962011  2
..

But I really don't have score for the bed file. Thanks for any suggestions!

bed file coverage deeptools • 296 views
ADD COMMENTlink modified 7 months ago by Alex Reynolds29k • written 7 months ago by ben.kunfang10
2
gravatar for Alex Reynolds
7 months ago by
Alex Reynolds29k
Seattle, WA USA
Alex Reynolds29k wrote:

If states are categories, you could map them to bins with bedmap.

Bins (using 1k-sized bins to demonstrate the process):

$ fetchChromSizes hg38 \
    | awk -vOFS="\t" '{ print $1, "0", $2; } \
    | grep -vE '*_*' \
    | sort-bed - \
    | bedops --chop 1000 - \
    > hg38.1k.bed

Given a BED file of TSSs called tss.bed, you could get windows over the regions of interest via the following operations:

$ bedops -u --range -20000:10000 tss.bed | bedops --merge - | bedops --element-of 1 hg38.1k.bed - > hg38.1k.tssPad.bed

Say your categorical file is called states.bed. Given hg38.1k.tssPad.bed you could get the categorical labels via bedmap --echo-map-id:

$ bedmap --echo-map-id hg38.1k.tssPad.bed states.bed > states-over-bins.txt

Each line of states-over-bins.txt contains a semi-colon-delimited list of states that overlap that line's 1k bin.

This file can be brought into R for counting average state distribution for all states over 1k bins.

ADD COMMENTlink written 7 months ago by Alex Reynolds29k

I am not sure I do it completely right but the states-over-bins.txt file looks weird. There are many blank lines and some line with the category name, it looks like this:

2;2;2;2;2

2;2

And I notice that the number of lines of this bin.txt equals to tssPad.bed. I was wondering if you have any handy R script for process this? What I want is a plot histone_mark_distribution like this, instead of histone mark, it has state categories. Thanks!

ADD REPLYlink modified 7 months ago • written 7 months ago by ben.kunfang10

To clarify the output, use:

$ bedmap --echo --echo-map-id --delimiter '\t' hg38.1k.tssPad.bed states.bed > states-over-bins.bed

If you look at the output from that, it should be immediately clear.

It's up to you to write a script to visualize this. I'm just showing you how to get overlaps of states with bins.

ADD REPLYlink written 7 months ago by Alex Reynolds29k

It is much more clear, but I am afraid this is not exactly what I want. The length of each gene is difference. So I don't think this fixed bin size can be used when draw the distribution of gene body..

ADD REPLYlink written 7 months ago by ben.kunfang10

You might be able to use a multinomial to model the expected distribution of states over genes, and a hypergeometric to measure how much the state distribution deviates from expected. This is perhaps beyond the scope of this question, however.

ADD REPLYlink written 7 months ago by Alex Reynolds29k

True, but I think you answer might help others who want to draw coverage from bed file (focus on upstream of TSS or downstream of TTS, as those have exact same defined range). Anyway, Thanks for your answer!

ADD REPLYlink written 7 months ago by ben.kunfang10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1122 users visited in the last hour