Draw coverage plot from bed file
1
0
Entering edit mode
4.7 years ago
ben.kunfang ▴ 30

Hi,

I was wondering if there is good way to draw coverage plot from bed file? My bed files are only contain chrome, start, end and its states, look like this:

chr1    958551  958621  state2
chr1    958681  958751  state2
chr1    961931  962011  state2
...

Now I would like to see the state distribution from 20k upstream from TSS to 10k downstream from TTS (include gene body). I was trying to use plotProfile function in deeptools but the result is always a straight line, I guess it is because the input I give is:

chr1    958551  958621  2
chr1    958681  958751  2
chr1    961931  962011  2
..

But I really don't have score for the bed file.

Thanks for any suggestions!

bed deeptools coverage • 1.6k views
ADD COMMENT
2
Entering edit mode
4.7 years ago

If states are categories, you could map them to bins with bedmap.

Bins (using 1k-sized bins to demonstrate the process):

$ fetchChromSizes hg38 \
    | awk -vOFS="\t" '{ print $1, "0", $2; } \
    | grep -vE '*_*' \
    | sort-bed - \
    | bedops --chop 1000 - \
    > hg38.1k.bed

Given a BED file of TSSs called tss.bed, you could get windows over the regions of interest via the following operations:

$ bedops -u --range -20000:10000 tss.bed | bedops --merge - | bedops --element-of 1 hg38.1k.bed - > hg38.1k.tssPad.bed

Say your categorical file is called states.bed. Given hg38.1k.tssPad.bed you could get the categorical labels via bedmap --echo-map-id:

$ bedmap --echo-map-id hg38.1k.tssPad.bed states.bed > states-over-bins.txt

Each line of states-over-bins.txt contains a semi-colon-delimited list of states that overlap that line's 1k bin.

This file can be brought into R for counting average state distribution for all states over 1k bins.

ADD COMMENT
0
Entering edit mode

I am not sure I do it completely right but the states-over-bins.txt file looks weird. There are many blank lines and some line with the category name, it looks like this:

2;2;2;2;2

2;2

And I notice that the number of lines of this bin.txt equals to tssPad.bed. I was wondering if you have any handy R script for process this? What I want is a plot histone_mark_distribution like this, instead of histone mark, it has state categories. Thanks!

ADD REPLY
0
Entering edit mode

To clarify the output, use:

$ bedmap --echo --echo-map-id --delimiter '\t' hg38.1k.tssPad.bed states.bed > states-over-bins.bed

If you look at the output from that, it should be immediately clear.

It's up to you to write a script to visualize this. I'm just showing you how to get overlaps of states with bins.

ADD REPLY
0
Entering edit mode

It is much more clear, but I am afraid this is not exactly what I want. The length of each gene is difference. So I don't think this fixed bin size can be used when draw the distribution of gene body..

ADD REPLY
0
Entering edit mode

You might be able to use a multinomial to model the expected distribution of states over genes, and a hypergeometric to measure how much the state distribution deviates from expected. This is perhaps beyond the scope of this question, however.

ADD REPLY
0
Entering edit mode

True, but I think you answer might help others who want to draw coverage from bed file (focus on upstream of TSS or downstream of TTS, as those have exact same defined range). Anyway, Thanks for your answer!

ADD REPLY

Login before adding your answer.

Traffic: 1884 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6