Merging Bed files with depth information
2
2
Entering edit mode
2.4 years ago
asdasd ▴ 20

Hi,

I was trying to merge a bed file which also has depth (I mean how many times was it seen on that bed file) information. I am only interested in one bed file not multiple files.

For example my bed file is something like this:

chr1 100 200
chr1 120 130
chr1 170 180
chr1 175 180

What I want is this:

chr1 100 120 1
chr1 120 130 2
chr1 130 150 1
chr1 170 175 2
chr1 175 180 3
chr1 180 200 1

The closest thing I could have found was bedops -p partition but it wont give depth of it. Do you guys know if there is a tool or flag that I could have used to achieve that?

Thanks

bed bedops bedtools • 1.4k views
ADD COMMENT
4
Entering edit mode
2.4 years ago
ATpoint 81k

You can use bedtools genomecov. That requires a "genome" file which simply contains the length of each chromosome. Say your file is foo.bed

#/ get "genome" file by simply sorting the BED by chromosome and decreaasingly by end, then take the first entry per chromosome
cut -f1,3 foo.bed | sort -k1,1 -k2,2rn | awk -F "\t" '!_[$1]++' > genome.txt

#/ get coverages
bedtools genomecov -bg -g genome.txt -i foo.bed

chr1    100 120 1
chr1    120 130 2
chr1    130 170 1
chr1    170 175 2
chr1    175 180 3
chr1    180 200 1

If you want also uncovered intervals being reported use -bga:

bedtools genomecov -bga -g genome.txt -i foo.bed

chr1    0   100 0
chr1    100 120 1
chr1    120 130 2
chr1    130 170 1
chr1    170 175 2
chr1    175 180 3
chr1    180 200 1

Edit: Indeed, as Friederike says below, the genome file is simply a headerless tab-delimited file with two columns, the first being the chromosome name (or contig, so the first column of the BED file) and the second is the length of that respective chromosome/contig.

ADD COMMENT
2
Entering edit mode

very clever!

just for the sake of future readers, the content of genome.txt is simply:

$ cat genome.txt
chr1    200
ADD REPLY
0
Entering edit mode

Thank you so much!

ADD REPLY
2
Entering edit mode
2.4 years ago

The simplest way to do this is to --count associations with the partition — just stream it in:

bedops --partition foo.bed | bedmap --echo --count --delim '\t' - foo.bed > answer.bed

This one-liner is self-contained and is agnostic about genome: this way, you don't need any other error-prone files containing chromosome sizes, etc. (good lord)

ADD COMMENT
0
Entering edit mode

Thank you so much that definitely makes sense!

ADD REPLY

Login before adding your answer.

Traffic: 2627 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6