Question

Merging Bed files with depth information

2

Entering edit mode

2.4 years ago

asdasd ▴ 20

Hi,

I was trying to merge a bed file which also has depth (I mean how many times was it seen on that bed file) information. I am only interested in one bed file not multiple files.

For example my bed file is something like this:

chr1 100 200
chr1 120 130
chr1 170 180
chr1 175 180

What I want is this:

chr1 100 120 1
chr1 120 130 2
chr1 130 150 1
chr1 170 175 2
chr1 175 180 3
chr1 180 200 1

The closest thing I could have found was bedops -p partition but it wont give depth of it. Do you guys know if there is a tool or flag that I could have used to achieve that?

Thanks

bed bedops bedtools • 1.4k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 2.4 years ago by asdasd ▴ 20

score 4 · Answer 1 · 2021-11-30

You can use bedtools genomecov. That requires a "genome" file which simply contains the length of each chromosome. Say your file is foo.bed

#/ get "genome" file by simply sorting the BED by chromosome and decreaasingly by end, then take the first entry per chromosome
cut -f1,3 foo.bed | sort -k1,1 -k2,2rn | awk -F "\t" '!_[$1]++' > genome.txt

#/ get coverages
bedtools genomecov -bg -g genome.txt -i foo.bed

chr1    100 120 1
chr1    120 130 2
chr1    130 170 1
chr1    170 175 2
chr1    175 180 3
chr1    180 200 1

If you want also uncovered intervals being reported use -bga:

bedtools genomecov -bga -g genome.txt -i foo.bed

chr1    0   100 0
chr1    100 120 1
chr1    120 130 2
chr1    130 170 1
chr1    170 175 2
chr1    175 180 3
chr1    180 200 1

Edit: Indeed, as Friederike says below, the genome file is simply a headerless tab-delimited file with two columns, the first being the chromosome name (or contig, so the first column of the BED file) and the second is the length of that respective chromosome/contig.

score 2 · Answer 2 · 2021-11-30

2

Entering edit mode

2.4 years ago

Alex Reynolds 35k

The simplest way to do this is to --count associations with the partition — just stream it in:

bedops --partition foo.bed | bedmap --echo --count --delim '\t' - foo.bed > answer.bed

This one-liner is self-contained and is agnostic about genome: this way, you don't need any other error-prone files containing chromosome sizes, etc. (good lord)