Question: Looking For A Way To Count Total Length Of Regions Of Bed File
6
7.5 years ago by
Yunfei Li310
ThermoFisher Scientific
Yunfei Li310 wrote:

As the title says. There are no overlapping regions in the BED file. Thanks.

bed • 17k views
ADD COMMENTlink modified 14 months ago by demolidd7740 • written 7.5 years ago by Yunfei Li310

Can you post the head of your BED file? Most BED files have a start and an end coordinate on each line, so you could just subtract start from end for every line and sum the results.

ADD REPLYlink written 7.5 years ago by Sukhdeep Singh10k

Do your BED data contain overlapping regions, or are your regions disjoint? If the latter, or if you don't care if regions overlap, then a basic awk statement as shown in one answer will suffice. Otherwise, let us know and I'll suggest another method that accounts for both cases.

ADD REPLYlink written 7.5 years ago by Alex Reynolds30k
27
7.5 years ago by
Raygozak1.3k
State College, PA, Penn State
Raygozak1.3k wrote:

You can do it with the following command line:

cat file.bed | awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'
ADD COMMENTlink modified 7.5 years ago by Istvan Albert ♦♦ 84k • written 7.5 years ago by Raygozak1.3k

Actually, I think it would be $3-$2 + 1. If you have an interval 5 to 10, subtracting counts the nucleotides 6 to 10, but misses out the first one. Otherwise looks good!

ADD REPLYlink written 7.5 years ago by Jelena Aleksic910
6

Actually, BED files use 0-based start positions, and the end position is not included (see the UCSC website: http://www.genome.ucsc.edu/FAQ/FAQformat.html#format1). Hence, a BED interval 5-10 covers positions 6-10 on the genome in 1-based terms, which is 5 bases. So there is no need to add 1.
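To make the half-open arithmetic concrete, here is a minimal sketch that wraps the awk one-liner in a shell function. The function name bed_total_length is made up for illustration, and skipping "#", "track" and "browser" lines is an assumption about what header lines the file might contain:

```shell
# Hypothetical helper: sum (end - start) over all records of a BED file.
# Assumes tab-delimited input with start in column 2 and end in column 3.
bed_total_length() {
  awk -F'\t' '
    # Skip comment, track and browser header lines (assumed, if present).
    /^(#|track[ \t]|browser[ \t])/ { next }
    # 0-based start, end excluded: the length is simply end - start.
    { sum += $3 - $2 }
    # "+ 0" forces a numeric 0 to be printed for an empty file.
    END { print sum + 0 }' "$1"
}
```

With this, a record "chr1 5 10" contributes 10 - 5 = 5 bases, with no "+ 1" correction. Call it as bed_total_length file.bed.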

ADD REPLYlink written 7.5 years ago by Anthony Mathelier890
2

Indeed, one reason to use 0-based indexing is to minimize calculations on a computer: the arithmetic is simpler.

ADD REPLYlink modified 6.3 years ago • written 7.5 years ago by Alex Reynolds30k

Huh. Well, that was helpful to find out - thanks!

ADD REPLYlink written 7.5 years ago by Jelena Aleksic910

I am surprised bedtools does not have a command for that. Or does it?

ADD REPLYlink written 6.6 years ago by 141341254653464453.5k
2

Given sorted input BED files A, B, C, etc., an input BED file that defines the bounds of chromosomes for the organism (e.g. hg19.extents.bed), and BEDOPS 2.4.1 (or greater), you could do something like the following:

bedops --merge A B C ... | bedmap --echo --echo-map-size hg19.extents.bed - > answer.bed

If you just have one BED file, then the following will merge overlapping regions in one file, so as to calculate unique base length:

bedops --merge A | bedmap --echo --echo-map-size hg19.extents.bed - > answer.bed

See docs for merging and mapping for more detail.

Or you can pipe merged data into the aforementioned awk statement:

bedops --merge A | awk ...

But the bedops | bedmap pipeline preserves chromosome names and extent data.
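If BEDOPS is not available, the same merge-then-sum idea can be sketched with sort and awk alone. This is not the BEDOPS method, just an illustration: the function name bed_merged_length is made up, and the sketch assumes a tab-delimited file with no header lines. Like the plain awk approach, it reports one total and drops per-chromosome detail:

```shell
# Hypothetical helper: count unique bases covered by a BED file,
# merging overlapping or abutting intervals before summing.
bed_merged_length() {
  # Sort by chromosome, then by numeric start, so overlaps are adjacent.
  LC_ALL=C sort -k1,1 -k2,2n "$1" | awk -F'\t' '
    # New chromosome, or a gap after the current merged interval:
    # close the current interval and start a new one.
    $1 != chrom || $2 > cur_end {
      if (chrom != "") total += cur_end - cur_start
      chrom = $1; cur_start = $2; cur_end = $3
      next
    }
    # Otherwise the record overlaps or abuts: extend the end if needed.
    $3 > cur_end { cur_end = $3 }
    END {
      if (chrom != "") total += cur_end - cur_start
      print total + 0
    }'
}
```

For example, the intervals chr1 5-10 and chr1 8-20 merge into chr1 5-20 and count as 15 bases rather than 17.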

ADD REPLYlink modified 6.3 years ago • written 6.6 years ago by Alex Reynolds30k
0
14 months ago by
demolidd7740
demolidd7740 wrote:

For a BED file, you can calculate its total size with

awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}' file.bed

It works correctly.

ADD COMMENTlink written 14 months ago by demolidd7740