Unexpected results with bedtools merge.
9.3 years ago
Coryza ▴ 30

I am trying to merge a BED file containing all repeat-masked positions of a genome, but the output is not what I expect.

For example, I have a sorted, unmerged BED file:

Nitab4.5_0000001        1       383
Nitab4.5_0000001        384     384
Nitab4.5_0000001        385     385
Nitab4.5_0000001        386     387
Nitab4.5_0000001        388     388
Nitab4.5_0000001        389     389
Nitab4.5_0000001        390     390
Nitab4.5_0000001        391     395
Nitab4.5_0000001        396     402
Nitab4.5_0000001        403     404

The merged BED file is:

Nitab4.5_0000001        1       395
Nitab4.5_0000001        396     402
Nitab4.5_0000001        403     404

I don't understand why this isn't merged into a single feature. I also get 0-based coordinates in the output, although the sorted BED file does not contain them.

The sorted BED file:

Nitab4.5_0000003        1       1
Nitab4.5_0000003        2       2
Nitab4.5_0000003        3       4
Nitab4.5_0000003        5       9
Nitab4.5_0000003        10      11
Nitab4.5_0000003        12      16
Nitab4.5_0000003        17      24
Nitab4.5_0000003        25      28
Nitab4.5_0000003        29      73
Nitab4.5_0000003        74      90
......

The merged BED file:

Nitab4.5_0000003        0       4
Nitab4.5_0000003        5       9
Nitab4.5_0000003        10      11
Nitab4.5_0000003        12      16
Nitab4.5_0000003        17      24
Nitab4.5_0000003        25      28
Nitab4.5_0000003        29      73
Nitab4.5_0000003        74      90
Nitab4.5_0000003        91      213
Nitab4.5_0000003        214     221

Could anyone help me with this?

bed merge bedtools • 1.8k views
9.3 years ago

BED elements are usually 0-indexed and half-open, so elements like these in your input, where the start equals the end, are not strictly valid BED:

Nitab4.5_0000001        384     384
Nitab4.5_0000001        385     385
...
Nitab4.5_0000001        388     388
Nitab4.5_0000001        389     389
Nitab4.5_0000001        390     390

A half-open interval runs from the start up to, but not including, the end. It is sometimes written as [a, b), where a < b. A single position a in 1-based coordinates would therefore be written as [a-1, a), so you may want to convert those elements to that form before doing set operations.
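As a rough illustration of that conversion, here is a minimal Python sketch. It assumes your input is 1-based with inclusive ends, which is what rows like "384 384" suggest. The helper names `to_bed` and `merge` are made up for this example and are not part of bedtools; `merge` is only a simplified stand-in that joins overlapping or directly adjacent (book-ended) intervals, which is what I would expect bedtools merge to do with its default settings.

```python
def to_bed(chrom, start, end):
    """Convert a 1-based, closed interval to 0-based, half-open BED coordinates."""
    # Assumption: the input really is 1-based and end-inclusive.
    return (chrom, start - 1, end)


def merge(intervals):
    """Merge sorted half-open intervals that overlap or touch (book-ended).

    Simplified stand-in for illustration only; not the bedtools implementation.
    """
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# The first ten rows from the question, read as 1-based closed intervals.
one_based = [
    ("Nitab4.5_0000001", 1, 383),
    ("Nitab4.5_0000001", 384, 384),
    ("Nitab4.5_0000001", 385, 385),
    ("Nitab4.5_0000001", 386, 387),
    ("Nitab4.5_0000001", 388, 388),
    ("Nitab4.5_0000001", 389, 389),
    ("Nitab4.5_0000001", 390, 390),
    ("Nitab4.5_0000001", 391, 395),
    ("Nitab4.5_0000001", 396, 402),
    ("Nitab4.5_0000001", 403, 404),
]

bed = [to_bed(*interval) for interval in one_based]
result = merge(bed)
print(result)
```

After the conversion, each element starts exactly where the previous one ends, so under these assumptions the ten rows collapse into a single feature.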
