Question: Is It Possible To Filter Only Bookend Reads From A Bed File?
0
gravatar for Duarte Molha
5.5 years ago by
Duarte Molha200
Oxford, UK
Duarte Molha200 wrote:

I have a bed file with many fragments, some overlapping, some on their own and some adjacent to each other (book-ended) features.

I know can group overlapping and book-ended features using bedtools like

bedtools cluster -i fragments.bed

However I was wondering if anyone knew of a way of obtaining from the input file only the fragments that contain book-ended adjacent fragments.

Any ideas?

Best regards

bedtools scripting filtering • 2.1k views
ADD COMMENTlink modified 5.5 years ago by Alex Reynolds28k • written 5.5 years ago by Duarte Molha200
2
gravatar for Aaronquinlan
5.5 years ago by
Aaronquinlan11k
United States
Aaronquinlan11k wrote:

You can use the closest tool in bedtools, ignore overlapping features (-io) and require the distance (-d) to be 1. I would recommend using verion 2.18.2 or later.

$ cat i.bed
chr1    10    20    a
chr1    22    24    b
chr1    24    26    c
chr1    26    28    d
chr1    40    50    e
chr1    60    70    f
chr1    65    75    g
chr1    75    80    h

$ bedtools closest -a i.bed -b i.bed -io -d | awk '$9==1'
chr1    22    24    b    chr1    24    26    c    1
chr1    24    26    c    chr1    22    24    b    1
chr1    24    26    c    chr1    26    28    d    1
chr1    26    28    d    chr1    24    26    c    1
chr1    65    75    g    chr1    75    80    h    1
chr1    75    80    h    chr1    65    75    g    1

If you want a non-redundant set, just do the following:

$ bedtools closest -a i.bed -b i.bed -io -d | awk '$9==1' | awk '$2<$6'
chr1    22    24    b    chr1    24    26    c    1
chr1    24    26    c    chr1    26    28    d    1
chr1    65    75    g    chr1    75    80    h    1
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Aaronquinlan11k
1
gravatar for sjneph
5.5 years ago by
sjneph600
sjneph600 wrote:

This is pretty straightforward with BEDOPS:

sort-bed file.bed > file.sort.bed
closest-features --no-overlaps --dist --closest file.sort.bed file.sort.bed | awk -F'|' '$3*$3==1'

You basically ask what elements are closest 'to the left' and 'to the right' of every element in the file using an edge-2-edge measure (by running closest-features on the file against itself). If the distance is 1 or -1, then you have found an element that you're looking for. The output is still sorted per sort-bed so you can use immediately in further downstream BEDOPS operations.

Perhaps the only slightly tricky part is in using --no-overlaps. This ensures that elements declared closest 'to the left/right' are disjoint (which includes immediately adjacent elements) from the reference element. The awk math trick is needed because there is no built-in absolute value function in awk and closest-features returns the signed distance to the nearest element.

$ cat file.sort.bed
chr1    10  20  h
chr1    10  25  i
chr1    20  25  j
chr1    25  50  k
chr1    30  35  l
chr1    31  45  m
chr1    32  46  n

$ closest-features --no-overlaps --dist --closest file.sort.bed file.sort.bed | awk -F'|' '$3*$3==1'
chr1    10  20  h|chr1  20  25  j|1
chr1    10  25  i|chr1  25  50  k|1
chr1    20  25  j|chr1  10  20  h|-1
chr1    25  50  k|chr1  20  25  j|-1

since we used --closest, only the single nearest element is chosen, with ties 'going to the left'. It's also worth noting that closest-features has essentially no memory overhead and it's wicked fast. Use it with any size inputs without issue.

ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by sjneph600
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1462 users visited in the last hour