Question: Identify overlapping coordinates
0
6.6 years ago by
fire_water80
United States
fire_water80 wrote:

Given a text file with the following format:

```
chrom    start      end        num    list              lib1    lib2    lib3    +
chr1     4529048    4529082    3      lib1,lib2,lib3    1       1       1       +
chr1     4771642    4771666    3      lib1,lib2,lib3    1       1       1       +
chr1     4772370    4772405    3      lib1,lib2,lib3    1       1       1       +
(thousands of rows)
```

Do 2 things:

1. Identify the rows whose start and end coordinates overlap
2. Then return the left-most and right-most coordinates

For example:

```
chrom    start    end    num    list              lib1    lib2    lib3    +
chr1     1        5      3      lib1,lib2,lib3    1       1       1       +
chr1     4        6      3      lib1,lib2,lib3    1       1       1       +
chr1     7        9      3      lib1,lib2,lib3    1       1       1       +
```

In that example, the start and end coordinates of row 1 and row 2 overlap. In that case, the left-most coordinate is 1 and the right-most coordinate is 6.

Does a bioinformatics tool exist that can solve this problem?

Thank you!

4
6.6 years ago by
Vivek2.4k
Denmark
Vivek2.4k wrote:
2
6.6 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

The BEDOPS commands `sort-bed` and `bedops --merge` will do what you want. You'll need to make your text file into a sorted BED file.

For example:

```
$ tail -n +2 elements.txt > elements.unsorted.bed
```

will strip the first header line from your text file and make it into an unsorted BED file.

Then sort the genomic elements:

```
$ sort-bed elements.unsorted.bed > elements.bed
```

Then merge elements:

```
$ bedops --merge elements.bed > mergedElements.bed
```
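
To see what the merge step computes, the same idea can be sketched with only `sort` and `awk`. This is an illustration, not how BEDOPS is implemented. The function name `merge_intervals` is made up for this example, and it assumes headerless, tab-delimited input on stdin with chrom, start, and end in the first three columns. It treats two intervals as overlapping when the next start is less than the current end; whether touching intervals should also be joined depends on your coordinate convention, so adjust the comparison as needed.

```shell
# Hypothetical helper (not part of BEDOPS): merge overlapping intervals.
# Reads tab-delimited chrom/start/end rows from stdin, writes merged rows.
merge_intervals() {
    # Sort by chromosome, then numerically by start, so overlaps are adjacent.
    sort -k1,1 -k2,2n | awk 'BEGIN { FS = OFS = "\t" }
        # Same chromosome and start before the current end: extend the run.
        NR > 1 && $1 == cur_chrom && $2 + 0 < cur_end {
            if ($3 + 0 > cur_end) cur_end = $3 + 0
            next
        }
        # Otherwise emit the finished run (if any) and begin a new one.
        NR > 1 { print cur_chrom, cur_start, cur_end }
        { cur_chrom = $1; cur_start = $2 + 0; cur_end = $3 + 0 }
        END { if (NR > 0) print cur_chrom, cur_start, cur_end }'
}

# Usage with the three example rows from the question:
printf 'chr1\t1\t5\nchr1\t4\t6\nchr1\t7\t9\n' | merge_intervals
```

For the example rows this reports `chr1 1 6` and `chr1 7 9`: the first two rows collapse into one region spanning the left-most and right-most coordinates, and the third row is left alone.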

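Note that `bedops --merge` computes merged elements from the genomic coordinates alone, so its output contains only the chromosome, start, and end columns; the remaining columns (features) are not carried over. You can then use `bedmap` to map the features from your original `elements.bed` back onto `mergedElements.bed`: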

```
$ bedmap --echo --echo-map mergedElements.bed elements.bed > mergedElementsWithFeatures.bed
```

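By default, `bedmap` separates each merged region from its mapped elements with a pipe (`|`) and separates multiple mapped elements with semicolons (`;`), so the output is easy to parse with downstream scripts or other tools.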
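
As a rough sketch of such downstream parsing, the following splits each output line into the merged region and its mapped elements and reports how many original rows fell into each region. It assumes `bedmap`'s default delimiters (`|` between the region and the mapped elements, `;` between mapped elements); the function name `count_mapped` and the sample input are invented for illustration.

```shell
# Hypothetical helper: summarize `bedmap --echo --echo-map` output from stdin.
# Prints each merged region followed by the number of elements mapped to it.
count_mapped() {
    awk 'BEGIN { OFS = "\t" }
    {
        # Locate the first pipe, which separates region from mapped elements.
        bar = index($0, "|")
        if (bar == 0) next            # skip lines without the delimiter
        region = substr($0, 1, bar - 1)
        mapped = substr($0, bar + 1)
        # Mapped elements are semicolon-separated; split returns the count.
        n = (mapped == "") ? 0 : split(mapped, parts, ";")
        print region, n
    }'
}

# Usage on two made-up lines shaped like the output file:
printf 'chr1\t1\t6|chr1\t1\t5\t3\tlib1,lib2,lib3\t1\t1\t1;chr1\t4\t6\t3\tlib1,lib2,lib3\t1\t1\t1\nchr1\t7\t9|chr1\t7\t9\t3\tlib1,lib2,lib3\t1\t1\t1\n' | count_mapped
```

In practice you would run it as `count_mapped < mergedElementsWithFeatures.bed`. Regions with a count above 1 are the ones where rows from the original file overlapped.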