Question: Identify overlapping coordinates
fire_water (United States) wrote, 5.2 years ago:

Given a text file with the following format:

chrom    start    end    num    list        lib1    lib2    lib3    +
chr1    4529048    4529082    3    lib1,lib2,lib3    1    1    1    +
chr1    4771642    4771666    3    lib1,lib2,lib3    1    1    1    +
chr1    4772370    4772405    3    lib1,lib2,lib3    1    1    1    +
(thousands of rows)


Do 2 things:

  1. Identify the rows whose start and end coordinates overlap
  2. Then return the left-most and right-most coordinates of each group of overlapping rows

For example:

chrom    start    end    num    list    lib1    lib2    lib3    +
chr1    1    5    3    lib1,lib2,lib3    1    1    1    +
chr1    4    6    3    lib1,lib2,lib3    1    1    1    +
chr1    7    9    3    lib1,lib2,lib3    1    1    1    +

In that example, the start and end coordinates of row 1 and row 2 overlap, so for those two rows the left-most coordinate is 1 and the right-most coordinate is 6.

Does a bioinformatics tool exist that can solve this problem?

Thank you!

Vivek wrote, 5.2 years ago:

Use bedtools merge. It expects input sorted by chromosome and then by start position.

Alex Reynolds (Seattle, WA USA) wrote, 5.2 years ago:

The BEDOPS commands sort-bed and bedops --merge will do what you want. You'll need to make your text file into a sorted BED file.

For example:

$ tail -n +2 elements.txt > elements.unsorted.bed

This strips the header line (the first line) from your text file and leaves an unsorted BED file.

Then sort the genomic elements:

$ sort-bed elements.unsorted.bed > elements.bed

Then merge elements:

$ bedops --merge elements.bed > mergedElements.bed
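
For intuition about what the merge step computes, here is a minimal Python sketch of the same idea. It is not how BEDOPS is implemented. It assumes BED-style half-open coordinates, so regions that merely touch are merged as well, and the names read_intervals and merge_intervals are hypothetical helpers invented for this illustration.

```python
def read_intervals(lines):
    """Yield (chrom, start, end) from whitespace-delimited rows, skipping the header."""
    for i, line in enumerate(lines):
        fields = line.split()
        if i == 0 or len(fields) < 3:
            continue  # header row or blank line
        yield fields[0], int(fields[1]), int(fields[2])


def merge_intervals(rows):
    """Merge overlapping or touching (chrom, start, end) tuples.

    Assumption: half-open coordinates, so start == previous end also merges.
    """
    merged = []
    # Sorting by chromosome, then start, puts overlapping rows next to each other.
    for chrom, start, end in sorted(rows):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlap with the current region: extend its right edge if needed.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# The small example from the question.
example = [
    "chrom start end num list lib1 lib2 lib3 +",
    "chr1 1 5 3 lib1,lib2,lib3 1 1 1 +",
    "chr1 4 6 3 lib1,lib2,lib3 1 1 1 +",
    "chr1 7 9 3 lib1,lib2,lib3 1 1 1 +",
]
print(merge_intervals(read_intervals(example)))
# prints [('chr1', 1, 6), ('chr1', 7, 9)]
```

For thousands of rows this is fine, but for whole-genome inputs the dedicated tools are the better choice, since they stream sorted data rather than holding everything in memory.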

Note that merged elements are computed from the genomic coordinates alone, so the remaining columns (num, list, lib1 and so on) are not carried over. You can then use bedmap to map those features from your original elements.bed back onto mergedElements.bed:

$ bedmap --echo --echo-map mergedElements.bed elements.bed > mergedElementsWithFeatures.bed

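By default, bedmap separates each merged region from its mapped elements with a pipe (|) and separates multiple mapped elements with semicolons (;), so the output is easily parsed with downstream scripts or other tools.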
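
As a rough illustration of that downstream parsing, here is a short Python sketch. It assumes bedmap was run with its default delimiters (a pipe between the merged region and the mapped elements, semicolons between mapped elements) and that fields within each element are tab-separated. The function name parse_bedmap_line is hypothetical, and the sample line is hand-written from the question's example rows, not captured tool output.

```python
def parse_bedmap_line(line, delim="|", multidelim=";"):
    """Split one `bedmap --echo --echo-map` line into (region, mapped_rows).

    Assumption: default bedmap delimiters and tab-separated BED fields.
    """
    region_text, _, mapped_text = line.rstrip("\n").partition(delim)
    chrom, start, end = region_text.split("\t")[:3]
    # Each mapped element is one original row; keep its fields as strings.
    mapped = [m.split("\t") for m in mapped_text.split(multidelim) if m]
    return (chrom, int(start), int(end)), mapped


# Hand-written sample line (illustrative only).
sample = (
    "chr1\t1\t6|"
    "chr1\t1\t5\t3\tlib1,lib2,lib3\t1\t1\t1\t+;"
    "chr1\t4\t6\t3\tlib1,lib2,lib3\t1\t1\t1\t+"
)
region, rows = parse_bedmap_line(sample)
print(region, len(rows))
```

If your feature columns could themselves contain a pipe or semicolon, it would be safer to pick different delimiters when running bedmap and pass the same ones to the parser.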
