Identify overlapping coordinates
9.4 years ago
fire_water ▴ 80

Given a text file with the following format:

chrom    start    end    num    list        lib1    lib2    lib3    +
chr1    4529048    4529082    3    lib1,lib2,lib3    1    1    1    +
chr1    4771642    4771666    3    lib1,lib2,lib3    1    1    1    +
chr1    4772370    4772405    3    lib1,lib2,lib3    1    1    1    +
(thousands of rows)

I'd like to do two things:

  1. Identify the rows whose start-end intervals overlap.
  2. Return the left-most and right-most coordinates of each group of overlapping rows.

For example:

chrom    start    end    num    list    lib1    lib2    lib3    +
chr1    1    5    3    lib1,lib2,lib3    1    1    1    +
chr1    4    6    3    lib1,lib2,lib3    1    1    1    +
chr1    7    9    3    lib1,lib2,lib3    1    1    1    +

In this example, rows 1 and 2 overlap (1-5 and 4-6), so the left-most coordinate is 1 and the right-most coordinate is 6.
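For reference, this kind of operation is usually done by sorting the rows by chromosome and start, then sweeping through them once: a row joins the current group if it starts at or before the group's current right-most end, and otherwise starts a new group. Below is a minimal sketch in plain `awk`. It assumes a single header line, whitespace-separated columns with chrom/start/end first, and that rows sharing an endpoint count as overlapping. The function name `merge_overlaps` is hypothetical.

```shell
# merge_overlaps: hypothetical helper sketching the sort-then-sweep approach.
# Assumes one header line and chrom/start/end in columns 1-3.
# Prints one line per group: chrom, left-most start, right-most end,
# and the number of input rows in the group.
merge_overlaps() {
  tail -n +2 "$1" \
    | sort -k1,1 -k2,2n \
    | awk 'BEGIN { OFS = "\t" }
      {
        # New chromosome, or this row starts after the current group ends:
        # emit the finished group and start a new one.
        if (NR == 1 || $1 != chrom || $2 > end) {
          if (NR > 1) print chrom, start, end, n
          chrom = $1; start = $2; end = $3; n = 1
        } else {
          # Overlapping row: extend the right-most coordinate if needed.
          if ($3 > end) end = $3
          n++
        }
      }
      END { if (NR > 0) print chrom, start, end, n }'
}
```

Run as `merge_overlaps elements.txt`. On the three-row example it prints `chr1 1 6 2` and `chr1 7 9 1` (tab-separated). Because the input is sorted first, each group only needs to be compared against the current running end, so the sweep is linear after the sort.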

Does a bioinformatics tool exist that can solve this problem?

Thank you!

RNA-Seq • 2.5k views
9.4 years ago
Vivek ★ 2.7k

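bedtools merge does this: it combines overlapping intervals in a BED file (sorted by chromosome, then start) into single intervals that run from the left-most start to the right-most end.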
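A sketch of how this might look for the file in the question, assuming `bedtools` is on your `PATH` and the first line is a header. Only the first three columns are passed on, so bedtools sees plain BED3 regardless of what the other columns contain. `-c 1 -o count` adds a column with the number of input rows merged into each interval. The function name `merge_with_bedtools` is hypothetical.

```shell
# merge_with_bedtools: hypothetical wrapper; requires bedtools on PATH.
merge_with_bedtools() {
  # Drop the header, keep chrom/start/end as tab-separated BED3,
  # then sort by chromosome and numeric start as bedtools merge expects.
  awk 'BEGIN { OFS = "\t" } NR > 1 { print $1, $2, $3 }' "$1" \
    | sort -k1,1 -k2,2n \
    | bedtools merge -i stdin -c 1 -o count
  # Output columns: chrom, merged start, merged end, rows merged.
}
```

Run it as `merge_with_bedtools elements.txt > merged.bed`. By default bedtools merge also joins book-ended intervals; `-d N` lets intervals up to N bp apart be merged as well.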

9.4 years ago

The BEDOPS commands sort-bed and bedops --merge will do what you want. First, you'll need to turn your text file into a sorted BED file.

For example:

$ tail -n +2 elements.txt > elements.unsorted.bed

This strips the header line from your text file, leaving an unsorted BED file. (BED is tab-delimited, so make sure the columns are separated by tabs, not spaces.)

Then sort the genomic elements:

$ sort-bed elements.unsorted.bed > elements.bed

Then merge elements:

$ bedops --merge elements.bed > mergedElements.bed

Note that merged elements are computed from the genomic coordinates alone, so the other columns (features) are not carried over. You can, however, use bedmap to map the features from your original elements.bed back onto mergedElements.bed:

$ bedmap --echo --echo-map mergedElements.bed elements.bed > mergedElementsWithFeatures.bed

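The output uses standard delimiters (by default, `|` separates each merged region from its mapped elements, and `;` separates the mapped elements from one another), so it is easy to parse with downstream scripts or other tools.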
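As an illustration, here is a sketch that reads the bedmap output and reports, for each merged region, how many of the original rows were mapped onto it. It assumes bedmap's default delimiters (`|` and `;`) and tab-separated BED columns. The function name `summarize_bedmap` is hypothetical.

```shell
# summarize_bedmap: hypothetical helper for bedmap --echo --echo-map output.
# Assumes the default '|' between the --echo and --echo-map parts and
# ';' between individual mapped elements.
summarize_bedmap() {
  awk -F'|' 'BEGIN { OFS = "\t" }
    {
      # Part 1: the merged region (chrom, start, end, tab-separated).
      split($1, region, "\t")
      # Part 2: every original element mapped onto it, ";"-separated.
      n = ($2 == "") ? 0 : split($2, hits, ";")
      print region[1], region[2], region[3], n
    }' "$1"
}
```

For example, `summarize_bedmap mergedElementsWithFeatures.bed` prints chrom, start, end, and the number of original rows for each merged region.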
