How to extract genes 10000 bp upstream and 10000 bp downstream of long non-coding RNA?
5.5 years ago
bk11 ★ 1.3k

Hi, I have two files:

File1:

chr1 73270408 73271242 XLOC_000129
chr1 83659841 83660368 XLOC_000153
chr1 163894423 163894714 XLOC_000257
chr1 55485657 55486215 XLOC_000374
chr1 155285623 155287724 XLOC_000553
chr1 155288475 155288592 XLOC_000553
chr1 155289366 155290533 XLOC_000553

File2:

chr1 53933832 53934533 TCP11L2
chr1 53935094 53935266 TCP11L2
chr1 53935961 53936136 TCP11L2
chr1 53937602 53937789 TCP11L2
chr1 53938748 53938884 TCP11L2
chr1 53939440 53939660 TCP11L2
chr1 53940084 53940204 TCP11L2
chr1 53941131 53941263 TCP11L2
chr1 53942563 53942732 TCP11L2
chr1 53945014 53945137 TCP11L2
chr1 121715553 121715945 GRPR
chr1 121719096 121719447 GRPR
chr1 121736572 121736990 GRPR

How can I extract the genes from File2 that are located within 10000 bp upstream or 10000 bp downstream of the XLOC_* elements in File1?

bedops bedtools awk sed bash • 1.9k views
5.5 years ago

With BEDOPS, you can use bedmap. The exact command depends on whether you want gene names and intervals, or just gene names.

First, sort your files with BEDOPS sort-bed:

$ sort-bed File1.unsorted.bed > File1.bed
$ sort-bed File2.unsorted.bed > File2.bed

You only need to do this sort step once.

If you want the gene names and intervals from File2 that overlap File1 elements padded by 10 kb upstream and downstream:

$ bedmap --range 10000 --echo-map --multidelim '\n' --skip-unmapped File1.bed File2.bed | sort-bed - | uniq - > answer.bed

If you just want gene names:

$ bedmap --range 10000 --echo-map-id --multidelim '\n' --skip-unmapped File1.bed File2.bed | sort - | uniq - > answer.txt

If you want to know which elements from File2 overlap 10k-padded elements from File1:

$ bedmap --range 10000 --echo --echo-map File1.bed File2.bed > answer.bed

In this case, each line of answer.bed is an original element from File1, along with elements from File2 that overlap the 10k-padded File1 element.
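
If BEDOPS is not at hand, the pad-and-overlap logic behind --range 10000 can be sketched in plain awk. This is only an illustration of the idea, not how bedmap is implemented. The function name genes_near_loci is hypothetical, the sketch assumes 4-column whitespace-delimited BED files (chrom, start, end, name), and it compares each gene against every locus on the same chromosome, so it will be slow on large inputs.

```shell
# Hypothetical helper: list names of genes (file 2) that overlap loci (file 1)
# after padding each locus by `pad` bp on both sides.
# Assumes 4-column BED: chrom, start, end, name.
genes_near_loci() {
  local loci="$1" genes="$2" pad="${3:-10000}"
  awk -v pad="$pad" '
    # First file: remember each padded locus per chromosome
    NR == FNR {
      n[$1]++
      s[$1, n[$1]] = ($2 > pad ? $2 - pad : 0)   # clamp start at 0
      e[$1, n[$1]] = $3 + pad
      next
    }
    # Second file: print the gene name if it overlaps any padded locus
    {
      for (i = 1; i <= n[$1]; i++)
        if ($2 < e[$1, i] && $3 > s[$1, i]) { print $4; break }
    }
  ' "$loci" "$genes" | sort -u
}
```

Usage would look like: genes_near_loci File1.bed File2.bed 10000 > answer.txt. Unlike bedmap, this does not require sorted input, at the cost of speed.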

See the documentation for more help, or run bedmap --help.


Thank you Alex. It worked!

5.5 years ago
Wietje ▴ 230

Bedtools has the "closest" subcommand, which lets you search for genes near your specified locus. You can exclude directly overlapping (i.e. touching) genes, and you can also ask for the distance to your locus and then sort by it, for instance. I am not sure whether you can specify a maximum distance (such as 10000 bp) directly, but if you pipe the bedtools output through a filter on the distance you get the same result.
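
A rough sketch of that pipe-and-filter idea, assuming bedtools is installed and both files are sorted by chromosome and start (for example with sort -k1,1 -k2,2n). Here -d appends the distance as the last column and -io ignores genes that overlap the locus. The helper names within_distance and closest_genes_within are made up for illustration.

```shell
# Keep rows whose last column (the distance added by `bedtools closest -d`)
# is between 0 and the given maximum; -1 means no gene was found.
within_distance() {
  awk -v max="$1" '$NF >= 0 && $NF <= max'
}

# Hypothetical wrapper: $1 = loci BED, $2 = genes BED, $3 = max distance in bp.
# Requires bedtools on PATH and coordinate-sorted inputs.
closest_genes_within() {
  bedtools closest -a "$1" -b "$2" -d -io | within_distance "$3"
}
```

Usage would look like: closest_genes_within File1.bed File2.bed 10000 > answer.txt. Note that closest reports only the nearest gene (or tied genes) per locus. To list every gene within the window, bedtools window with -w 10000 may be a better fit.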

The bedtools manual has nice explanations and examples.

Bedops offers similar solutions.

