Question: How to extract genes 10000 bp upstream and 1000 bp downstream of long non-coding RNA?
bk11 wrote, 12 months ago:

Hi, I have two files:

File1:

chr1 73270408 73271242 XLOC_000129
chr1 83659841 83660368 XLOC_000153
chr1 163894423 163894714 XLOC_000257
chr1 55485657 55486215 XLOC_000374
chr1 155285623 155287724 XLOC_000553
chr1 155288475 155288592 XLOC_000553
chr1 155289366 155290533 XLOC_000553

File2:

chr1 53933832 53934533 TCP11L2
chr1 53935094 53935266 TCP11L2
chr1 53935961 53936136 TCP11L2
chr1 53937602 53937789 TCP11L2
chr1 53938748 53938884 TCP11L2
chr1 53939440 53939660 TCP11L2
chr1 53940084 53940204 TCP11L2
chr1 53941131 53941263 TCP11L2
chr1 53942563 53942732 TCP11L2
chr1 53945014 53945137 TCP11L2
chr1 121715553 121715945 GRPR
chr1 121719096 121719447 GRPR
chr1 121736572 121736990 GRPR

How can I extract the genes from File2 that are located within 10000 bp upstream or 10000 bp downstream of the XLOC_* elements in File1?

Alex Reynolds (Seattle, WA USA) wrote, 12 months ago:

Via BEDOPS, you can use bedmap to get an answer, depending on whether you want gene names and intervals, or just gene names.

First, sort your files with BEDOPS sort-bed:

$ sort-bed File1.unsorted.bed > File1.bed
$ sort-bed File2.unsorted.bed > File2.bed

You only need to do this sort step once.
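
If sort-bed is not at hand, plain sort gives a similar order for simple cases. This is only a small sketch: the file names and coordinates are made up, and it assumes that ordering by chromosome name and then numerically by start and end is close enough to what sort-bed produces for your data.

```shell
# Toy unsorted input (hypothetical coordinates, not from the question)
cat > toy.unsorted.bed <<'EOF'
chr2 500 600 B
chr1 3000 3100 C
chr1 200 300 A
EOF

# Sort by chromosome name (byte order), then numerically by start and end.
LC_ALL=C sort -k1,1 -k2,2n -k3,3n toy.unsorted.bed > toy.bed
```

Setting LC_ALL=C keeps the chromosome ordering independent of the locale of the machine you run this on.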

If you want gene names and intervals from File2, overlapping elements from File1 that are padded by 10k up- and downstream:

$ bedmap --range 10000 --echo-map --multidelim '\n' --skip-unmapped File1.bed File2.bed | sort-bed - | uniq - > answer.bed

If you just want gene names:

$ bedmap --range 10000 --echo-map-id --multidelim '\n' --skip-unmapped File1.bed File2.bed | sort - | uniq - > answer.txt

If you want to know which elements from File2 overlap 10k-padded elements from File1:

$ bedmap --range 10000 --echo --echo-map File1.bed File2.bed > answer.bed

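In this case, each line of answer.bed is an original element from File1, followed by a | delimiter and the semicolon-separated elements from File2 that overlap the 10k-padded File1 element. File1 elements without any overlapping File2 element are still printed, with nothing after the delimiter.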

See the documentation for more help, or run bedmap --help.
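
For what it's worth, the same padded-overlap idea can be sketched in plain awk, which also allows a different pad on each side (the question title mentions 10000 bp upstream and 1000 bp downstream). Treat this as a minimal sketch under assumptions: the file names and coordinates are made up, the `up` and `down` variables are my own, and every lncRNA is assumed to be on the forward strand because the files have no strand column.

```shell
# Toy inputs (hypothetical coordinates, not from the question)
cat > lnc.bed <<'EOF'
chr1 20000 21000 XLOC_A
chr1 90000 91000 XLOC_B
EOF
cat > genes.bed <<'EOF'
chr1 12000 13000 GENE_UP
chr1 21500 21900 GENE_DOWN_NEAR
chr1 25000 26000 GENE_DOWN_FAR
chr2 20000 21000 GENE_OTHER_CHR
EOF

# Pad each lncRNA by `up` bp before its start and `down` bp after its end,
# then print the name of every gene that overlaps at least one padded window.
awk -v up=10000 -v down=1000 '
  NR == FNR {                      # first file: lncRNA intervals
    n++
    chrom[n] = $1
    s = $2 - up
    if (s < 0) s = 0               # do not run off the start of the chromosome
    wstart[n] = s
    wend[n] = $3 + down
    next
  }
  {                                # second file: gene intervals
    for (i = 1; i <= n; i++) {
      # half-open BED intervals overlap when each starts before the other ends
      if ($1 == chrom[i] && $2 < wend[i] && $3 > wstart[i]) {
        print $4
        break
      }
    }
  }
' lnc.bed genes.bed | sort -u > nearby_genes.txt
```

This compares every gene against every window, so it is fine for small files but will be much slower than bedmap on genome-scale inputs.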


Thank you Alex. It worked!

Reply written 12 months ago by bk11

Wietje wrote, 12 months ago:

Bedtools has a "closest" tool that searches for genes close to your specified locus. You can exclude directly overlapping (i.e. touching) genes, ask it to report the distance to your locus, and then sort or filter on that distance. I am not sure whether you can specify a 1000 bp distance cutoff directly, but if you pipe the bedtools output through a filter you get the same result.
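
The piping step could look roughly like the sketch below. The input here is a made-up stand-in for what `bedtools closest -d` prints (the File1 columns, the File2 columns, then the distance in the last column), so the coordinates, names and distances are hypothetical, as are the file names and the `max` variable.

```shell
# Hypothetical stand-in for the output of:
#   bedtools closest -d -a File1.bed -b File2.bed
# Real output is tab-delimited; awk's default splitting handles tabs and spaces alike.
cat > closest.txt <<'EOF'
chr1 20000 21000 XLOC_A chr1 21500 21900 GENE_NEAR 501
chr1 90000 91000 XLOC_B chr1 150000 151000 GENE_FAR 59001
EOF

# Keep lncRNA/gene pairs whose reported distance is between 0 and max bp.
# The lower bound drops rows with a negative distance, which are assumed
# here to mean that no gene was found for that lncRNA.
awk -v max=10000 '$NF >= 0 && $NF <= max { print $4, $8, $NF }' closest.txt > within_10kb.txt
```

Note that "closest" only reports the nearest gene (or tied genes) per lncRNA, so this will not list every gene inside the window. If you need all of them, `bedtools window` with its `-l` and `-r` options may be a better fit.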

The bedtools manual has nice explanations and examples.

Bedops offers similar solutions.
