Question: Genes around open regions
g.sathish.k wrote, 9 weeks ago:

Hello everyone! I am new here and new to the field of genomics and informatics. We recently generated ATAC-seq data from normal and cancer cells. The data show regions that are open in cancer cells compared to normal cells. I would like to know whether there is a program that can generate a list of genes in the neighborhood of these open regions, which I would then feed to IPA or GSEA to see which functions those genes are enriched for. Ideally, I would like to come up with three lists: one for genes within 25 kb of open regions, a second for genes within 75 kb, and a third for genes within 150 kb.

Please suggest how I can achieve that.

Alex Reynolds (Seattle, WA USA) wrote, 9 weeks ago:

Via BEDOPS bedmap (both input files need to be sorted with sort-bed first):

$ bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed
$ bedmap --echo --echo-map-id-uniq --range 75000 open-regions.bed genes.bed > answer.75kb.bed
$ bedmap --echo --echo-map-id-uniq --range 150000 open-regions.bed genes.bed > answer.150kb.bed

The IDs could then be fed into an enrichment tool such as IPA or GSEA for classification (depending on the ID format the tool accepts).
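As a small illustration of that step, here is a minimal sketch that flattens the bedmap output into one unique ID per line. It assumes bedmap's default delimiters ('|' between the open region and its ID list, ';' between IDs); the toy rows and the file names toy-answer.25kb.bed and toy-ids.25kb.txt are hypothetical stand-ins for answer.25kb.bed.

```shell
#!/bin/sh
# Toy stand-in for answer.25kb.bed: open region, '|', then ';'-separated gene IDs
# (assumed bedmap defaults for --delim and --multidelim). Region 2 has no genes.
printf 'chr1\t1000\t1500|ENSG_A;ENSG_B\nchr1\t9000\t9400|\nchr2\t500\t800|ENSG_B\n' \
  > toy-answer.25kb.bed

# Keep the ID list after '|', put one ID per line, drop empty lines, de-duplicate.
cut -d'|' -f2 toy-answer.25kb.bed \
  | tr ';' '\n' \
  | grep -v '^$' \
  | sort -u > toy-ids.25kb.txt

cat toy-ids.25kb.txt
```

With real data, the same pipeline would be pointed at answer.25kb.bed, answer.75kb.bed and answer.150kb.bed in turn, giving one ID list per distance.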

To get a genes.bed file, e.g. via Gencode:

$ wget -qO- <Gencode GFF3 annotation URL> \
  | gunzip -c - \
  | convert2bed --input=gff - \
  | awk '$8=="gene"' - \
  > genes.bed
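Depending on the enrichment tool, gene symbols may be more convenient than versioned Ensembl IDs. A minimal sketch, assuming convert2bed writes each GFF3 record as ten columns (chrom, start, end, ID, score, strand, source, type, phase, attributes, which is why the awk filter above tests $8) and that the Gencode attributes carry a gene_name= key: it keeps gene rows and swaps the ID column for the gene name. The toy rows and the file names toy-converted.bed and toy-genes.named.bed are hypothetical.

```shell
#!/bin/sh
# Toy stand-in for convert2bed --input=gff output (assumed ten-column layout:
# chrom, start, end, ID, score, strand, source, type, phase, attributes).
printf 'chr1\t100\t900\tENSG_A.1\t.\t+\tHAVANA\tgene\t.\tID=ENSG_A.1;gene_name=GENE_A;level=2\n' \
  > toy-converted.bed
printf 'chr1\t100\t900\tENST_A.1\t.\t+\tHAVANA\ttranscript\t.\tID=ENST_A.1;gene_name=GENE_A\n' \
  >> toy-converted.bed

# Keep gene rows only and write BED6 with gene_name in the ID column,
# falling back to the original ID when no gene_name key is present.
awk -F'\t' -v OFS='\t' '$8 == "gene" {
    name = $4
    n = split($10, attrs, ";")
    for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^gene_name=/)
            name = substr(attrs[i], length("gene_name=") + 1)
    print $1, $2, $3, name, $5, $6
}' toy-converted.bed > toy-genes.named.bed

cat toy-genes.named.bed
```

Filtering does not reorder rows, so a file built this way from sorted convert2bed output should stay sorted and could be used in place of genes.bed in the bedmap commands, making --echo-map-id-uniq report symbols instead of Ensembl IDs.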

Thank you Alex, I really appreciate your help.

(g.sathish.k, 9 weeks ago)

I want to use release_19 of Gencode, so I modified the code above and was able to generate a genes.bed file successfully. However, I couldn't get the "bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed" command to work: it creates the output file, but its contents are just a repeat of the open-regions file.

(g.sathish.k, 9 weeks ago)

The --echo option reports the open-region element, and --echo-map-id-uniq reports the unique IDs of all genes from the Gencode v19 set that overlap the open region once it is padded by 25 kb on each side (--range 25000).
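To make the --range behaviour concrete, here is a naive awk illustration of the idea as described (not bedmap itself, and only suitable for tiny inputs): pad each open region by the range on both sides, then report every gene on the same chromosome that overlaps the padded interval, using half-open BED coordinates. All gene names, coordinates and file names are toy values.

```shell
#!/bin/sh
RANGE=25000   # padding on each side, as in --range 25000

# Toy inputs: one open region and three genes.
printf 'chr1\t100000\t101000\n' > toy-open-regions.bed
printf 'chr1\t60000\t70000\tGENE_FAR\nchr1\t120000\t130000\tGENE_NEAR\nchr2\t100000\t101000\tGENE_OTHER_CHROM\n' \
  > toy-genes.bed

# The first file (genes) is loaded into arrays; each open region is then padded
# and compared with every gene, so this is O(regions x genes).
# Half-open intervals [a,b) and [c,d) overlap when a < d and c < b.
awk -F'\t' -v range="$RANGE" '
NR == FNR { chrom[NR] = $1; gs[NR] = $2 + 0; ge[NR] = $3 + 0; id[NR] = $4; n = NR; next }
{
    lo = $2 - range; if (lo < 0) lo = 0    # padded start, clamped at 0
    hi = $3 + range                         # padded end
    ids = ""
    for (i = 1; i <= n; i++)
        if (chrom[i] == $1 && gs[i] < hi && lo < ge[i])
            ids = (ids == "" ? id[i] : ids ";" id[i])
    print $0 "|" ids
}' toy-genes.bed toy-open-regions.bed > toy-answer.bed

cat toy-answer.bed
```

The padded interval here is 75000 to 126000, so only GENE_NEAR is reported; GENE_FAR ends before the padded start, and GENE_OTHER_CHROM is on another chromosome. bedmap does the same kind of search on sorted input without the quadratic loop.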

If you want the genes themselves, you could do something like:

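$ bedmap --echo-map --multidelim '\n' --skip-unmapped --range 25000 open-regions.bed genes.bed | sort-bed - | uniq > genes.25kb.bed

Leaving out --echo and adding the --echo-map and --multidelim '\n' options prints the gene elements themselves, one per line, for every gene within 25 kb of an open region. --skip-unmapped drops the empty lines that would otherwise be printed for open regions with no nearby gene, and sort-bed followed by uniq gives a sorted list in which a gene near several open regions appears only once.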

See the bedmap documentation for more information about the --echo-map-* options available to you. It might seem a little overwhelming, but the docs try to walk through several examples of how they work.

(Alex Reynolds, 9 weeks ago)

Thanks Alex that really helped :)

(g.sathish.k, 9 weeks ago)
jared.andrews07 (Washington University in St. Louis) wrote, 9 weeks ago:

Personally, I'd use GREAT, as you can just feed it your differential regions and it will run a range of enrichment analyses. If you want to produce the three lists of annotated regions you describe, you can use bedtools closest with the -d option, giving it your regions and a gene list in BED or GTF format (which you could download from the UCSC Table Browser). That reports the closest gene to each region, with the distance to that gene in the last column, so you can then build your three lists with awk, Perl, Python, Excel, or whatever you prefer.
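For the bedtools closest route, the splitting step could be a short awk loop like this minimal sketch. It assumes a hypothetical layout where the regions file is BED3 and the gene file is BED4, so the gene name lands in column 7 and the -d distance in the last column; bedtools reports a distance of -1 when no gene exists on that chromosome. The toy rows, their distances and the output file names are illustrative only.

```shell
#!/bin/sh
# Toy stand-in for `bedtools closest -d -a regions.bed -b genes.bed` output:
# BED3 region, BED4 gene (name in column 7), distance in the last column.
printf 'chr1\t100\t200\tchr1\t10200\t11000\tGENE_A\t10000\n'      >  toy-closest.txt
printf 'chr1\t5000\t5100\tchr1\t55100\t56000\tGENE_B\t50000\n'    >> toy-closest.txt
printf 'chr1\t9000\t9100\tchr1\t109100\t110000\tGENE_C\t100000\n' >> toy-closest.txt
printf 'chr1\t20000\t20100\tchr1\t420100\t421000\tGENE_D\t400000\n' >> toy-closest.txt
printf 'chr2\t100\t200\t.\t-1\t-1\t.\t-1\n'                       >> toy-closest.txt

# One cumulative, de-duplicated gene list per distance cutoff (in bp);
# rows with distance -1 (no gene on the chromosome) are skipped.
for d in 25000 75000 150000; do
  awk -F'\t' -v max="$d" '$NF >= 0 && $NF <= max { print $7 }' toy-closest.txt \
    | sort -u > "toy-genes.within_${d}bp.txt"
done

wc -l toy-genes.within_*bp.txt
```

The lists are cumulative, so the 150 kb list also contains the 25 kb genes. Note that closest reports only the nearest gene (plus ties) for each region, so other genes that are further away but still inside the window are not listed; the bedmap --range approach in the other answer reports all of them.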


Thanks Jared, I appreciate it.

(g.sathish.k, 9 weeks ago)


Powered by Biostar version 2.3.0