Question: Genes around open regions
12 days ago by
g.sathish.k0 wrote:

Hello everyone! I am new here and new to the field of genomics and informatics. We recently generated ATAC-seq data from normal and cancer cells. The data show regions that are open in cancer cells compared to normal cells. I would like to know if there is a program that can generate a list of genes in the neighborhood of these open regions, which I would then feed to IPA or GSEA to see which functions those genes are enriched for. Ideally I would like to come up with three lists: one for genes within 25kb of open regions, a second for genes within 75kb, and a third for genes within 150kb.

Please suggest how I can achieve that.

atac-seq open genes gene • 145 views
modified 12 days ago by Alex Reynolds20k • written 12 days ago by g.sathish.k0
12 days ago by
Alex Reynolds20k
Seattle, WA USA
Alex Reynolds20k wrote:

Via BEDOPS bedmap:

$ bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed
$ bedmap --echo --echo-map-id-uniq --range 75000 open-regions.bed genes.bed > answer.75kb.bed
$ bedmap --echo --echo-map-id-uniq --range 150000 open-regions.bed genes.bed > answer.150kb.bed

IDs could be fed into for classification (depending on format).
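
As a rough sketch of pulling the IDs out of those answer files: with --echo and --echo-map-id-uniq, bedmap writes the open region, then a `|` delimiter, then the mapped IDs joined by `;` (its default delimiters). The sample lines, file names, and gene IDs below are made up for illustration, and only standard cut, tr, grep, and sort are used:

```shell
# Hypothetical stand-in for answer.25kb.bed (tab-separated region, then "|", then IDs)
printf 'chr1\t1000\t2000\tpeak1|GENE_A;GENE_B\n'  > answer.sample.bed
printf 'chr1\t5000\t6000\tpeak2|\n'              >> answer.sample.bed
printf 'chr2\t300\t900\tpeak3|GENE_B;GENE_C\n'   >> answer.sample.bed

# Keep the text after "|", put each ID on its own line,
# drop empty lines (regions with no gene in range), and de-duplicate
cut -d'|' -f2 answer.sample.bed \
  | tr ';' '\n' \
  | grep -v '^$' \
  | sort -u \
  > gene-ids.sample.txt

cat gene-ids.sample.txt
```

The result is one ID per line, which is the shape most enrichment tools accept, although the ID type they expect may differ from what is in your genes.bed.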

To get a genes.bed file, e.g. via Gencode:

$ wget -qO- \
  | gunzip -c - \
  | convert2bed --input=gff - \
  | awk '$8=="gene"' - \
  > genes.bed
modified 12 days ago • written 12 days ago by Alex Reynolds20k

Thank you Alex, I really appreciate your help.

written 11 days ago by g.sathish.k0

I want to use release 19 of Gencode, so I made changes to the above code and was able to generate the genes.bed file successfully. But I couldn't get the "bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed" command to work. It creates the output file, but the contents are just a repeat of the open-regions file.

written 11 days ago by g.sathish.k0

The --echo option reports the open region element; the --echo-map-id-uniq option reports the unique IDs of all genes from the Gencode v19 set that overlap the open region, once it is extended by the --range value.

If you want the genes themselves, you could do something like:

$ bedmap --echo-map --multidelim '\n' --range 25000 open-regions.bed genes.bed | sort-bed - > genes.25kb.bed


Leaving out --echo and putting in the --echo-map and --multidelim '\n' options gives you the genes themselves, one per line, that lie within 25kb of an open region.
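
One thing to watch with --echo-map: a gene that falls within range of two different open regions is printed once per region, and sort-bed orders the lines without removing repeats by default. A minimal sketch of collapsing the repeats and pulling out the ID column, using made-up lines in place of genes.25kb.bed and assuming the ID sits in column 4:

```shell
# Hypothetical stand-in for genes.25kb.bed; GENE_B was mapped by two open regions
printf 'chr1\t900\t1800\tGENE_A\t.\t+\n'   > genes.sample.bed
printf 'chr1\t1500\t4000\tGENE_B\t.\t-\n' >> genes.sample.bed
printf 'chr1\t1500\t4000\tGENE_B\t.\t-\n' >> genes.sample.bed

# Print a line only the first time it is seen, and skip blank lines
awk 'NF && !seen[$0]++' genes.sample.bed > genes.sample.uniq.bed

# Column 4 is assumed to hold the gene ID
cut -f4 genes.sample.uniq.bed > gene-ids.uniq.txt
cat gene-ids.uniq.txt
```

The same two commands could be applied to the 75kb and 150kb outputs to get the other two lists.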

See the documentation for more information about the --echo-map-* options available to you. It might seem a little overwhelming but the docs try to walk through several examples of how they work.

modified 11 days ago • written 11 days ago by Alex Reynolds20k

Thanks Alex that really helped :)

written 10 days ago by g.sathish.k0
12 days ago by
Washington University in St. Louis
jared.andrews0790 wrote:

Personally, I'd use GREAT, as you can just feed it your differential regions and it will run all sorts of enrichment analyses. If you want to build the three lists of annotated regions you describe, you can use bedtools closest with the -d option, giving it your regions and a gene list in BED or GTF format (which you could download from the UCSC Table Browser). That reports the closest gene to each region, with the distance to that gene in the last column, which lets you create your three lists via awk, Perl, Python, Excel, whatever.
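
A minimal sketch of the awk route, assuming the bedtools closest -d output has three region columns followed by six gene columns and the distance, so that the gene name lands in column 7. The file names and sample lines are made up, and the column number will need adjusting to your own files. Note that closest reports only the nearest gene (plus any ties) for each region, so these lists can be shorter than a full within-distance search:

```shell
# Hypothetical stand-in for the output of:
#   bedtools closest -d -a regions.bed -b genes.bed > closest.sample.tsv
# Columns: region (1-3), gene (4-9), distance in bp (last column)
printf 'chr1\t1000\t2000\tchr1\t900\t1800\tGENE_A\t.\t+\t0\n'            > closest.sample.tsv
printf 'chr1\t50000\t51000\tchr1\t90000\t95000\tGENE_B\t.\t-\t39001\n'  >> closest.sample.tsv
printf 'chr2\t1000\t2000\tchr2\t120000\t130000\tGENE_C\t.\t+\t118001\n' >> closest.sample.tsv
printf 'chr3\t1000\t2000\tchr3\t400000\t410000\tGENE_D\t.\t+\t398001\n' >> closest.sample.tsv

# For each cutoff, keep gene names whose distance is within the cutoff.
# A distance of -1 means no gene was found on that chromosome, so it is skipped.
for kb in 25 75 150; do
  awk -F'\t' -v max="$((kb * 1000))" '$NF >= 0 && $NF <= max { print $7 }' closest.sample.tsv \
    | sort -u > "closest-genes.${kb}kb.txt"
done

wc -l closest-genes.*kb.txt
```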

written 12 days ago by jared.andrews0790

Thanks Jared, I appreciate it.

written 12 days ago by g.sathish.k0
