Question: Genes around open regions
2.9 years ago, g.sathish.k wrote:

Hello everyone! I am new here and new to the field of genomics and informatics. We recently generated ATAC-seq data from normal and cancer cells. The data show regions that are open in cancer cells compared to normal cells. I would like to know if there is a program that generates a list of genes in the neighborhood of these open regions, which I would then feed to IPA or GSEA to see what functions those genes are enriched for. Ideally I would like to come up with three lists: one for genes within 25kb of open regions, a second for genes within 75kb, and a third for genes within 150kb.

Please suggest how I can achieve that.

atac-seq open genes gene • 805 views
2.9 years ago, Alex Reynolds (Seattle, WA USA) wrote:

Via BEDOPS bedmap:

$ bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed
$ bedmap --echo --echo-map-id-uniq --range 75000 open-regions.bed genes.bed > answer.75kb.bed
$ bedmap --echo --echo-map-id-uniq --range 150000 open-regions.bed genes.bed > answer.150kb.bed

IDs could be fed into http://www.ebi.ac.uk/QuickGO/ for classification (depending on format).
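If a downstream tool wants a plain list of gene IDs, the output files can be flattened first. This is a minimal sketch, assuming bedmap's default separators (`|` between the echoed region and the mapped IDs, `;` between IDs); the file names `sample.25kb.bed` and `gene-ids.25kb.txt` and the rows in the sample are made up for illustration.

```shell
# Hypothetical sample mimicking bedmap --echo --echo-map-id-uniq output:
# region fields, then '|', then gene IDs joined by ';'.
# The second row stands for a region with no genes in range.
printf 'chr1\t100\t200\tpeak1|ENSG01;ENSG02\nchr1\t500\t600\tpeak2|\nchr2\t50\t80\tpeak3|ENSG02;ENSG03\n' > sample.25kb.bed

# Keep the part after '|', put one ID per line, drop empty lines, de-duplicate.
cut -d'|' -f2 sample.25kb.bed \
  | tr ';' '\n' \
  | awk 'NF' \
  | sort -u \
  > gene-ids.25kb.txt

cat gene-ids.25kb.txt
```

The same pipeline can be pointed at the 75kb and 150kb files to get the other two lists.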

To get a genes.bed file, e.g. via Gencode:

$ wget -qO- ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_26/gencode.v26.basic.annotation.gff3.gz \
  | gunzip -c - \
  | convert2bed --input=gff - \
  | awk '$8=="gene"' - \
  > genes.bed

Thank you Alex, I really appreciate your help.

Reply written 2.9 years ago by g.sathish.k

I want to use release_19 of Gencode, so I made changes to the above code and was able to generate the genes.bed file. But I couldn't get the "bedmap --echo --echo-map-id-uniq --range 25000 open-regions.bed genes.bed > answer.25kb.bed" command to work. It creates the output file, but the contents are just a repeat of the open-regions file.

Reply written 2.9 years ago by g.sathish.k

The --echo option reports the open region element; the --echo-map-id-uniq option reports the unique IDs of all genes from the Gencode v19 set that overlap the open region once it is extended by the --range value.

If you want the genes themselves, you could do something like:

$ bedmap --echo-map --multidelim '\n' --range 25000 open-regions.bed genes.bed | sort-bed - > genes.25kb.bed

Etc.

Leaving out --echo and --echo-map-id-uniq and putting in the --echo-map and --multidelim '\n' options gives you the full gene records, one per line, for genes within 25kb of an open region.

See the documentation for more information about the --echo-map-* options available to you. It might seem a little overwhelming but the docs try to walk through several examples of how they work.
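A gene that lies near more than one open region will show up once per region in that output, so the list may need de-duplicating before it goes to an enrichment tool. The sketch below assumes the column layout that `$8=="gene"` above relies on (type in column 8) with the GFF3 attributes in column 10 and a `gene_name=` key among them; the file names and the sample rows (GENE_A, GENE_B) are made up for illustration.

```shell
# Hypothetical rows in the 10-column layout assumed above:
# chrom, start, end, id, score, strand, source, type, phase, attributes.
# GENE_A appears twice, as it would if it sat near two open regions.
printf 'chr1\t1000\t5000\tID_A\t.\t+\tHAVANA\tgene\t.\tID=ID_A;gene_type=protein_coding;gene_name=GENE_A;level=2\n'  > sample.genes.25kb.bed
printf 'chr1\t1000\t5000\tID_A\t.\t+\tHAVANA\tgene\t.\tID=ID_A;gene_type=protein_coding;gene_name=GENE_A;level=2\n' >> sample.genes.25kb.bed
printf 'chr2\t7000\t9000\tID_B\t.\t-\tHAVANA\tgene\t.\tID=ID_B;gene_type=lincRNA;gene_name=GENE_B;level=2\n'        >> sample.genes.25kb.bed

# Split the attribute column on ';', print the value of gene_name, de-duplicate.
awk -F'\t' '{
  n = split($10, kv, ";")
  for (i = 1; i <= n; i++)
    if (kv[i] ~ /^gene_name=/) print substr(kv[i], 11)   # skip the 10 characters of "gene_name="
}' sample.genes.25kb.bed | sort -u > gene-symbols.25kb.txt

cat gene-symbols.25kb.txt
```

If the enrichment tool expects Ensembl IDs instead of symbols, printing column 4 and piping it through `sort -u` would do the same job.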

Reply written 2.9 years ago by Alex Reynolds

Thanks Alex that really helped :)

Reply written 2.9 years ago by g.sathish.k
2.9 years ago, jared.andrews07 (Memphis, TN) wrote:

Personally, I'd use GREAT, as you can just feed it your differential regions and it will run all sorts of enrichment analyses. If you want to feed it three lists of annotated regions as you describe, you can use bedtools closest with the -d option, giving it your regions and a gene list in BED or GTF format (which you could download from the UCSC Table Browser). That will get the closest gene to each region, with the distance to that gene reported in the last column, which would allow you to create your three lists via awk, Perl, Python, Excel, whatever.
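The awk step could look something like the sketch below. It assumes the regions file has 4 columns and the gene file is BED6, so the gene name lands in column 8 and the distance is the last column; the file names and sample rows are made up. Each list is cumulative (the 75kb list includes the 25kb genes), and since closest reports only the nearest gene per region, the lists will be shorter than ones built from every gene in range.

```shell
# Hypothetical output of: bedtools closest -d -a open-regions.bed -b genes.bed
# Columns: 4 region fields, 6 gene fields (BED6), then the distance.
# The last row mimics a region with no gene on its chromosome (distance -1).
printf 'chr1\t1000\t1500\tpeak1\tchr1\t1200\t9000\tGENE_A\t0\t+\t0\n'           > sample.closest.txt
printf 'chr1\t50000\t50500\tpeak2\tchr1\t80500\t90000\tGENE_B\t0\t-\t30000\n'   >> sample.closest.txt
printf 'chr2\t1000\t1500\tpeak3\tchr2\t101500\t120000\tGENE_C\t0\t+\t100000\n'  >> sample.closest.txt
printf 'chr3\t1000\t1500\tpeak4\tchr3\t201500\t220000\tGENE_D\t0\t+\t200000\n'  >> sample.closest.txt
printf 'chrM\t10\t20\tpeak5\t.\t-1\t-1\t.\t-1\t.\t-1\n'                         >> sample.closest.txt

# One gene list per distance cutoff; rows with a negative distance are skipped.
for kb in 25 75 150; do
  awk -F'\t' -v max="$((kb * 1000))" '$NF >= 0 && $NF <= max { print $8 }' sample.closest.txt \
    | sort -u > "closest-genes.${kb}kb.txt"
done

wc -l closest-genes.*kb.txt
```

With a different number of columns in either input file, the `$8` would need adjusting to wherever the gene name ends up.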


Thanks Jared, I appreciate it.

Reply written 2.9 years ago by g.sathish.k