Question: Need to systematically identify genes within some distance of a response element in the genome
I'm trying to find a program or script that can be used to systematically return a list of genes in the genome that fall within some specified distance of a response element.

I have two files to work with:

1.) A list of predicted response elements in the genome, identified using PoSSuM-search, that includes each element's genomic coordinates
2.) The annotation file from that same genome, which includes genomic coordinates for gene features

Ideally, I want to use the genomic coordinates of the response elements in file (1) to pull out any gene in the annotation file that falls within a pre-specified distance of a response element (e.g., 100 kb).

Thank you!

To moderators: this same question was cross-posted on ResearchGate.

Have a look at bedtools slop to define windows around a set of coordinates (here, the response elements) and bedtools intersect to intersect those windows with the genes. Can you give an example of what the output should look like?
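
To make the window-and-overlap idea concrete, here is a minimal, dependency-free sketch of the same logic in plain awk. It is not bedtools itself: the function name `genes_near_elements` is hypothetical, both inputs are assumed to be tab-separated BED6 files (chrom, start, end, name, score, strand; 0-based, half-open), and windows are clamped at position 0 but not at chromosome ends.

```shell
# Hypothetical helper (not part of bedtools): print every (element, gene) pair
# where the gene overlaps the element's window padded by DIST bases per side.
# Assumes tab-separated BED6 input for both files.
genes_near_elements() {
    dist=$1; elements=$2; genes=$3
    awk -v d="$dist" 'BEGIN { FS = OFS = "\t" }
        # First file: remember each response element and its padded window.
        NR == FNR {
            n++
            chrom[n] = $1
            lo[n] = ($2 - d < 0) ? 0 : $2 - d   # clamp at the chromosome start
            hi[n] = $3 + d                      # not clamped at the chromosome end
            line[n] = $0
            next
        }
        # Second file: print the pair whenever a gene overlaps a window.
        {
            for (i = 1; i <= n; i++)
                if ($1 == chrom[i] && $2 < hi[i] && $3 > lo[i])
                    print line[i], $0
        }' "$elements" "$genes"
}
```

Usage would be along the lines of `genes_near_elements 100000 response-elements.bed genes.bed > pairs.txt`. The loop compares every gene against every element, which is fine for a few thousand elements; for larger inputs the sorted-sweep approach used by the dedicated tools is the better choice. With bedtools installed, `bedtools window -a response-elements.bed -b genes.bed -w 100000` should report comparable pairs in one step.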

written 8 days ago by ATpoint

I'm figuring it out as I go, but ideally the output would be a .txt file with the following columns:

1.) Gene (one per line): all genes located within 100 kb of a response element identified in the input
2.) Strand
3.) Gene start position
4.) Gene strand
5.) Response element start position
6.) Response element strand

written 7 days ago by rependo

Via BEDOPS bedmap (both inputs need to be sorted, e.g. with sort-bed):

$ bedmap --range 100000 --echo --echo-map response-elements.bed genes.bed > answer.bed

If your genes are in GFF rather than BED format, you can convert them on the fly with gff2bed:

$ bedmap --range 100000 --echo --echo-map response-elements.bed <(gff2bed < genes.gff) > answer.bed

Or, for GTF-formatted annotations:

$ bedmap --range 100000 --echo --echo-map response-elements.bed <(gtf2bed < genes.gtf) > answer.bed
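
The commands above write one line per response element, with the element first, then a `|`, then all genes in range separated by `;`. To get one row per gene, as requested in the question, the output could be flattened with a small awk filter. This is only a sketch: the function name `bedmap_to_table` and the column headers are made up for illustration, and it assumes BED6 columns (name in column 4, strand in column 6) in both files and no `;` or `|` inside any field. GFF/GTF attribute columns usually contain `;`, so for gff2bed or gtf2bed input a different separator would be needed (bedmap's `--multidelim` option should allow that).

```shell
# Hypothetical helper: flatten bedmap "--echo --echo-map" output to one row per gene.
# Assumes BED6 columns in both files and no ';' or '|' inside any field.
bedmap_to_table() {
    awk 'BEGIN { OFS = "\t"
                 print "gene", "gene_start", "gene_strand", "element_start", "element_strand" }
    {
        # bedmap prints: <element>|<gene 1>;<gene 2>;...
        p = index($0, "|")
        if (p == 0) next
        left  = substr($0, 1, p - 1)
        right = substr($0, p + 1)
        if (right == "") next                # element with no gene in range
        split(left, re, "\t")
        ngenes = split(right, genes, ";")
        for (i = 1; i <= ngenes; i++) {
            split(genes[i], g, "\t")
            print g[4], g[2], g[6], re[2], re[6]
        }
    }'
}
```

For example: `bedmap --range 100000 --echo --echo-map response-elements.bed genes.bed | bedmap_to_table > answer.txt`. A gene that lies near several response elements appears once per element.
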
Awesome -- thank you, Alex.

