Question: Need to systematically identify genes within some distance of a response element in the genome
rependo wrote:

I'm trying to find a program or script that can be used to systematically return a list of genes in the genome that fall within some specified distance of a response element.

I have two files to work with:

1.) A list of predicted response elements in the genome, identified using PoSSuM-search, that includes each element's genomic coordinates
2.) The annotation file from that same genome, which includes genomic coordinates for gene features

Ideally, I want to use the genomic coordinates of the response elements in file (1) to pull out any gene in the annotation file that falls within a pre-specified distance of a response element (e.g., 100 kb).

Thank you!

To moderators: this same question was cross-posted on ResearchGate: https://www.researchgate.net/post/Recommended_programs_to_systematically_identify_genes_within_some_distance_of_a_response_element_in_the_genome

Tags: rna-seq, gene, genome
written 8 days ago by rependo

Have a look at bedtools slop to define windows around a set of coordinates (here, the response elements) and bedtools intersect to intersect those windows with the genes. Can you give an example of what the output should look like?

written 8 days ago by ATpoint
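
The slop-then-intersect suggestion above could look roughly like the sketch below. The file names, coordinates, and the 1 Mb chromosome size are made up for illustration, and both inputs are assumed to be six-column BED. Because bedtools may not be installed everywhere, the sketch falls back to a plain awk version of the same windowing logic; that fallback is a stand-in for checking the idea on toy data, not a replacement for bedtools (it does not clip windows at chromosome ends).

```shell
# Toy inputs (hypothetical; BED6: chrom, start, end, name, score, strand).
printf 'chr1\t500000\t500020\tRE1\t0\t+\n' > re.bed
printf 'chr1\t450000\t460000\tgeneA\t0\t-\nchr1\t900000\t910000\tgeneB\t0\t+\n' > genes.bed
printf 'chr1\t1000000\n' > genome.txt   # chromosome sizes, required by slop

if command -v bedtools >/dev/null 2>&1; then
  # Pad each response element by 100 kb on both sides,
  # then report every (padded element, gene) pair that overlaps.
  bedtools slop -i re.bed -g genome.txt -b 100000 \
    | bedtools intersect -a - -b genes.bed -wa -wb > hits.txt
else
  # Fallback without bedtools: the same window test written in awk.
  awk -F'\t' -v OFS='\t' -v w=100000 '
    NR == FNR { gene[++n] = $0; next }        # first file: remember all genes
    {
      lo = ($2 > w) ? $2 - w : 0              # padded window start, floored at 0
      hi = $3 + w                             # padded window end
      for (i = 1; i <= n; i++) {
        split(gene[i], g, "\t")
        # Half-open intervals overlap when each starts before the other ends.
        if (g[1] == $1 && g[2] + 0 < hi && g[3] + 0 > lo)
          print $1, lo, hi, $4, $5, $6, gene[i]
      }
    }' genes.bed re.bed > hits.txt
fi
cat hits.txt
```

On this toy data only geneA falls inside the 100 kb window around RE1. Note that the element coordinates in hits.txt are the padded ones; bedtools window with its -w option is another way to get element/gene pairs while keeping the original element coordinates.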

I'm figuring it out as I go, but ideally the output would be a .txt, with columns defined as:

1.) Gene (one per line): all genes located within 100kb of a response element identified in the input
2.) Strand
3.) Gene start position
4.) Gene strand
5.) Response element start position
6.) Response element strand

written 7 days ago by rependo
Alex Reynolds wrote:

Via BEDOPS bedmap:

$ bedmap --range 100000 --echo --echo-map response-elements.bed genes.bed > answer.bed

If you don't have genes in BED format, but in GFF format, that's easy to fix:

$ bedmap --range 100000 --echo --echo-map response-elements.bed <(gff2bed < genes.gff) > answer.bed

Or, for GTF-formatted annotations:

$ bedmap --range 100000 --echo --echo-map response-elements.bed <(gtf2bed < genes.gtf) > answer.bed
modified 7 days ago • written 8 days ago by Alex Reynolds
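
To get from the bedmap output to the one-gene-per-line table requested earlier in the thread, something like the awk sketch below might work. It assumes bedmap's default delimiters ("|" between the element and its genes, ";" between genes) and six-column BED on both sides. That second assumption matters: gff2bed and gtf2bed keep the attributes column, which itself contains semicolons, so the converted annotation would need trimming first (for example with cut -f1-6), or a different separator set via bedmap's --multidelim option. The toy lines and the output file name are hypothetical, and the duplicated "Strand" column from the request is written only once.

```shell
# Toy lines in the shape that bedmap --echo --echo-map writes by default:
# element fields, then "|", then the genes in range separated by ";".
printf 'chr1\t500000\t500020\tRE1\t0\t+|chr1\t450000\t460000\tgeneA\t0\t-;chr1\t520000\t530000\tgeneC\t0\t+\n' > answer.bed
printf 'chr2\t100\t120\tRE2\t0\t-|\n' >> answer.bed   # element with no gene in range

awk -F'|' -v OFS='\t' '
  BEGIN { print "gene", "gene_start", "gene_strand", "re_start", "re_strand" }
  $2 != "" {                         # skip elements with no gene in range
    split($1, re, "\t")              # response element fields (BED6)
    n = split($2, genes, ";")        # one entry per gene in range
    for (i = 1; i <= n; i++) {
      split(genes[i], g, "\t")       # gene fields (BED6)
      print g[4], g[2], g[6], re[2], re[6]
    }
  }' answer.bed > genes_near_elements.txt
cat genes_near_elements.txt
```

A gene that lies within range of several response elements appears once per element. bedmap also expects both inputs to be sorted in sort-bed order, so an unsorted element list would need a pass through sort-bed before the bedmap step.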

Awesome -- thank you, Alex.

written 7 days ago by rependo