Question

Features Of Flanking Regions

11.7 years ago

Chris Gene ▴ 80

I have a list of genomic positions in the human genome, and I want to get a general idea about the genomic context of the flanking regions (GC richness, spanning a gene (exon/intron), etc.).

The file is

chr pos

e.g.

13 1234567

Is there a simple way to go about this?

Thank you!

human position • 3.0k views

Ram · Answer 1 · 2012-08-20

Your question is a bit generic; the word "features" is broad. The scale at which you want to work also matters: the answer depends on whether you have one, dozens, or millions of regions.

The simplest option is to go to the UCSC Genome Browser, find your region, and download the tracks that contain the data you are interested in.

More automated solutions include:

Query the UCSC Genome Browser; for some examples see:
Use a BioMart service: http://www.biomart.org/martservice.html
Download bulk datasets and intersect them with BEDTools
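For the bulk-intersection route, the chr/pos list from the question first has to become intervals. Here is a minimal sketch of that step, assuming the positions are 1-based and that a flank of 50 bases on each side is wanted; the helper name `pos_to_bed` and the flank size are made up for illustration. BED intervals are 0-based and half-open, hence the offset on the start coordinate.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert one "chr pos" record (1-based position) into a BED line that
# covers $flank bases on either side. BED is 0-based and half-open.
# pos_to_bed is a hypothetical helper, not part of any library.
sub pos_to_bed {
    my ($chr, $pos, $flank) = @_;
    my $start = $pos - $flank - 1;    # 1-based inclusive start -> 0-based start
    $start = 0 if $start < 0;         # clamp at the start of the chromosome
    my $end = $pos + $flank;          # half-open end equals the 1-based inclusive end
    return join("\t", $chr, $start, $end);
}

my $flank = 50;                           # assumed flank size
my @lines = ("chr pos", "13 1234567");    # example input in the question's format

for my $line (@lines) {
    next if $line =~ /^\s*$/ || $line =~ /^chr\s+pos/i;   # skip blank lines and the header
    my ($chr, $pos) = split ' ', $line;
    print pos_to_bed($chr, $pos, $flank), "\n";
}
```

The resulting lines could be saved to a file and intersected with an annotation file, for example with `bedtools intersect -a regions.bed -b annotation.bed -wa -wb` (both file names are placeholders). Chromosome names must match between the two files ("13" versus "chr13").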

There are also many ChIP-Seq tools that aim to automate this step; you might want to look at those.

Ram · Answer 2 · 2012-08-20

If you want to do this kind of thing in a script, you can get many features associated with a particular region of a genome using the Ensembl Perl API. You will need to install the API and the Perl DBI and DBD::mysql modules. You can then retrieve sequences for your regions of interest (with an arbitrary flank) as follows:

#!/usr/bin/perl
use warnings;
use strict;
use DBD::mysql;
use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org', # alternatively 'useastdb.ensembl.org'
    -user => 'anonymous'
);

my $chr   = 13;         # chromosome name
my $pos   = 111234567;  # position of interest
my $flank = 50;         # bases to include on each side

# Fetch the region from $pos-$flank to $pos+$flank and print its sequence
my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );
my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chr, $pos-$flank, $pos+$flank );
my $sequence = $slice->seq();
print $sequence, "\n";
exit();
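Since the question asks about GC richness, here is a minimal sketch of computing the GC fraction of a retrieved sequence string. The helper name `gc_fraction` and the example sequence are made up for illustration. Ambiguous bases such as N are left out of the denominator, which is one possible convention rather than the only one.

```perl
use strict;
use warnings;

# Return the fraction of G/C among unambiguous bases (A, C, G, T),
# or undef if the sequence contains none. gc_fraction is a hypothetical helper.
sub gc_fraction {
    my ($seq) = @_;
    my $gc   = ($seq =~ tr/GCgc//);        # count G and C without modifying $seq
    my $acgt = ($seq =~ tr/ACGTacgt//);    # count unambiguous bases, ignoring N etc.
    return undef if $acgt == 0;
    return $gc / $acgt;
}

my $example = 'ATGCGCNNTA';    # made-up sequence, not real genomic data
printf "GC fraction: %.2f\n", gc_fraction($example);
```

In the script above, the string returned by the slice's `seq()` call could be passed to this function in place of the example sequence.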

Examples for retrieving many types of additional information (including exons, introns, etc.) can be found in the Core API Tutorial.
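Once gene and exon coordinates are in hand, deciding whether a position is exonic, intronic, or outside the gene is a simple range check. The sketch below assumes a hypothetical hash layout for a gene model and uses made-up coordinates. With the Ensembl API, such coordinates could be collected from the objects returned by methods like `get_all_Genes` on a slice and `get_all_Exons` on a gene.

```perl
use strict;
use warnings;

# Classify a position relative to a single gene model.
# Assumed (hypothetical) layout:
#   { start => ..., end => ..., exons => [ [start, end], ... ] }
# All coordinates are 1-based and inclusive.
sub classify_position {
    my ($pos, $gene) = @_;
    return 'intergenic' if $pos < $gene->{start} || $pos > $gene->{end};
    for my $exon (@{ $gene->{exons} }) {
        my ($exon_start, $exon_end) = @$exon;
        return 'exon' if $pos >= $exon_start && $pos <= $exon_end;
    }
    return 'intron';    # inside the gene but not in any listed exon
}

# Made-up gene model for illustration only.
my %gene = (
    start => 1000,
    end   => 2000,
    exons => [ [1000, 1200], [1500, 1700], [1900, 2000] ],
);

for my $pos (900, 1100, 1300) {
    printf "%d\t%s\n", $pos, classify_position($pos, \%gene);
}
```

Note that "intergenic" here only means outside this one gene model; a real analysis would check every gene that overlaps the region.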