Question: How To Generate A Complete Human Chromosome From Ensembl With The Bioperl Ensembl Api
1
7.7 years ago by
Simon20
Simon20 wrote:

Hi, I would like to build EMBL-formatted data for each human chromosome from Ensembl, using the associated Perl API. Is this possible? I was able to retrieve sequence data and position information for genes and transcripts, but I did not find any way to generate an entry in EMBL format for a whole chromosome...

Has anyone already written or used a script to do this?

Thanks Simon

ensembl bioperl • 2.0k views
modified 5.5 years ago by Biostar ♦♦ 20 • written 7.7 years ago by Simon20
4
7.7 years ago by
Akk210
Cambridge, UK
Akk210 wrote:

Here's an example of dumping the slice overlapping the human BRCA2 gene in EMBL format:

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db( '-host' => 'ensembldb.ensembl.org',
                                  '-port' => '5306',
                                  '-user' => 'anonymous',
                                  '-db_version' => 63 );

my $sa = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

# Get slice overlapping BRCA2 gene:
my $slice =
  $sa->fetch_by_region( 'Chromosome', '13', 32889611, 32973805 );

my $seqdumper = Bio::EnsEMBL::Utils::SeqDumper->new();

# Disable uninteresting feature types:
$seqdumper->disable_feature_type('estgene');
$seqdumper->disable_feature_type('genscan');
$seqdumper->disable_feature_type('repeat');
$seqdumper->disable_feature_type('similarity');
$seqdumper->disable_feature_type('variation');
$seqdumper->disable_feature_type('vegagene');

# Dump to stdout:
$seqdumper->dump( $slice, 'EMBL' );

Cheers, Andreas (Ensembl/Core)

written 7.7 years ago by Akk210
1

BTW, while putting this together I noticed a small (tiny, but fatal) error in the documentation of the SeqDumper module in our API and fixed it (will be in release 64). Thanks for turning my eyes in its direction!

written 7.7 years ago by Akk210
3
7.7 years ago by
2184687-1231-83-4.9k wrote:

The EMBL format dumps for Ensembl are here: ftp://ftp.ensembl.org/pub/current_embl/homo_sapiens/README

written 7.7 years ago by 2184687-1231-83-4.9k
1

In fact I was using these data previously, but I had to write a bunch of scripts to reconstruct whole chromosome sequences from them, which I suppose was not the most efficient way to work. This is why I was looking for other possibilities to get a whole sequence. Thanks for your help anyway!

written 7.7 years ago by Simon20

The whole chromosomes are available in the current_fasta directory, but I think what you are looking for is something in between the whole-chromosome FASTA files and the EMBL dumps, right?

written 7.7 years ago by 2184687-1231-83-4.9k
1
7.7 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

Why do you need to do this via the Ensembl API? Files like these are available for download from EMBL-Bank directly at: http://www.ebi.ac.uk/genomes/eukaryota.html

e.g. chromosome 22 in version GRCh37 can be downloaded in EMBL format here (n.b. this is a 10 Mb file)

written 7.7 years ago by Casey Bergman18k

I will do that.

Initially I was thinking of using Ensembl data (with the Ensembl gene info, etc.), but these data are fine as well.

Thanks for your help!

Simon

written 7.7 years ago by Simon20
1
7.7 years ago by
Cambridge, UK

Using the Ensembl API, there is a module called Bio::EnsEMBL::Utils::SeqDumper which, if given a whole chromosome Slice object, can write this data out.

If you need more explanation or a more complete script, just ask.
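
For readers who want a starting point, here is a minimal sketch of the whole-chromosome case. It reuses the registry settings and SeqDumper calls from the BRCA2 example in this thread and simply omits the start and end coordinates in fetch_by_region, so the slice covers the entire chromosome. The helper embl_filename, the output file naming scheme, and the script name dump_chr.pl are hypothetical, chosen only for illustration. Passing a file name as the third argument to dump() is an assumption about the SeqDumper interface that should be checked against the API release you have installed. Expect a whole-chromosome dump from the public server to be slow and the output to be large.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the Ensembl API): output file name
# for one chromosome.
sub embl_filename {
    my ($chr) = @_;
    return "homo_sapiens.chromosome.$chr.embl";
}

# Dump one whole chromosome to its own EMBL file and return the file name.
sub dump_chromosome {
    my ( $slice_adaptor, $seqdumper, $chr ) = @_;

    # No start/end coordinates: the slice spans the entire chromosome.
    my $slice = $slice_adaptor->fetch_by_region( 'chromosome', $chr );
    die "No chromosome '$chr' found\n" unless defined $slice;

    my $outfile = embl_filename($chr);

    # Assumption: dump() may append to an existing file, so start clean.
    unlink $outfile if -e $outfile;

    # Assumption: the third argument is the output file name.
    $seqdumper->dump( $slice, 'EMBL', $outfile );
    return $outfile;
}

# Run only when executed as a script, e.g.: perl dump_chr.pl 21 22
if ( !caller ) {

    # Loaded lazily so the helper above can be used without the Ensembl API.
    require Bio::EnsEMBL::Registry;
    require Bio::EnsEMBL::Utils::SeqDumper;

    my $registry = 'Bio::EnsEMBL::Registry';
    $registry->load_registry_from_db(
        '-host'       => 'ensembldb.ensembl.org',
        '-port'       => '5306',
        '-user'       => 'anonymous',
        '-db_version' => 63
    );

    my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );

    # Same feature types disabled as in the BRCA2 example.
    my $seqdumper = Bio::EnsEMBL::Utils::SeqDumper->new();
    $seqdumper->disable_feature_type($_)
      for qw(estgene genscan repeat similarity variation vegagene);

    for my $chr (@ARGV) {
        print dump_chromosome( $slice_adaptor, $seqdumper, $chr ), "\n";
    }
}
```

Chromosome names are taken from the command line, so each chromosome ends up in its own file; the registry is loaded once and shared across all dumps.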

written 7.7 years ago by Giulietta - Ensembl Helpdesk1.2k