Question: How To Download All The Introns From Ensembl
Anima Mundi (Italy) wrote 7.3 years ago:

Hi,

I would like to know how I could download all the introns (in FASTA format) of a species from Ensembl via the web.

fasta ensembl web intron download • 5.5k views
Akk (Cambridge, UK) wrote 7.3 years ago:

Hi,

As far as I know, we don't store the intron sequences explicitly anywhere.

Here's a piece of Perl code that uses the Ensembl Perl API to fetch all intron sequences for the transcripts overlapping a particular region on the first human chromosome. It can easily be modified to fetch all transcripts in the species and to dump the sequence to a file instead of to the screen:

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';

# Connect to the public Ensembl database server (release 63).
$registry->load_registry_from_db( '-host' => 'ensembldb.ensembl.org',
                                  '-port' => '5306',
                                  '-user' => 'anonymous',
                                  '-db_version' => '63' );

# Fetch a region of about 1 kb on human chromosome 1.
my $sa = $registry->get_adaptor( 'Human', 'Core', 'Slice' );
my $slice = $sa->fetch_by_region( 'Chromosome', '1', 12_000, 13_000 );

my $dumper = Bio::EnsEMBL::Utils::SeqDumper->new();

# Print every intron of every transcript overlapping the region as FASTA.
foreach my $transcript ( @{ $slice->get_all_Transcripts() } ) {
  foreach my $intron ( @{ $transcript->get_all_Introns() } ) {
    $dumper->dump( $intron->feature_Slice(), 'FASTA' );
  }
}

I hope this helps.


This helps, thanks!

(reply written 7.3 years ago by Anima Mundi)

You should be aware that there can be multiple transcripts for any particular gene, due to alternative splicing. You would need to use the canonical transcript, or do some post-hoc removal of the redundant/overlapping introns, to ensure you aren't overestimating the number of introns retrieved.

(reply written 7.0 years ago by Steve Moss)
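
The post-hoc removal mentioned in the comment above could be sketched roughly as follows. This is a minimal illustration on toy data, not part of the original posts: the helper name unique_introns and the hash keys are made up for this example. It only drops introns with identical coordinates; partially overlapping introns would need extra handling. With real Ensembl intron objects, the key would presumably be built from accessors such as seq_region_name, seq_region_start, seq_region_end and strand.

```perl
use strict;
use warnings;

# Hypothetical helper: keep one record per distinct intron location.
# Each record is a hash ref with seq_region, start, end and strand keys
# (names chosen for this sketch, not taken from the Ensembl API).
sub unique_introns {
    my (@introns) = @_;
    my %seen;
    return grep {
        !$seen{ join ':', @{$_}{qw(seq_region start end strand)} }++
    } @introns;
}

# Toy data: two transcripts of one gene sharing their first intron.
my @introns = (
    { seq_region => '1', start => 12_228, end => 12_612, strand => 1 },
    { seq_region => '1', start => 12_722, end => 13_220, strand => 1 },
    { seq_region => '1', start => 12_228, end => 12_612, strand => 1 },
);

# Print the surviving introns as tab-separated coordinates.
for my $intron ( unique_introns(@introns) ) {
    print join( "\t", @{$intron}{qw(seq_region start end strand)} ), "\n";
}
```

The first occurrence of each location is kept, so the output order follows the input order.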
Steve Moss (United Kingdom) wrote 7.0 years ago:

In addition to my comment and Andreas' post, this is how I would deal with intron redundancy:

use strict;
use warnings;

use Bio::EnsEMBL::Registry;
use Bio::EnsEMBL::Utils::SeqDumper;

my $registry = 'Bio::EnsEMBL::Registry';

$registry->load_registry_from_db(-host => 'ensembldb.ensembl.org',
                                 -port => 5306,
                                 -user => 'anonymous',
                                 -passwd => undef,
                                 -db_version => 64);

my $gene_adapter = $registry->get_adaptor('Human', 'Core', 'Gene');

my $dumper = Bio::EnsEMBL::Utils::SeqDumper->new();

# Fetch the list of gene IDs once, then iterate over it.
foreach my $gene_id (@{$gene_adapter->list_stable_ids()}) {
    my $gene = $gene_adapter->fetch_by_stable_id($gene_id);
    # Use only the canonical transcript to avoid redundant introns.
    my $canonical_transcript = $gene->canonical_transcript();
    next unless defined $canonical_transcript;
    # Fetch the introns once per transcript and dump each one as FASTA.
    foreach my $intron (@{$canonical_transcript->get_all_Introns()}) {
        # The third argument names the output file.
        $dumper->dump($intron->feature_Slice(), 'FASTA', 'introns.fasta');
    }
}

Oh, thanks for this important improvement, I will test it soon!

(reply written 7.0 years ago by Anima Mundi)
Radhouane Aniba wrote 7.3 years ago:

You can refer to this biostar thread

You can design your query in BioMart, export it to XML, and use the script I provided to get your introns.

Hope this helps

Radhouane

BioMart won't actually work in this case, as the introns are not stored, and are not in the BioMart database. The API script Andreas provides is the best way.

(reply written 7.3 years ago by Giulietta - Ensembl Helpdesk)

Thanks a lot, but since I will also have to deal with Ensembl for other issues, I would prefer to solve this particular kind of problem within Ensembl itself.

(reply written 7.3 years ago by Anima Mundi)
Anuraj Nayarisseri (Indore) wrote 7.3 years ago:

Enter any gene symbol in Ensembl and choose your organism. On the resulting page you will find a Geneatlas link; click on it and you will see all the introns and exons of your particular gene.

Hope this will help you


Thank you too, but I would like to save all the introns of one genome.

(reply written 7.3 years ago by Anima Mundi)
Biojl (Barcelona) wrote 6.8 years ago:

Be careful, because lately ENSEMBL has not been working properly and most of the time it does not give you all the information you request. Check your data! I normally cut the first column from the results, sort it, uniq it and compare it to the ID list I provided, just to be sure that I get at least one line for each ID I requested.
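
As a rough illustration of that check, here is a minimal Perl sketch on toy data. The helper name missing_ids, the placeholder IDs and the tab-separated layout with the ID in the first column are all assumptions made for this example; adapt them to the table you actually downloaded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the requested IDs that never appear in the
# first tab-separated column of the result lines.
sub missing_ids {
    my ( $requested, $result_lines ) = @_;
    my %returned;
    for my $line (@$result_lines) {
        my ($id) = split /\t/, $line;
        $returned{$id} = 1 if defined $id && length $id;
    }
    return grep { !$returned{$_} } @$requested;
}

# Toy data standing in for a submitted ID list and a downloaded table.
my @requested = qw(ENSG_A ENSG_B ENSG_C);
my @results   = ( "ENSG_A\t100\t200", "ENSG_A\t300\t400", "ENSG_C\t50\t80" );

# Report every requested ID that has no result line at all.
print "Missing: $_\n" for missing_ids( \@requested, \@results );
```

With this toy input, ENSG_B is the only ID reported as missing.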

My advice is to download the FASTA files for the whole genome directly and only ask ENSEMBL for the positions, then extract the sequences yourself. It may seem like more work, but you'll get exact results and it will save you some trouble when analyzing them.
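
A minimal sketch of that do-it-yourself extraction, assuming you already have one chromosome sequence in memory and the exon coordinates of one transcript (1-based, inclusive). The helper names introns_from_exons and extract_region and the toy sequence are invented for this example; reading the real genome FASTA and the coordinate table is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: given exons as [start, end] pairs (1-based, inclusive,
# on one sequence), return the gaps between consecutive exons as introns.
sub introns_from_exons {
    my @exons = sort { $a->[0] <=> $b->[0] } @_;
    my @introns;
    for my $i ( 1 .. $#exons ) {
        my ( $start, $end ) = ( $exons[ $i - 1 ][1] + 1, $exons[$i][0] - 1 );
        push @introns, [ $start, $end ] if $start <= $end;
    }
    return @introns;
}

# Hypothetical helper: cut a 1-based inclusive interval out of a sequence,
# reverse-complementing it for features on the minus strand.
sub extract_region {
    my ( $seq, $start, $end, $strand ) = @_;
    my $sub = substr $seq, $start - 1, $end - $start + 1;
    if ( $strand < 0 ) {
        $sub = reverse $sub;
        $sub =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $sub;
}

# Toy chromosome and exon positions; real input would be the downloaded
# genome FASTA plus exon coordinates for one transcript.
my $chromosome = 'AAAAGTCCCCAGTTTTGTGGGAGAAAA';
my @exons      = ( [ 1, 4 ], [ 13, 16 ], [ 24, 27 ] );

# Print each intron as a FASTA record on the plus strand.
for my $intron ( introns_from_exons(@exons) ) {
    my ( $start, $end ) = @$intron;
    print ">toy:$start-$end\n", extract_region( $chromosome, $start, $end, 1 ), "\n";
}
```

Sorting the exons by start keeps the gap calculation independent of strand; only the final extraction step needs to know the strand.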


Thanks for the hint. How long has it been working improperly?

(reply written 6.8 years ago by Anima Mundi)

For at least 15 months, which is when I first found the issue, and probably longer... I contacted ENSEMBL and they told me they knew this was happening and that the only solution they could offer was to make smaller queries, but in some cases even that fails.

(reply written 6.8 years ago by Biojl)

Biojl is referring to BioMart, not to Ensembl! BioMart is indeed not very good at handling such large genome-wide queries but, as already pointed out above, you cannot retrieve intron information with BioMart anyway. The API script provided by gawbul should work perfectly fine for you and give you all the information you request.

(reply written 6.8 years ago by Bert Overduin)