Question: Querying Ensembl For Transcript Gene Annotation
1
gravatar for pld
6.0 years ago by
pld4.8k
United States
pld4.8k wrote:

I am attempting to take the IDs from a GAL file and query Ensembl to retrieve whatever annotations are available but I am running into a problem when calling the fetch_by_dbID() function.

    #libs
use Bio::EnsEMBL::Registry;
use Text::CSV;

#constants
my $ENSEMBL_DATABASE = 'ensembldb.ensembl.org';
my $ENSEMBL_USERNAME = 'anonymous';
my $REGISTRY_USE     = 'Bio::EnsEMBL::Registry';
my $ENSEMBL_PORT     = 5306;

#load command line arguments
my ($infi, $outdir, $idcol, $skip) = @ARGV;

#lead file of data to get annotations for
open FI, "<", $infi or die "ERROR: Can't open $infi: $!";


#my @data;
my @data;
foreach my $row (<FI>){
#    next if 1 .. $#skip;
    chomp $row;
    my @cells = split /\t/, $row;
#    print "@cells\n";
    push @data, \@cells;
}
close $fh;

#connect to ensembl registry
my $registry = $REGISTRY_USE;
$registry -> load_registry_from_db(
    -host => $ENSEMBL_DATABASE,
    -user => $ENSEMBL_USERNAME,
    -port => $ENSEMBL_PORT
);

#get adaptors
my $geneadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Gene');

foreach (@data){
    $query = $_->[3];
    print $query, "\n";
    print $geneadaptor->fetch_by_dbID($query);
}

If I comment out print $geneadaptor->fetch_by_dbID($query); The output of print $query, "\n"; are the IDs I wish to query by (e.g. ensmmut00000018940). When print $geneadaptor->fetch_by_dbID($query); is not commented out, nothing prints on the screen. I'm really lost here. What am I doing wrong?

Replacing the previous foreach (@data)... loop with:

my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');
foreach (@data){
    my $query = $_->[3];
    my $trans = new Bio::EnsEMBL::Transcript();
    print $query, "\n";
    $trans = $tranadaptor->fetch_by_stable_id($query), "\n";
    print (defined $trans ? $trans->stable_id(): "NONE\n"), "\n";
}

Results in NONE being printed, meaning there is still noting coming back from the query.

perl ensembl • 2.1k views
ADD COMMENTlink modified 6.0 years ago • written 6.0 years ago by pld4.8k
2
gravatar for Emily_Ensembl
6.0 years ago by
Emily_Ensembl18k
EMBL-EBI
Emily_Ensembl18k wrote:

Hi Joe

A few issues. Firstly, ENSMMUT00000018940 is a transcript ID, so the GeneAdaptor will get nothing. You need to use the TranscriptAdaptor

Secondly, you need to use the command fetch_by_stable_id. fetch_by_dbID fetches by the internal database IDs, which are just long number strings.

Also, what this fetches is just the number string. You need to use the Transcript module to specify actual data attached to this transcript, for example coordinates, strand, exons etc http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html

Emily

ADD COMMENTlink written 6.0 years ago by Emily_Ensembl18k

What would be the best way to pull the gene and protein of each transcript and get the annotations for them (e.g. GO). I also updated the OP with a new attempt.

ADD REPLYlink modified 6.0 years ago • written 6.0 years ago by pld4.8k

To get the gene, there's a command in the Transcript module, get_Gene.

To get a protein you can create an if loop using if $transcript->biotype eg protein_coding (assuming you've fetched a transcript and called it $transcript), then get the protein using the command translate.

You can get GO terms for a feature using get_all_xrefs(GO%). You'll need to create an OntologyTerm Adaptor: my $OTadaptor = $registry -> get_adaptor('Multi', 'Ontology', 'OntologyTerm'); Now use the OTadaptor to fetch the relevant term and the OntologyTerm module print the data you're after: http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1OntologyTerm.html

ADD REPLYlink written 6.0 years ago by Emily_Ensembl18k

I just ran the following code on my computer (I cut out the beginning part as I don't have your input file)

#!/usr/bin/perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = "Bio::EnsEMBL::Registry";

$registry->load_registry_from_db(
   -host => 'ensembldb.ensembl.org',
   -user => 'anonymous'
);

my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');

my $query = "ensmmut00000018940";
my $trans = new Bio::EnsEMBL::Transcript();
print $query, "\n";
$trans = $tranadaptor->fetch_by_stable_id($query), "\n";
print (defined $trans ? $trans->stable_id(): "NONE\n"), "\n";

My output:

ensmmut00000018940

ENSMMUT00000018940

So I'm not sure what's happening when you try to run it. Can you check the versions of Perl, the Ensembl API and BioPerl you're using?

Also, this line

my $trans = new Bio::EnsEMBL::Transcript();

is completely unnecessary. The API doesn't need to be told that it's getting a new transcript. You can get go straight to:

my $trans = $tranadaptor->fetch_by_stable_id($query);

And I'm not sure why you've put loads of "\n" command on lines which aren't print lines. "\n" is a print command.

ADD REPLYlink modified 6.0 years ago • written 6.0 years ago by Emily_Ensembl18k

Hm, the issue seems to be with parsing the TSV file. I've just stripped the IDs out of the file and I'm able to query now. There is an issue with the get_Gene method: Can't locate object method "get_Gene" via package "Bio::EnsEMBL::Transcript" when I call $transcript -> get_Gene. I've used the Gene package use Bio::EnsEMBL::Gene;. Any ideas? The usage is exactly as depicted in the documentation.Thanks again for the great support!

ADD REPLYlink modified 6.0 years ago • written 6.0 years ago by pld4.8k

I just tried this:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = "Bio::EnsEMBL::Registry";

$registry->load_registry_from_db(
   -host => 'ensembldb.ensembl.org',
   -user => 'anonymous'
);

my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');

my $query = "ensmmut00000018940";
print $query, "\n";
my $trans = $tranadaptor->fetch_by_stable_id($query);
print (defined $trans ? $trans->stable_id(): "NONE");
print "\n";
my $gene = $trans->get_Gene;

print $gene->stable_id(), "\n";

And this was my output:

ensmmut00000018940
ENSMMUT00000018940
ENSMMUG00000013491

Was this what you did?

ADD REPLYlink written 6.0 years ago by Emily_Ensembl18k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 937 users visited in the last hour