Querying Ensembl For Transcript Gene Annotation
1
1
Entering edit mode
11.0 years ago
pld 5.1k

I am attempting to take the IDs from a GAL file and query Ensembl to retrieve whatever annotations are available but I am running into a problem when calling the fetch_by_dbID() function.

    #libs
use Bio::EnsEMBL::Registry;
use Text::CSV;

#constants
my $ENSEMBL_DATABASE = 'ensembldb.ensembl.org';
my $ENSEMBL_USERNAME = 'anonymous';
my $REGISTRY_USE     = 'Bio::EnsEMBL::Registry';
my $ENSEMBL_PORT     = 5306;

#load command line arguments
my ($infi, $outdir, $idcol, $skip) = @ARGV;

#lead file of data to get annotations for
open FI, "<", $infi or die "ERROR: Can't open $infi: $!";


#my @data;
my @data;
foreach my $row (<FI>){
#    next if 1 .. $#skip;
    chomp $row;
    my @cells = split /\t/, $row;
#    print "@cells\n";
    push @data, \@cells;
}
close $fh;

#connect to ensembl registry
my $registry = $REGISTRY_USE;
$registry -> load_registry_from_db(
    -host => $ENSEMBL_DATABASE,
    -user => $ENSEMBL_USERNAME,
    -port => $ENSEMBL_PORT
);

#get adaptors
my $geneadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Gene');

foreach (@data){
    $query = $_->[3];
    print $query, "\n";
    print $geneadaptor->fetch_by_dbID($query);
}

If I comment out print $geneadaptor->fetch_by_dbID($query); The output of print $query, "\n"; are the IDs I wish to query by (e.g. ensmmut00000018940). When print $geneadaptor->fetch_by_dbID($query); is not commented out, nothing prints on the screen. I'm really lost here. What am I doing wrong?

Replacing the previous foreach (@data)... loop with:

my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');
foreach (@data){
    my $query = $_->[3];
    my $trans = new Bio::EnsEMBL::Transcript();
    print $query, "\n";
    $trans = $tranadaptor->fetch_by_stable_id($query), "\n";
    print (defined $trans ? $trans->stable_id(): "NONE\n"), "\n";
}

Results in NONE being printed, meaning there is still noting coming back from the query.

ensembl ensembl perl • 3.1k views
ADD COMMENT
2
Entering edit mode
11.0 years ago
Emily 23k

Hi Joe

A few issues. Firstly, ENSMMUT00000018940 is a transcript ID, so the GeneAdaptor will get nothing. You need to use the TranscriptAdaptor

Secondly, you need to use the command fetch_by_stable_id. fetch_by_dbID fetches by the internal database IDs, which are just long number strings.

Also, what this fetches is just the number string. You need to use the Transcript module to specify actual data attached to this transcript, for example coordinates, strand, exons etc http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1Transcript.html

Emily

ADD COMMENT
0
Entering edit mode

What would be the best way to pull the gene and protein of each transcript and get the annotations for them (e.g. GO). I also updated the OP with a new attempt.

ADD REPLY
0
Entering edit mode

To get the gene, there's a command in the Transcript module, get_Gene.

To get a protein you can create an if loop using if $transcript->biotype eg protein_coding (assuming you've fetched a transcript and called it $transcript), then get the protein using the command translate.

You can get GO terms for a feature using get_all_xrefs(GO%). You'll need to create an OntologyTerm Adaptor: my $OTadaptor = $registry -> get_adaptor('Multi', 'Ontology', 'OntologyTerm'); Now use the OTadaptor to fetch the relevant term and the OntologyTerm module print the data you're after: http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1OntologyTerm.html

ADD REPLY
0
Entering edit mode

I just ran the following code on my computer (I cut out the beginning part as I don't have your input file)

#!/usr/bin/perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = "Bio::EnsEMBL::Registry";

$registry->load_registry_from_db(
   -host => 'ensembldb.ensembl.org',
   -user => 'anonymous'
);

my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');

my $query = "ensmmut00000018940";
my $trans = new Bio::EnsEMBL::Transcript();
print $query, "\n";
$trans = $tranadaptor->fetch_by_stable_id($query), "\n";
print (defined $trans ? $trans->stable_id(): "NONE\n"), "\n";

My output:

ensmmut00000018940

ENSMMUT00000018940

So I'm not sure what's happening when you try to run it. Can you check the versions of Perl, the Ensembl API and BioPerl you're using?

Also, this line

my $trans = new Bio::EnsEMBL::Transcript();

is completely unnecessary. The API doesn't need to be told that it's getting a new transcript. You can get go straight to:

my $trans = $tranadaptor->fetch_by_stable_id($query);

And I'm not sure why you've put loads of "\n" command on lines which aren't print lines. "\n" is a print command.

ADD REPLY
0
Entering edit mode

Hm, the issue seems to be with parsing the TSV file. I've just stripped the IDs out of the file and I'm able to query now. There is an issue with the get_Gene method: Can't locate object method "get_Gene" via package "Bio::EnsEMBL::Transcript" when I call $transcript -> get_Gene. I've used the Gene package use Bio::EnsEMBL::Gene;. Any ideas? The usage is exactly as depicted in the documentation.Thanks again for the great support!

ADD REPLY
0
Entering edit mode

I just tried this:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = "Bio::EnsEMBL::Registry";

$registry->load_registry_from_db(
   -host => 'ensembldb.ensembl.org',
   -user => 'anonymous'
);

my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');

my $query = "ensmmut00000018940";
print $query, "\n";
my $trans = $tranadaptor->fetch_by_stable_id($query);
print (defined $trans ? $trans->stable_id(): "NONE");
print "\n";
my $gene = $trans->get_Gene;

print $gene->stable_id(), "\n";

And this was my output:

ensmmut00000018940
ENSMMUT00000018940
ENSMMUG00000013491

Was this what you did?

ADD REPLY

Login before adding your answer.

Traffic: 1978 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6