I am attempting to take the IDs from a GAL file and query Ensembl to retrieve whatever annotations are available but I am running into a problem when calling the fetch_by_dbID() function.
#libs
use Bio::EnsEMBL::Registry;
use Text::CSV;
#constants
my $ENSEMBL_DATABASE = 'ensembldb.ensembl.org';
my $ENSEMBL_USERNAME = 'anonymous';
my $REGISTRY_USE = 'Bio::EnsEMBL::Registry';
my $ENSEMBL_PORT = 5306;
#load command line arguments
my ($infi, $outdir, $idcol, $skip) = @ARGV;
#lead file of data to get annotations for
open FI, "<", $infi or die "ERROR: Can't open $infi: $!";
#my @data;
my @data;
foreach my $row (<FI>){
# next if 1 .. $#skip;
chomp $row;
my @cells = split /\t/, $row;
# print "@cells\n";
push @data, \@cells;
}
close $fh;
#connect to ensembl registry
my $registry = $REGISTRY_USE;
$registry -> load_registry_from_db(
-host => $ENSEMBL_DATABASE,
-user => $ENSEMBL_USERNAME,
-port => $ENSEMBL_PORT
);
#get adaptors
my $geneadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Gene');
foreach (@data){
$query = $_->[3];
print $query, "\n";
print $geneadaptor->fetch_by_dbID($query);
}
If I comment out print $geneadaptor->fetch_by_dbID($query);
The output of print $query, "\n";
are the IDs I wish to query by (e.g. ensmmut00000018940). When print $geneadaptor->fetch_by_dbID($query);
is not commented out, nothing prints on the screen. I'm really lost here. What am I doing wrong?
Replacing the previous foreach (@data)...
loop with:
my $tranadaptor = $registry -> get_adaptor('Macaque', 'Core', 'Transcript');
foreach (@data){
my $query = $_->[3];
my $trans = new Bio::EnsEMBL::Transcript();
print $query, "\n";
$trans = $tranadaptor->fetch_by_stable_id($query), "\n";
print (defined $trans ? $trans->stable_id(): "NONE\n"), "\n";
}
Results in NONE
being printed, meaning there is still noting coming back from the query.
What would be the best way to pull the gene and protein of each transcript and get the annotations for them (e.g. GO). I also updated the OP with a new attempt.
To get the gene, there's a command in the Transcript module, get_Gene.
To get a protein you can create an if loop using if $transcript->biotype eg protein_coding (assuming you've fetched a transcript and called it $transcript), then get the protein using the command translate.
You can get GO terms for a feature using get_all_xrefs(GO%). You'll need to create an OntologyTerm Adaptor: my $OTadaptor = $registry -> get_adaptor('Multi', 'Ontology', 'OntologyTerm'); Now use the OTadaptor to fetch the relevant term and the OntologyTerm module print the data you're after: http://www.ensembl.org/info/docs/Doxygen/core-api/classBio_1_1EnsEMBL_1_1OntologyTerm.html
I just ran the following code on my computer (I cut out the beginning part as I don't have your input file)
My output:
ensmmut00000018940
ENSMMUT00000018940
So I'm not sure what's happening when you try to run it. Can you check the versions of Perl, the Ensembl API and BioPerl you're using?
Also, this line
is completely unnecessary. The API doesn't need to be told that it's getting a new transcript. You can get go straight to:
And I'm not sure why you've put loads of "\n" command on lines which aren't print lines. "\n" is a print command.
Hm, the issue seems to be with parsing the TSV file. I've just stripped the IDs out of the file and I'm able to query now. There is an issue with the
get_Gene
method:Can't locate object method "get_Gene" via package "Bio::EnsEMBL::Transcript"
when I call$transcript -> get_Gene
. I've used the Gene packageuse Bio::EnsEMBL::Gene;
. Any ideas? The usage is exactly as depicted in the documentation.Thanks again for the great support!I just tried this:
And this was my output:
Was this what you did?