I am trying to retrieve, in an automated fashion and for a large number of genes, the orthologues and paralogues of a given Ensembl gene ID. Taking ENSG00000258588 as an example, I retrieve all the orthologues and paralogues using:
use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $geneid = "ENSG00000258588"; # ENSP00000346916
# Load the registry automatically
my $registry = 'Bio::EnsEMBL::Registry';
$registry->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);
## Get the compara gene member adaptor
my $gene_member_adaptor = $registry->get_adaptor("Multi", "compara", "GeneMember");
## Get the compara member
my $gene_member = $gene_member_adaptor->fetch_by_stable_id($geneid);
my @orthologIDs;
if (defined $gene_member) {
    my $homology_adaptor = $registry->get_adaptor('Multi', 'compara', 'Homology');
    ## Returns both orthologues and paralogues of the gene
    my $homologies = $homology_adaptor->fetch_all_by_Member($gene_member);
    foreach my $homology (@{$homologies}) {
        ## Each homology pairs the query gene with one homologue,
        ## so the query's own ID is pushed here as well
        foreach my $this_member (@{$homology->get_all_Members()}) {
            push(@orthologIDs, $this_member->stable_id);
        }
    }
}
However, the problem is that I end up with far more paralogues than I want. Looking at the gene tree, I really only want ENSG00000258588 (TRIM6-TRIM34) and ENSG00000258659 (TRIM34). Essentially, I want the close paralogues but not distant ones such as TRIM22. In this particular case I could limit the paralogues to the ancestral taxonomy Homininae, but I do not always know what the ancestral taxonomy will be; it might be Rodentia or Primates, depending on when the paralogue arose. I am really only interested in events that occurred after the mammalian divergence.
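To make the "post-mammalian only" criterion concrete, here is a minimal sketch of the filter: a paralogue is kept only if the taxonomy level of its duplication is in a whitelist of clades younger than the mammalian root. The records are plain hashes with hypothetical IDs so that the sketch runs on its own. In the real script I assume the two fields would come from `$homology->description` and `$homology->taxonomy_level`, which should be checked against the installed API version. The whitelist is illustrative and incomplete.

```perl
use strict;
use warnings;

# Clades younger than the mammalian root. Illustrative, not exhaustive:
# extend it with the levels that actually occur in your gene trees.
my %post_mammalian = map { $_ => 1 } qw(
    Theria Eutheria Boreoeutheria Euarchontoglires Laurasiatheria
    Glires Rodentia
    Primates Simiiformes Catarrhini Hominoidea Hominidae Homininae
);

# Return the unique IDs of paralogues whose duplication is post-mammalian.
# Each record is a hash with keys: id, description, taxonomy_level.
sub recent_paralogues {
    my ($records) = @_;
    my %seen;
    return grep { !$seen{$_}++ }
           map  { $_->{id} }
           grep { $_->{description} =~ /paralog/
                  && $post_mammalian{ $_->{taxonomy_level} } }
           @{$records};
}

# Hypothetical records standing in for Homology objects.
my @records = (
    { id => 'GENE_A', description => 'within_species_paralog', taxonomy_level => 'Homininae' },
    { id => 'GENE_B', description => 'within_species_paralog', taxonomy_level => 'Euteleostomi' },
    { id => 'GENE_C', description => 'ortholog_one2one',       taxonomy_level => 'Rodentia' },
);

print join(',', recent_paralogues(\@records)), "\n";   # prints GENE_A
```

GENE_B is dropped because its duplication predates mammals, and GENE_C because it is an orthologue rather than a paralogue. A whitelist avoids having to know the ancestral taxonomy of each gene in advance, at the cost of maintaining the list of clades by hand.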
An alternative approach I have thought of, but do not know how to implement, is to find the most recent common ancestor of all the orthologues and then retrieve all the Ensembl IDs from that node.
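The most-recent-common-ancestor idea can be sketched independently of the API. The tree below is a hypothetical toy stored as child-to-parent pointers, with made-up node names. With the Compara API one would presumably start from the gene tree of the member and walk its nodes instead; I have not shown that because the exact calls depend on the API version.

```perl
use strict;
use warnings;

# Hypothetical toy gene tree as child => parent pointers (root has no entry).
my %parent = (
    n1    => 'root', n2    => 'root',
    n3    => 'n1',   leafC => 'n1',
    leafA => 'n3',   leafB => 'n3',
    leafD => 'n2',   leafE => 'n2',
);

# Path from a node up to the root, the node itself first.
sub path_to_root {
    my ($node) = @_;
    my @path = ($node);
    while (defined $parent{$node}) {
        $node = $parent{$node};
        push @path, $node;
    }
    return @path;
}

# Most recent common ancestor of a list of distinct leaves: the first node
# on the first leaf's path to the root that lies on every other path too.
sub mrca {
    my @leaves = @_;
    my %count;
    for my $leaf (@leaves) {
        $count{$_}++ for path_to_root($leaf);
    }
    for my $node (path_to_root($leaves[0])) {
        return $node if $count{$node} == @leaves;
    }
    return;
}

# All leaves (nodes that are nobody's parent) found below a given node.
sub leaves_under {
    my ($ancestor) = @_;
    my %is_parent = map { $_ => 1 } values %parent;
    my @leaves;
    for my $node (sort keys %parent) {
        next if $is_parent{$node};
        push @leaves, $node if grep { $_ eq $ancestor } path_to_root($node);
    }
    return @leaves;
}

my $anc = mrca('leafA', 'leafC');
print "$anc\n";                            # prints n1
print join(' ', leaves_under($anc)), "\n"; # prints leafA leafB leafC
```

Here the ancestor of leafA and leafC is n1, and collecting the leaves under n1 also picks up leafB, which is the behaviour wanted for close paralogues that sit inside the orthologue clade.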
Any pointers, advice or suggestions would be greatly appreciated.