Ensembl API for retrieving human gene families
5.1 years ago
David.shaw ▴ 10

I am trying to retrieve the paralogues and gene-family members of a given input gene. For example, if my input is TMEM110, I would like all the TMEM genes that are paralogues and members of its gene family (e.g. TMEM*).

Currently, the script below returns the gene family for all species rather than just human. When I change 'Multi' to 'Human' the script breaks. It also outputs protein IDs, but I would like it to return Ensembl gene IDs like the input (ENSG00000139618).

Any help would be appreciated!

    use strict;
    use warnings;

    use Bio::EnsEMBL::Registry;

    ## Load the registry from the public Ensembl database server
    my $reg = "Bio::EnsEMBL::Registry";
    $reg->load_registry_from_db(
      -host => "ensembldb.ensembl.org",
      -user => "anonymous",
    );

    ## Get the compara genemember adaptor
    my $gene_member_adaptor = $reg->get_adaptor("Multi", "compara", "GeneMember");

    ## Get the compara family adaptor
    my $family_adaptor = $reg->get_adaptor("Multi", "compara", "Family");

    ## Get the compara member
    my $gene_member = $gene_member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", "ENSG00000139618");

    ## Get all the families
    my $all_families = $family_adaptor->fetch_all_by_Member($gene_member);

    ## For each family
    foreach my $this_family (@{$all_families}) {
      print $this_family->description(), " (description score = ", $this_family->description_score(), ")\n";

      ## Print the members in this family
      my $all_members = $this_family->get_all_Members();
      foreach my $this_member (@{$all_members}) {
        print $this_member->source_name(), " ", $this_member->stable_id(), " (", $this_member->taxon()->name(), ")\n";
      }
      print "\n";
    }

gene API ensembl perl • 1.3k views
5.1 years ago

The family adaptor returns every member of the family and does not discriminate by species. You want to use the homology adaptor instead, together with the method_link_species_set adaptor, to get paralogues. There's more info in this section of the online course.
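
As a rough sketch of that suggestion (not a tested solution), the script from the question could be adapted as below. It assumes a reasonably recent Ensembl API in which the homology adaptor's fetch_all_by_Member accepts a -METHOD_LINK_TYPE option and homology members expose gene_member(); it also assumes that the "ENSEMBL_PARALOGUES" method type covers within-species paralogues only, and that the public server at ensembldb.ensembl.org is reachable. Filtering by method type here stands in for fetching a method_link_species_set object explicitly.

```perl
use strict;
use warnings;

use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server (needs network access).
my $reg = "Bio::EnsEMBL::Registry";
$reg->load_registry_from_db(
    -host => "ensembldb.ensembl.org",
    -user => "anonymous",
);

my $gene_member_adaptor = $reg->get_adaptor("Multi", "compara", "GeneMember");
my $homology_adaptor    = $reg->get_adaptor("Multi", "compara", "Homology");

# Same example gene ID as in the question.
my $query_id    = "ENSG00000139618";
my $gene_member = $gene_member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", $query_id);
die "No compara gene member found for $query_id\n" unless defined $gene_member;

# Assumption: this method type restricts the result to paralogues
# within the query gene's own species.
my $homologies = $homology_adaptor->fetch_all_by_Member(
    $gene_member,
    -METHOD_LINK_TYPE => "ENSEMBL_PARALOGUES",
);

# Each homology pairs the query gene with one paralogue; report the
# paralogue's gene stable ID (ENSG...) rather than the protein ID.
my %seen;
foreach my $homology (@{$homologies}) {
    foreach my $member (@{ $homology->get_all_Members() }) {
        my $gene_id = $member->gene_member()->stable_id();
        next if $gene_id eq $query_id;   # skip the query gene itself
        next if $seen{$gene_id}++;       # skip duplicates
        print $gene_id, " (", $homology->description(), ")\n";
    }
}
```

Going through gene_member() is what turns the protein-level members into Ensembl gene IDs, which addresses the second part of the question.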


Is there a way of getting similarly named genes? For example, TMEM110 has one paralogue, but there are many TMEM genes. While I could use a regular expression for this example, in my actual application I don't want to write a regular expression for every gene, e.g.:

KLF11 > KLF*

and so on, depending on which genes the user passes to the script (there will be many, one after another).

