Question: Ensembl API for retrieving human gene families
3.1 years ago
European Union
David.shaw10 wrote:

I am trying to retrieve the paralogues and gene-family members of a given input gene. For example, if my input is TMEM110, I would like all the TMEM genes that are its paralogues and members of its gene family (e.g. TMEM*).

Currently, the script below returns the gene family members for all species rather than just human. When I change 'Multi' to 'Human', the script breaks. It also outputs protein IDs, but I would like it to return Ensembl gene IDs like the input (ENSG00000139618).

Any help would be appreciated!

    use strict;
    use warnings;

    use Bio::EnsEMBL::Registry;

    ## Load the registry automatically from the public Ensembl database server
    my $reg = "Bio::EnsEMBL::Registry";
    $reg->load_registry_from_db(
      -host => "ensembldb.ensembl.org",
      -user => "anonymous",
    );

    ## Get the compara genemember adaptor
    my $gene_member_adaptor = $reg->get_adaptor("Multi", "compara", "GeneMember");

    ## Get the compara family adaptor
    my $family_adaptor = $reg->get_adaptor("Multi", "compara", "Family");

    ## Get the compara member
    my $gene_member = $gene_member_adaptor->fetch_by_source_stable_id("ENSEMBLGENE", "ENSG00000139618");

    ## Get all the families
    my $all_families = $family_adaptor->fetch_all_by_Member($gene_member);

    ## For each family
    foreach my $this_family (@{$all_families}) {
      print $this_family->description(), " (description score = ", $this_family->description_score(), ")\n";

      ## print the members in this family
      my $all_members = $this_family->get_all_Members();
      foreach my $this_member (@{$all_members}) {
        print $this_member->source_name(), " ", $this_member->stable_id(), " (", $this_member->taxon()->name(), ")\n";
      }
      print "\n";
    }

3.1 years ago by
Emily_Ensembl18k wrote:

The gene family adaptor gets all members of the family and doesn't discriminate by species. You want to use the homology adaptor instead, together with the method_link_species_set adaptor, to get paralogues. There's more info in this section of the online course.
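
A minimal sketch of the homology route, not a tested script. It assumes a recent Ensembl Perl API in which the Compara homology adaptor's `fetch_all_by_Member` accepts a `-METHOD_LINK_TYPE` filter and the gene member adaptor provides `fetch_by_stable_id` (older releases use `fetch_by_source_stable_id`, as in the question, and may need a MethodLinkSpeciesSet object fetched through the method_link_species_set adaptor instead). The `ENSEMBL_PARALOGUES` type restricts the results to within-species paralogues, and `gene_member()` maps each protein member back to its gene, so the output is ENSG identifiers rather than protein IDs.

```perl
use strict;
use warnings;

use Bio::EnsEMBL::Registry;

# Connect to the public Ensembl database server
my $reg = 'Bio::EnsEMBL::Registry';
$reg->load_registry_from_db(
    -host => 'ensembldb.ensembl.org',
    -user => 'anonymous',
);

my $gene_member_adaptor = $reg->get_adaptor('Multi', 'compara', 'GeneMember');
my $homology_adaptor    = $reg->get_adaptor('Multi', 'compara', 'Homology');

# Assumption: fetch_by_stable_id is available in the installed API version
my $query_id    = 'ENSG00000139618';
my $gene_member = $gene_member_adaptor->fetch_by_stable_id($query_id);

# Assumption: -METHOD_LINK_TYPE is supported by fetch_all_by_Member
my $paralogies = $homology_adaptor->fetch_all_by_Member(
    $gene_member,
    -METHOD_LINK_TYPE => 'ENSEMBL_PARALOGUES',
);

foreach my $homology (@{$paralogies}) {
    # Each homology pairs the query with one paralogue
    foreach my $member (@{ $homology->get_all_Members() }) {
        my $gene = $member->gene_member();
        next if $gene->stable_id() eq $query_id;    # skip the query gene itself
        print $gene->stable_id(), "\t", ($gene->display_label() // ''), "\n";
    }
}
```

Because paralogues are by definition in the same species, this stays within human when the input is a human gene.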

Is there a way of getting similarly named genes? For example, TMEM110 has one paralogue, but there are many TMEM genes. Whilst I can use a regular expression for this example, in my actual application I don't want to write a regular expression for every gene, e.g.:


KLF11 > KLF*

etc., depending on what the user inputs into the script (which will be many genes, one after another).
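
One possible way to avoid writing a pattern per gene is to derive the prefix from the input symbol itself. The sketch below is only an illustration: `symbol_prefix` and `same_prefix` are hypothetical helpers, the rule (drop the trailing digits and any letters after them) is an assumption that will not suit every symbol, and the candidate list is made up rather than fetched from Ensembl. A shared prefix reflects naming only, so it is not evidence that two genes are paralogues.

```perl
use strict;
use warnings;

# Hypothetical helper: strip the trailing digits (plus any letters that
# follow them) from a gene symbol, e.g. "KLF11" -> "KLF".
sub symbol_prefix {
    my ($symbol) = @_;
    (my $prefix = $symbol) =~ s/\d+[A-Za-z]*$//;
    return $prefix;
}

# Hypothetical helper: return the candidates that share the derived prefix
# followed by a digit, excluding the input symbol itself.
sub same_prefix {
    my ($symbol, @candidates) = @_;
    my $prefix = symbol_prefix($symbol);
    return grep { $_ ne $symbol && /^\Q$prefix\E\d/ } @candidates;
}

# Made-up candidate list, for illustration only
my @symbols = qw(TMEM110 TMEM120A TMEM43 KLF11 KLF4 BRCA2);

foreach my $input (qw(TMEM110 KLF11)) {
    print "$input: ", join(", ", same_prefix($input, @symbols)), "\n";
}
```

The same derived prefix could be applied to whatever list of symbols the script already has, so each user input gets its own pattern without any per-gene configuration.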

